What's covered in this article
The platform displays validation statistics, warnings and recommended actions for enabled entities in the Validation page, much like it does for every label in your taxonomy.
To see these, navigate to the Validation page and select the 'Entities' tab at the top, as shown in the image below.
How to access Entity Validation page
How does entity validation work?
The platform validates its ability to predict entities correctly in much the same way as it does for labels.
Verbatims are split (80:20) into a training set and a test set (determined randomly by the verbatim ID of each comment) when they are first added to the dataset. Any entities that have been assigned (predictions that were accepted or corrected) will fall into the training set or the test set, based on whichever set the verbatim that they're in was assigned to originally.
Because a single verbatim can contain a very large number of entities, and each verbatim is assigned to the training set or the test set at random, you may see a large disparity between the number of entities in each set.
There may also be instances where all of the assigned examples of an entity fall into the training set. As at least one example is required in the test set to calculate the validation scores, such an entity needs more assigned examples until some of them are present in the test set.
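The effect of the split can be illustrated with a minimal sketch. The platform's actual assignment mechanism is not described here, so the hashing scheme, the function names and the entity names below are all assumptions made for illustration only. The sketch shows how a fixed 80:20 assignment per verbatim can leave an entity with no examples in the test set.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def assign_split(verbatim_id: str, test_fraction: float = 0.2) -> str:
    """Map a verbatim ID to 'train' or 'test' (hypothetical hashing scheme).

    The same ID always lands in the same set, mirroring the idea that the
    split is fixed when the verbatim is first added to the dataset.
    """
    digest = hashlib.sha256(verbatim_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


def entity_counts_by_split(entities_per_verbatim: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Count assigned entities per set; each entity follows its verbatim."""
    counts = {"train": Counter(), "test": Counter()}
    for verbatim_id, entity_names in entities_per_verbatim.items():
        counts[assign_split(verbatim_id)].update(entity_names)
    return counts


def entities_missing_from_test(counts: Dict[str, Counter]) -> List[str]:
    """Entities with training examples but none in the test set (cannot be scored yet)."""
    return sorted(set(counts["train"]) - set(counts["test"]))


if __name__ == "__main__":
    # Hypothetical data: verbatim ID -> names of the entities assigned in it.
    data = {
        "verbatim-001": ["policy-number", "policy-number", "date"],
        "verbatim-002": ["date"],
        "verbatim-003": ["policy-number"],
    }
    counts = entity_counts_by_split(data)
    print(counts)
    print("No test examples yet:", entities_missing_from_test(counts))
```

Because every entity in a verbatim follows that verbatim into its set, a few entity-dense verbatims can skew the counts considerably in either direction.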
How are the scores calculated?
The individual precision and recall statistics for each entity with sufficient training data are calculated in a very similar way to that of labels:
Precision = No. of matching entities / No. of predicted entities
Recall = No. of matching entities / No. of actual entities
A 'matching entity' is one where the platform has predicted the entity exactly (i.e. partial matches do not count).
The F1 Score is simply the harmonic mean of both precision and recall.
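As a worked illustration of the formulas above, the following sketch scores one entity from sets of predicted and actual spans. The span representation (verbatim ID, start offset, end offset, entity name) and the convention of returning 0.0 when a denominator is zero are assumptions for this example, not a description of the platform's internals.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical span representation: (verbatim_id, start_char, end_char, entity_name)
Span = Tuple[str, int, int, str]


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def score_entity(predicted: Set[Span], actual: Set[Span]) -> Scores:
    """Precision, recall and F1 using exact matches only."""
    # A span matches only if every field is identical, so partial overlaps do not count.
    matching = len(predicted & actual)
    precision = matching / len(predicted) if predicted else 0.0  # assumed convention
    recall = matching / len(actual) if actual else 0.0  # assumed convention
    if precision + recall == 0:
        return Scores(precision, recall, 0.0)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return Scores(precision, recall, f1)


if __name__ == "__main__":
    actual = {
        ("v1", 0, 8, "policy-number"),
        ("v1", 20, 28, "policy-number"),
        ("v2", 5, 13, "policy-number"),
        ("v3", 0, 8, "policy-number"),
        ("v4", 10, 18, "policy-number"),
    }
    predicted = {
        ("v1", 0, 8, "policy-number"),
        ("v1", 20, 28, "policy-number"),
        ("v2", 5, 13, "policy-number"),
        ("v3", 0, 6, "policy-number"),  # partial overlap with the actual span: not a match
    }
    scores = score_entity(predicted, actual)
    # 3 matching / 4 predicted = 0.75 precision; 3 matching / 5 actual = 0.6 recall
    print(round(scores.precision, 3), round(scores.recall, 3), round(scores.f1, 3))
```

Here the partially overlapping prediction in "v3" counts against both precision (it is a predicted entity that does not match) and recall (the actual entity was not found exactly).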
It's worth noting that the precision and recall stats shown in this page are most useful for the entities that are trainable live in the platform (shown in the second column above), as all of the entities reviewed for these entity kinds will directly impact the platform's ability to predict those entities.
Hence accepting correct entities and correcting or rejecting wrong entities should be done wherever possible.
For entities that are pre-trained, the validation statistics only reflect performance accurately if users accept a considerable number of correct predictions, as well as correcting the wrong ones.
If they were only to correct wrong predictions, the train and test sets would be artificially full of only the instances where the platform has struggled to predict an entity, and not those where it is better able to predict them. As correcting wrong predictions for these entities does not lead to a real-time update of these entities (they are updated periodically offline), the validation statistics may not change for some time and could be artificially low.
Accepting lots of the correct predictions may not always be convenient, as these entities are predicted correctly far more often than not. But if the majority of the predictions for these entities are correct, you probably do not need to worry about their precision and recall stats in the Validation page.
What do the summary statistics mean?
The summary stats (average precision, average recall and average F1 score) are simply averages of each of the individual entity scores.
Like with labels, only entities that have sufficient training data are included in the average scores. Those that do not have sufficient training data to be included have a warning icon next to their name.
Please Note: The summary stats incorporate all of the entities with sufficient training data, both those that are trainable live and those that are pre-trained. The predictions for entities that are pre-trained are often only corrected when they are wrong, and not always accepted when they are right. This means their precision and recall stats can often be artificially low, which would lower the average scores.
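The averaging described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the use of `None` to mark an entity without sufficient training data, and the entity names and scores are all hypothetical.

```python
from statistics import mean
from typing import Dict, Optional

METRICS = ("precision", "recall", "f1")


def summary_stats(per_entity: Dict[str, Optional[Dict[str, float]]]) -> Optional[Dict[str, float]]:
    """Unweighted average of each metric over the entities that could be scored.

    Entities mapped to None (assumed marker for insufficient training data)
    are left out of the averages entirely.
    """
    scored = [scores for scores in per_entity.values() if scores is not None]
    if not scored:
        return None  # nothing to average yet
    return {metric: mean(scores[metric] for scores in scored) for metric in METRICS}


if __name__ == "__main__":
    # Hypothetical per-entity scores.
    per_entity = {
        # trainable live, reviewed thoroughly
        "policy-number": {"precision": 0.9, "recall": 0.8, "f1": 0.85},
        # pre-trained, mostly only corrected when wrong, so its scores look low
        "date": {"precision": 0.5, "recall": 0.4, "f1": 0.44},
        # not enough training data: shown with a warning icon and excluded
        "monetary-quantity": None,
    }
    stats = summary_stats(per_entity)
    print({metric: round(value, 3) for metric, value in stats.items()})
```

In this made-up example the pre-trained entity pulls the average precision down from 0.9 to 0.7, which is the effect the note above describes.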
The Entities Validation page shows the average entity performance statistics, as well as a chart showing the average F1 score of each entity versus their training set size. The chart also flags entities that have amber or red performance warnings.
Entities Validation page
The entity performance statistics shown are:
- Average F1 Score: Average of F1 scores across all entities with sufficient data to accurately estimate performance. This score weighs recall and precision equally. A model with a high F1 score produces fewer false positives and negatives.
- Average Precision: Average of precision scores across all entities with sufficient data to accurately estimate performance. A model with high precision produces fewer false positives.
- Average Recall: Average of recall scores across all entities with sufficient data to accurately estimate performance. A model with high recall produces fewer false negatives.
Understanding entity performance
The entity performance chart shown in the Metrics tab of the Validation page (see above) gives an immediate visual indication of how each individual entity is performing.
For an entity to appear on this chart, it must have at least 20 pinned examples present in the training set used by the platform during validation. To ensure that this happens, users should make sure they provide a minimum of 25 (often more) pinned examples per entity from 25 different verbatims.
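The guideline above can be checked with a short sketch, assuming you have a list of (entity name, verbatim ID) pairs for your pinned examples. That input format, the constant names and the function name are hypothetical. Meeting the guideline does not guarantee 20 examples in the training set, since the split is random, which is why more examples are often needed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

MIN_PINNED_EXAMPLES = 25  # guideline from the text above
MIN_DISTINCT_VERBATIMS = 25  # guideline from the text above


def entities_needing_examples(pinned: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Return entities below the guideline.

    `pinned` holds one (entity_name, verbatim_id) pair per pinned example.
    The result maps each flagged entity to (example_count, distinct_verbatim_count).
    """
    example_counts: Dict[str, int] = defaultdict(int)
    verbatim_ids: Dict[str, set] = defaultdict(set)
    for entity_name, verbatim_id in pinned:
        example_counts[entity_name] += 1
        verbatim_ids[entity_name].add(verbatim_id)
    return {
        name: (example_counts[name], len(verbatim_ids[name]))
        for name in example_counts
        if example_counts[name] < MIN_PINNED_EXAMPLES
        or len(verbatim_ids[name]) < MIN_DISTINCT_VERBATIMS
    }


if __name__ == "__main__":
    pinned = [("policy-number", "v%d" % i) for i in range(30)]  # 30 examples, 30 verbatims
    pinned += [("date", "v%d" % (i % 10)) for i in range(30)]  # 30 examples, only 10 verbatims
    print(entities_needing_examples(pinned))
```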
Each entity will be plotted as one of three colours, based on the model's understanding of how the entity is performing. Below, we explain what these mean:
Entity performance indicators:
- Entities plotted as blue have a satisfactory performance level. This is based on numerous contributing factors, including the number and variety of examples and the average precision for that entity.
- Entities plotted as amber have slightly less than satisfactory performance. They may have relatively low average precision or not quite enough training examples. These entities require some further training or correction to improve their performance.
- Entities plotted as red are performing poorly. They may have very low average precision or not enough training examples. These entities may require considerably more training or correction to bring their performance up to a satisfactory level.
Please Note: the amber and red performance indicators also appear in the entity filter bars in Explore, Reports and Validation. This shows you at a glance which entities need attention, and which entities' predictions should not be relied upon (without some work to improve them) when using the analytics features.
Individual entity performance
Users can select individual entities from the entity filter bar (or by clicking the entity's plot on the 'All entities' chart) in order to see the entity's performance statistics.
The individual entity view also shows any performance warnings, along with recommended next best actions to help improve the entity's performance.
The entity view will show the average F1 score for the entity, as well as its precision and recall.
Example entity card with recommended actions
To better understand how to improve entity performance, see the article 'Improving entity performance'.