The platform displays validation statistics for enabled entities in the Validation page, much like it does for every label in your taxonomy.
To see these, navigate to the Validation page and select the 'Entities' tab at the top, as shown in the image below.
Entities Validation page
How does entity validation work?
The process by which the platform validates its ability to correctly predict entities is very similar to the process for labels.
Verbatims are split 80:20 into a training set and a test set (determined randomly by the verbatim ID of each comment) when they are first added to the dataset. Any entities that have been assigned (i.e. predictions that were accepted or corrected) fall into the training set or the test set, depending on which set the verbatim they appear in was originally assigned to.
As there can sometimes be a very large number of entities in a single verbatim, and no guarantee of whether that verbatim is in the training set or the test set, you may see a large disparity between the number of entities in each set.
There may also be instances where all of the assigned entities fall into the training set - e.g. Country in the image above. As at least one example is required in the test set to calculate the validation scores, this entity would need more assigned examples until some fall into the test set.
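To illustrate how entities inherit their verbatim's split, here is a minimal sketch in Python. The hash function, the 80:20 threshold mechanics and the function names are illustrative assumptions, not the platform's actual implementation; the key point is that the assignment is deterministic per verbatim ID, so every entity in a verbatim lands in the same set as the verbatim itself.

```python
import hashlib

def assign_split(verbatim_id: str, test_fraction: float = 0.2) -> str:
    """Derive a stable train/test assignment from a verbatim's ID.

    Illustrative only: the platform's actual hashing scheme is not documented
    here. The point is that the assignment depends solely on the verbatim ID,
    so it never changes, and every entity in the verbatim shares its split.
    """
    digest = hashlib.md5(verbatim_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable value in 0..99 for this ID
    return "test" if bucket < test_fraction * 100 else "train"

# All assigned entities in a verbatim inherit the verbatim's split
split = assign_split("verbatim-0001")
entity_splits = {"Organisation": split, "Person": split, "Country": split}
```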
How are the scores calculated?
The individual precision and recall statistics for each entity with sufficient training data are calculated in much the same way as for labels:
Precision = No. of matching entities / No. of predicted entities
Recall = No. of matching entities / No. of actual entities
A 'matching entity' is one that the platform has predicted exactly (i.e. no partial matches).
The F1 Score is simply the harmonic mean of both precision and recall.
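As a concrete sketch, the per-entity scores could be computed from the three counts above as follows (the function name and the example numbers are assumptions for illustration only):

```python
def entity_scores(matching: int, predicted: int, actual: int) -> dict:
    """Precision, recall and F1 for one entity from exact-match counts."""
    precision = matching / predicted if predicted else 0.0
    recall = matching / actual if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Example: 45 exact matches out of 50 predicted and 60 actual entities
print(entity_scores(matching=45, predicted=50, actual=60))
# {'precision': 0.9, 'recall': 0.75, 'f1': 0.8181...}
```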
Trainable entities
It's worth noting that the precision and recall stats shown on this page are most useful for entities that are trainable live in the platform (shown in the second column above), as every entity reviewed for these entity kinds directly impacts the platform's ability to predict them.
Hence, accepting correct entities and correcting or rejecting wrong ones should be done wherever possible.
Pre-trained entities
For entities that are pre-trained, in order for the validation statistics to provide an accurate reflection of performance, users need to ensure they accept a considerable number of correct predictions, as well as correcting wrong ones.
If they were only to correct wrong predictions, the training and test sets would be artificially filled with the instances where the platform has struggled to predict an entity, and not those where it predicts well. As correcting wrong predictions for these entities does not lead to a real-time update (they are updated periodically offline), the validation statistics may not change for some time and could be artificially low.
Accepting lots of the correct predictions may not always be convenient, as these entities are predicted correctly far more often than not. But if the majority of predictions for these entities are correct, you likely do not need to worry about their precision and recall stats on the Validation page.
What do the summary statistics mean?
The summary stats (average precision, average recall and average F1 score) are simply the averages of the individual entity scores.
Like with labels, only entities that have sufficient training data are included in the average scores. Those that do not have sufficient training data to be included have a warning icon next to their name.
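Below is a minimal sketch of how those averages could be computed, assuming a per-entity score table and an illustrative minimum-test-example threshold (neither the data structure nor the threshold reflects the platform's documented implementation):

```python
def summary_stats(per_entity: dict, min_test_examples: int = 1) -> dict:
    """Average per-entity scores, skipping entities without enough test data."""
    eligible = {name: s for name, s in per_entity.items()
                if s["test_examples"] >= min_test_examples}
    n = len(eligible)
    if n == 0:
        return {}
    return {
        "average_precision": sum(s["precision"] for s in eligible.values()) / n,
        "average_recall": sum(s["recall"] for s in eligible.values()) / n,
        "average_f1": sum(s["f1"] for s in eligible.values()) / n,
    }

per_entity = {
    "Organisation": {"precision": 0.92, "recall": 0.88, "f1": 0.90, "test_examples": 40},
    "Person":       {"precision": 0.95, "recall": 0.91, "f1": 0.93, "test_examples": 55},
    "Country":      {"precision": 0.00, "recall": 0.00, "f1": 0.00, "test_examples": 0},
}
print(summary_stats(per_entity))  # Country has no test examples, so it is excluded
```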
Please Note: The summary stats incorporate all of the entities with sufficient training data, both those that are trainable live (e.g. Organisation and Person) and those that are pre-trained. The predictions for entities that are pre-trained are often only corrected when they are wrong, and not always accepted when they are right. This means their precision and recall stats can often be artificially low, which would lower the average scores.
Previous: Reviewing and applying entities | Next: Improving entity performance