Model Training & Maintenance

Guides on how to create, improve and maintain Models in Re:infer, using platform features such as Discover, Explore and Validation

Check your model's coverage

User permissions required: 'View Sources' AND 'Review and label'



This final step in our recommended training process is intended to give you confidence that your model is performing as expected. This section covers the concept of coverage, how to check it, and the recommended steps to improve it if needed, and essentially acts as a model close-out guide.


If you've been thorough in your training throughout the process, this step should be a series of checks with potentially some minor additional training required here or there. If you haven't been as thorough as you could've been, this step should help you identify the gaps in your model and help you fill them.

What is coverage?


Coverage is a term frequently used in Machine Learning and relates to how well a model 'covers' the data it's used to analyse. In Re:infer, this means how accurately your taxonomy of labels (and their training examples) represents your dataset as a whole. You can think of it this way: to ensure you have good coverage, you would need to have enough labels to describe each of the key concepts in your dataset, as well as varied and consistently applied training examples for each of those labels.

Your model having good coverage is particularly important if you are using it to drive automated processes. For example, for a model designed to automatically route different requests received in a shared mailbox, low coverage would mean that many requests are routed inaccurately, or sent for manual review because the model could not identify them.


Coverage can be broken down into two core concepts:

Concept coverage: how comprehensively your labels represent the concepts within the real-life data you are trying to model

Accuracy coverage: for each label concept, how well the model is able to predict where it applies

Returning to the example of automatically routing requests received in a shared mailbox:


  • If there are 15 processes managed by the team working in the mailbox, but the taxonomy only effectively captures 12 of those, this would be poor concept coverage. During the automation the remaining three processes would likely be missed, sent for manual review, or incorrectly classified as a different process and routed to the wrong place
  • If there are insufficient varied training examples for some of the labels, each representing a process, and the model is not able to consistently predict them correctly, then this would be poor accuracy coverage 

The visual below demonstrates how this example might look in practice - we have multiple clients sending multiple request types through email. Each client may write the same request type in a different way:

  • Not all request types covered – lower concept coverage
  • All request types covered, but not all of the different client examples, indicated by label performance warnings – lower accuracy coverage
  • All request types and client examples covered – good concept and accuracy coverage

How to check your model's coverage


This section and the recommendations outlined below are particularly important for models trained for automation purposes. Whilst it is still important to have good coverage on analytics-focused models, it is slightly less critical. If you are building an analytics-focused model, the extent to which you follow the guidelines and checks below is up to you, based on the level of performance that's acceptable to you. It's worth noting that an analytics-focused model will always sacrifice some accuracy in order to broadly capture a very wide range of concepts in its taxonomy.

Earlier in the Refine phase, we covered understanding validation scores and improving label performance (here), specifically label precision. To achieve good accuracy coverage, each label in your taxonomy should have a high average precision. If you have not already done so, the first thing to do is ensure that each label in your taxonomy has a satisfactory average precision (75%+) in the Validation page.
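The precision check above amounts to a simple threshold test. The sketch below illustrates it with invented label names and scores; in practice you would read the average precision figures off the Validation page rather than from a dictionary:

```python
# Hypothetical example: flagging labels whose average precision falls below
# the 75% threshold suggested above. Label names and scores are invented
# for illustration, not taken from any real dataset.
PRECISION_THRESHOLD = 0.75

average_precision = {
    "Margin Call > Dispute": 0.91,
    "Margin Call > Booking Request": 0.82,
    "Margin Call > Interest": 0.68,  # below threshold: needs more training
}

# Labels below the threshold are candidates for further training
needs_training = [
    label for label, ap in average_precision.items()
    if ap < PRECISION_THRESHOLD
]
print(needs_training)
```

Any label that appears in `needs_training` would warrant the improvement steps covered in the Refine phase before you rely on its predictions.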

Indicative coverage check

One of the first things you can do to check how well the labels that are important to you cover the data in your dataset is to select all of those labels in Reports.


Labels selected in Reports filter bar

The verbatim count at the top of the page in Reports updates based on filters applied. When you select labels from the label filter, the count updates to show the number of verbatims that are likely to have at least one of the selected labels predicted.


Verbatim count in Reports

In this example dataset of emails relating solely to a margin call process in a bank (260,000 emails in total), you can see that 237,551 of the 260,000 verbatims are likely to have at least one of the selected labels predicted, indicating good coverage of approximately 91.4%.
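The indicative coverage figure is simple arithmetic: the number of verbatims predicted to have at least one selected label, divided by the total verbatim count in the dataset. A minimal sketch:

```python
def indicative_coverage(predicted_count: int, total_count: int) -> float:
    """Return indicative coverage as a percentage, to one decimal place."""
    return round(100 * predicted_count / total_count, 1)

# Figures from the margin call example above
print(indicative_coverage(237_551, 260_000))  # → 91.4
```

What counts as "good" coverage depends on your use case; for an automation you would typically want this figure to be high, alongside the additional checks below.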

This should not be your only coverage check, however. You should also complete the additional checks outlined below to ensure good concept and accuracy coverage for your model.

Additional checks:

Next you should complete a series of checks that will give you confidence that your model genuinely has good concept as well as accuracy coverage. If the model fails any of the checks, follow the recommended actions to improve the coverage:

2-day period prediction review

Review predictions on 1-2 days' worth of recent data: use the time filter and 'recent' in the drop-down to pick two recent days' worth of data. Review the predictions and make sure each verbatim has a reasonably high-confidence prediction. Reviewing 1-2 days' worth of data should ensure all potential concepts are covered.

  • If there are verbatims with no predictions, or predictions with insufficient confidence, label them as normal
  • Then train more in Shuffle and Low Confidence

Shuffle

Review predictions in Shuffle for at least 5 pages. Each verbatim should have a label predicted with reasonably high confidence.

  • If there are verbatims with no predictions, or predictions with insufficient confidence, label them as normal
  • Then train more in Shuffle and Low Confidence

Low Confidence

Low Confidence mode shows you verbatims where the confidence of any label applying is low. Use it to ensure you have not missed any potential labels and that you have provided enough examples for each existing label.

  • If there are verbatims that have not been covered, add a new label for them and train it out as normal
  • Where you find a verbatim for an existing label, apply it as normal

'Re-Discover'

Returning to Discover can show you potential new clusters where the probability of any label applying is low. Use it to ensure you have not missed any potential labels, and to provide existing labels with more varied examples, in a similar way to Low Confidence.

  • If there are clusters with no predictions (or very low confidence ones), label the cluster with either a new label or an existing one if applicable
  • Train out any new label as normal

Please Note: the best possible way to understand how well a model performs coverage-wise for an automation is to properly test it with live data before deploying it to production.



This final check, 'Re-Discover', is a step that can be revisited at any time during the training process, but can also be useful when checking in on your model's performance.

This step entails going back to the Discover page in 'Cluster' mode and reviewing the clusters there to check their predictions, and to see whether Discover has found any clusters that may have been missed by your training.

As the clusters in Discover retrain after a significant amount of training has been completed in the platform (180 annotations), or after a significant amount of data has been added to the dataset (1,000 verbatims or 1% of the dataset, whichever is greater, plus at least 1 annotation), they should update regularly throughout the training process.
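The retrain conditions above can be expressed as a small decision rule. The function below is purely illustrative (its name and parameters are not part of the Re:infer platform); it only encodes the thresholds stated in this article:

```python
def clusters_should_retrain(new_annotations: int,
                            new_verbatims: int,
                            dataset_size: int) -> bool:
    """Illustrative sketch of the Discover cluster retrain thresholds:
    180 annotations since the last build, OR new data of at least
    1,000 verbatims or 1% of the dataset (whichever is greater)
    plus at least one annotation."""
    if new_annotations >= 180:
        return True
    data_threshold = max(1000, dataset_size * 0.01)
    return new_verbatims >= data_threshold and new_annotations >= 1

# For a 260,000-email dataset, the data threshold is 2,600 verbatims (1%)
print(clusters_should_retrain(0, 5000, 260_000))  # → False (no annotations)
print(clusters_should_retrain(1, 5000, 260_000))  # → True
```

In other words, on large datasets the 1% figure dominates the 1,000-verbatim floor, and new data alone never triggers a retrain without at least one accompanying annotation.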

Discover tries to find clusters that are not well covered by label predictions. If there are clusters in Discover that should have certain labels predicted but don't, you know you need to do some more training for those labels. See here for how to label clusters in Discover.

If your model is well trained, Discover will struggle to find clusters with low confidence or no predictions. If you see that each of the clusters in Discover has reasonably high-confidence and correct predictions, this is a good indicator that your model covers the dataset well.

Previous: Training using 'Teach' on reviewed and unreviewed verbatims     |      Next: When to stop training your model 
