User permissions required: 'View Sources' AND 'Review and label'
The final key step in Explore is training using 'Low confidence' mode, which shows you verbatims for which Re:infer has either no predicted labels or only labels predicted with very low confidence. In practice, this means that for each individual label, the model typically has less than 10% confidence that the label applies to these verbatims.
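The selection rule described above can be illustrated with a minimal sketch. This is not Re:infer's actual implementation: the data shape (a dictionary of label names to confidence scores per verbatim), the function names, and the fixed 0.10 cut-off are all assumptions made for illustration, based only on the "typically less than 10%" figure quoted above.

```python
from typing import Dict, List

# Assumption: a fixed cut-off taken from the "typically less than 10%" figure in the text.
LOW_CONFIDENCE_THRESHOLD = 0.10


def is_low_confidence(predictions: Dict[str, float],
                      threshold: float = LOW_CONFIDENCE_THRESHOLD) -> bool:
    """Return True when a verbatim has no predicted labels, or when every
    predicted label falls below the confidence threshold."""
    # all() over an empty dict is True, which covers the "no predicted labels" case.
    return all(confidence < threshold for confidence in predictions.values())


def low_confidence_ids(verbatims: List[dict],
                       threshold: float = LOW_CONFIDENCE_THRESHOLD) -> List[str]:
    """Return the ids of the verbatims that would qualify as low confidence."""
    return [v["id"] for v in verbatims if is_low_confidence(v["predictions"], threshold)]


# Hypothetical example data: label names and scores are invented for illustration.
example_verbatims = [
    {"id": "a", "predictions": {"Billing": 0.92, "Complaint": 0.05}},  # confident prediction
    {"id": "b", "predictions": {"Billing": 0.04, "Complaint": 0.07}},  # all labels below 10%
    {"id": "c", "predictions": {}},                                    # no predicted labels
]

selected = low_confidence_ids(example_verbatims)
```

In this sketch, verbatim "a" is excluded because one label is predicted confidently, while "b" and "c" are the kind of verbatims this mode is meant to surface.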
This is a useful step for improving how well your current taxonomy and training cover the verbatims in your dataset. If you see verbatims that should have existing labels predicted for them, this is a sign that you need to complete more training for those labels. If you see relevant verbatims for which no current label is applicable, you may want to create new labels to capture them. You can assign labels to verbatims in this mode in the same way as in any other Explore mode.
To access this mode, use the dropdown in the top right-hand corner of the Explore page.
Dropdown menu to access 'Low confidence'
How much training should I do for this step?
This mode presents you with 20 verbatims at a time. You should complete a reasonable amount of training here, going through multiple pages of verbatims and applying the correct labels, to help increase the model's coverage (see here for a detailed explanation of coverage).
The total amount of training you need to complete in 'Low confidence' will depend on a few different factors:
- How much training you completed in Shuffle and Teach + Unreviewed: the more training you do in those modes, the more representative your training set should be of the dataset as a whole, and the fewer relevant verbatims there should be in 'Low confidence'
- The purpose of the dataset: if the dataset is intended to be used for automation and requires very high coverage, then you should complete a larger proportion of training in 'Low confidence' to identify the various edge cases for each label
At a minimum, you should aim to label 5 pages of verbatims in this mode (about 100 verbatims, at 20 per page). Later on in the Refine phase, when you come to check your coverage, you may find that you need to complete more training in 'Low confidence' to improve your coverage further.
Please Note: This step in the training process is one of the reasons it's particularly helpful to have a well-trained 'Uninformative' or 'Spam' label. Such a label filters uninformative verbatims out of this page, which increases the chance that 'Low confidence' shows you relevant verbatims.