The second phase of model training is known as 'Explore' (as each step is completed on the Explore page). In this phase, you will build on the foundations of the taxonomy that you created by reviewing clusters and searching for different terms and phrases in Discover.
The objective of the Explore phase is to provide each of the labels that are important to you with enough varied and consistent training examples, so that the platform has sufficient training data from which to make accurate predictions across the entire dataset.
Explore is the core phase of model training, and requires the most time and effort. That being said, the more time you spend on completing the Explore phase in a thorough and consistent manner, the less time you will need to spend on the 'Refine' phase, which focuses on improving the performance of your model.
The Explore page has various training modes, and this phase focuses primarily on four of them:
'Shuffle' - shows a random selection of verbatims for users to label. It's vital to complete a significant chunk of training in Shuffle, in order to create a training set of examples that is representative of the wider dataset.
Reviewing predictions by label - once you've provided some examples for a label, the platform will start predicting where it applies to other verbatims. Reviewing and accepting or correcting these predictions can be an important part of the training process to prepare labels for the next step.
'Teach' (for unreviewed verbatims) - as soon as the platform is making some reasonable predictions for a label, you can improve its ability to predict the label for more varied examples by reviewing verbatims in the default Teach mode (which is for unreviewed verbatims). This will show you verbatims where the platform is unsure whether the selected label applies or not.
'Low Confidence' - shows verbatims where Re:infer has very little confidence that any label applies. Reviewing verbatims in this mode will help improve the coverage of your dataset by teaching the model which labels apply to verbatims it currently thinks do not fit your taxonomy.
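To make the difference between these modes concrete, here is a minimal Python sketch. It is not how Re:infer selects verbatims internally (that logic is not described in this article); it simply assumes each verbatim has a predicted probability per label, and all names and numbers (`predictions`, `teach_mode`, the labels, the 0.5 midpoint) are hypothetical.

```python
import random

# Hypothetical predictions: verbatim id -> {label: predicted probability}.
predictions = {
    "v1": {"Billing": 0.95, "Complaint": 0.10},
    "v2": {"Billing": 0.52, "Complaint": 0.05},
    "v3": {"Billing": 0.04, "Complaint": 0.08},
    "v4": {"Billing": 0.30, "Complaint": 0.90},
}

def shuffle_mode(preds, k, seed=None):
    """Random selection of verbatims, independent of any prediction."""
    rng = random.Random(seed)
    return rng.sample(sorted(preds), k)

def teach_mode(preds, label, k):
    """Assumption: 'unsure' means the probability for `label` is nearest 0.5."""
    return sorted(preds, key=lambda v: abs(preds[v][label] - 0.5))[:k]

def low_confidence_mode(preds, k):
    """Assumption: verbatims whose most likely label still scores lowest."""
    return sorted(preds, key=lambda v: max(preds[v].values()))[:k]

print(shuffle_mode(predictions, 2, seed=0))   # two random verbatim ids
print(teach_mode(predictions, "Billing", 1))  # ['v2']
print(low_confidence_mode(predictions, 1))    # ['v3']
```

In this toy data, `v2` is the verbatim the model is least sure about for 'Billing', while `v3` is the one where no label looks likely at all, which is the kind of verbatim that Low Confidence is meant to surface.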
This section of the Knowledge Base will also cover training using search in Explore, which is very similar to training using search in Discover.
There is another training mode in Explore - Teach (for reviewed verbatims) - that is explained in the 'Refining Models & Using Validation' section of the Knowledge Base here.
How much training should you do for each label?
The number of examples required for Re:infer to accurately predict each label can vary a lot depending on the breadth or specificity of a label concept.
It may be that a label is typically associated with very specific and easily identifiable words, phrases or intents, and the platform is able to predict it consistently with relatively few training examples. It could also be that a label captures a broad topic with lots of different variations of language that would be associated with it, in which case it could require significantly more training examples to allow Re:infer to consistently identify instances where the label should apply.
The platform can often start making predictions for a label with as few as five examples, though in order to accurately estimate the performance of a label (how well Re:infer is able to predict it), each label requires at least 25 examples.
When labelling in Explore, the little red dials (examples shown below) next to each label indicate whether more examples are needed for Re:infer to accurately estimate the label's performance. The dial starts to disappear as you provide more training examples and will disappear completely once you reach 25.
Label training dials
This does not mean that with 25 examples Re:infer will be able to accurately predict every label, but it will at least be able to validate how well it's able to predict each label and alert you if additional training is required.
During the Explore phase, you should therefore ensure that you've provided at least 25 examples for all of the labels that you are interested in, using a combination of the steps mentioned above (mostly Shuffle and Teach + Unreviewed).
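If you keep your own tally of pinned examples per label, the 25-example check can be sketched as follows. This is an illustrative helper only, not a platform API; the function name, label names and counts are made up.

```python
MIN_EXAMPLES = 25  # minimum stated above for estimating label performance

def labels_needing_examples(pinned_counts, minimum=MIN_EXAMPLES):
    """Return {label: examples still needed} for labels below the minimum."""
    return {
        label: minimum - count
        for label, count in pinned_counts.items()
        if count < minimum
    }

# Hypothetical pinned-example counts per label.
pinned_counts = {"Billing": 31, "Complaint": 12, "Cancellation": 25, "Praise": 4}

print(labels_needing_examples(pinned_counts))
# {'Complaint': 13, 'Praise': 21}
```

Labels missing from the result have reached the minimum, which corresponds to the training dial having disappeared. It does not by itself mean the label performs well.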
During the 'Refine' phase it may become clear that more training is required for certain labels to improve their performance, and this is covered in detail here.
Label performance warnings
In Explore, once you reach 25 pinned examples for a label, you may see one of the below label performance indicators in place of the training dial:
- The grey circle is an indicator that the platform is calculating the performance of that label - once calculated, it will either disappear or change to an amber or red circle
- Amber is an indicator that the label has slightly less than satisfactory performance and could be improved
- Red is an indicator that the label is performing poorly and needs additional training / corrective actions to improve it
- If there is no circle, then this means that the label is performing at a satisfactory level (though it may still need improving depending on the use case and desired accuracy levels)
- To understand more about label performance and how to improve it, you can start here
Label performance indicators
Predicted label counts vs. pinned label counts
If you click the tick icon (as shown below) at the top of the label filter bar to filter to reviewed verbatims, you will be shown the number of reviewed verbatims that have that label applied.
If you click the computer icon to filter to unreviewed verbatims, you will be shown the total number of predictions for that label (which includes the number of reviewed examples too).
In Explore, when neither reviewed nor unreviewed is selected, Re:infer shows the total number of pinned verbatims for a label by default. In Reports, the default is to show the total predicted.
Pinned vs. predicted Label count
Please Note: the predicted number is an aggregation of all the probabilities that Re:infer calculates for this label. For example, 2 verbatims with a confidence level of 50% would be counted as 1 predicted label.
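The arithmetic behind this note can be sketched in a few lines of Python. The function name and the probability values are illustrative only; the sketch just assumes the predicted count is the sum of per-verbatim probabilities, as described above.

```python
def predicted_count(probabilities):
    """Sum the per-verbatim probabilities for one label."""
    return sum(probabilities)

# Two verbatims at 50% confidence count as one predicted label.
print(predicted_count([0.5, 0.5]))  # 1.0

# Three verbatims at 90%, 80% and 30% count as roughly two.
print(round(predicted_count([0.9, 0.8, 0.3]), 2))  # 2.0
```

This is why a predicted count can differ from the number of verbatims in which you actually see the label predicted: many low-probability predictions still add up.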
Helpful tips for using Explore
- The model can start to make predictions with only a few labelled verbatims, though for it to make reliable predictions, you should label a minimum of 25 verbatims per label. Some labels will require more than this, depending on the complexity of the data, the label itself, and the consistency with which the labels have been applied
- When labelling a verbatim, focus on the mentality of the person who has written it. You should be labelling the content based on the problem they are trying to solve (what is in the verbatim), not based on how you would solve it (using your own expertise). Re:infer needs to be able to infer from what is in the verbatim what the label should be
- In Explore, you should also try to find verbatims where the model has predicted a label incorrectly. You should remove incorrect labels and apply correct ones. This process helps to prevent the model from making a similar incorrect prediction in future
Don't forget! During this phase you will be applying a lot of labels, so remember to adhere to the key labelling best practices of adding all labels that apply, applying them consistently, and labelling what you see in front of you.