User permission required: 'View sources' AND 'Review and label'
'Shuffle' is the first step in Explore. Its purpose is to give users a random selection of verbatims to review. In Shuffle mode, Re:infer shows you verbatims with predictions for any of your labels, as well as verbatims with no predictions at all. This sets Shuffle apart from the other steps in Explore: rather than focusing on training one specific label, it covers all of them.
Why is training using 'Shuffle' mode so important?
Using Shuffle mode ensures that your model receives enough training examples that are representative of the dataset as a whole, rather than examples biased towards a few very specific areas of the data.
Overall, at least 20% of the training you complete in your dataset should be in Shuffle mode.
Labelling in Shuffle mode helps ensure that your taxonomy covers the data in your dataset well. It also prevents you from building a model that makes very accurate predictions on only a small fraction of that data.
- Select 'Shuffle' from the drop-down menu to be presented with 20 random verbatims
- Filter to unreviewed verbatims
- Review each verbatim and any associated predictions
- If there are predictions, confirm or reject them. To confirm a prediction that applies, click on it
- Also add any other labels that apply
- If you reject the prediction(s), apply all of the correct label(s) instead; don't leave the verbatim with no labels applied
- You can also click the refresh button to get a new set of verbatims, or go to the next page using the controls at the bottom
We recommend labelling at least 10 pages' worth of verbatims in Shuffle (about 200 verbatims at 20 per page) as a minimum. In large datasets with lots of training examples, this could be much more.
Overall, aim to complete at least 20% of all training in Shuffle mode.
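To make the two guidelines above concrete, here is a minimal sketch of the arithmetic. It assumes you know how many verbatims you have reviewed in total and how many of those were reviewed in Shuffle. The function `extra_shuffle_verbatims` and its constants are hypothetical illustrations, not part of Re:infer. Because every extra verbatim reviewed in Shuffle also raises the total, the 20% target is solved as (s + x) / (t + x) ≥ 20%, not simply as 20% of the current total.

```python
# Hypothetical helper, not a Re:infer feature: it only illustrates the guidance
# of at least 10 Shuffle pages (20 verbatims each) and at least 20% of training in Shuffle.
PAGE_SIZE = 20        # verbatims shown per Shuffle page
MIN_PAGES = 10        # recommended minimum number of Shuffle pages
TARGET_PERCENT = 20   # recommended minimum share of training done in Shuffle


def extra_shuffle_verbatims(total_reviewed: int, shuffle_reviewed: int,
                            target_percent: int = TARGET_PERCENT) -> int:
    """Return how many more verbatims to review in Shuffle to meet both guidelines.

    Each extra Shuffle verbatim also increases the total, so this solves
    (s + x) / (t + x) >= p for the smallest integer x >= 0.
    """
    if not 0 <= shuffle_reviewed <= total_reviewed:
        raise ValueError("shuffle_reviewed must be between 0 and total_reviewed")
    if not 0 <= target_percent < 100:
        raise ValueError("target_percent must be in [0, 100)")
    # Integer arithmetic avoids floating-point rounding surprises.
    numerator = target_percent * total_reviewed - 100 * shuffle_reviewed
    for_ratio = max(0, -(-numerator // (100 - target_percent)))  # ceiling division
    for_minimum = max(0, PAGE_SIZE * MIN_PAGES - shuffle_reviewed)
    return max(for_ratio, for_minimum)


print(extra_shuffle_verbatims(1000, 100))  # 125
print(extra_shuffle_verbatims(500, 150))   # 50
```

In the first call, reviewing 125 more verbatims in Shuffle gives 225 out of 1,125, which is exactly 20%. In the second call, the ratio is already above 20%, so the 10-page minimum of 200 Shuffle verbatims is what still requires 50 more.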