Need help training your model? See our top tips below for each stage of the training process:


General


The two most common pitfalls to watch out for when you first label a dataset are:


Applying labels inconsistently. Make sure you apply a label on every occasion it should apply, and don't change your definition of what the label means part way through labelling. If you do, go back and review the verbatims where you have already applied it.
Partial labelling, that is, not applying all the labels that apply to a verbatim. When labelling a verbatim, make sure you apply every label that should be applied, not just the one you are focusing on. Otherwise you are telling the model that the other labels don't apply.


Examples


Applying labels inconsistently


The verbatims below are taken from dummy hotel reviews, in a dataset that has a label for the size of the room called 'Room > Size'. Both verbatims should have this label, but the user has not applied it consistently:

Figure 1: Verbatim with the 'Room > Size' label applied correctly

Figure 2: Verbatim with the 'Room > Size' label not applied when it should be

In Figure 2, the 'Room' label has been applied but the 'Room > Size' label has not, even though it should have been. This is inconsistent with Figure 1 and will confuse the model, because in Figure 2 you are telling the model that 'Room > Size' does not apply when it does.

Partial labelling

Figure 3: Example showing a verbatim that has been partially labelled

In the example above, the user has not applied the 'Room > Cleanliness' label, even though it is clearly applicable and has been applied to similar verbatims elsewhere. This is an example of partial labelling. Make sure you add all labels that apply to a verbatim.
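
If you have your labelled verbatims available outside the platform, a rough keyword check can help you find verbatims to go back and review. The sketch below is only an illustration: the data format (a list of dicts with "text" and "labels" keys), the function name and the keyword list are all assumptions, not part of Re:infer. A keyword match is only a hint that a label may have been missed, so every flagged verbatim still needs a human review.

```python
def find_possible_omissions(labelled_verbatims, label, keywords):
    """Return verbatims that mention a keyword but do not carry `label`.

    `labelled_verbatims` is a hypothetical format: a list of dicts, each
    with a "text" string and a "labels" list of applied label names.
    """
    lowered = [keyword.lower() for keyword in keywords]
    flagged = []
    for verbatim in labelled_verbatims:
        text = verbatim["text"].lower()
        mentions_keyword = any(keyword in text for keyword in lowered)
        if label not in verbatim["labels"] and mentions_keyword:
            flagged.append(verbatim)
    return flagged


# Made-up hotel reviews, for illustration only.
reviews = [
    {"text": "The room was spacious and bright", "labels": ["Room", "Room > Size"]},
    {"text": "Our room was far too small for a family", "labels": ["Room"]},
    {"text": "Breakfast was excellent", "labels": ["Food"]},
]

# Hypothetical keywords that might suggest 'Room > Size' applies.
for verbatim in find_possible_omissions(
    reviews, "Room > Size", ["small", "spacious", "cramped"]
):
    print(verbatim["text"])
# prints: Our room was far too small for a family
```

The second review is flagged because it mentions room size but only has the 'Room' label, which mirrors the inconsistency shown in Figure 2.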



Discover


Remember that it’s often easier to create a more specific, finer-grained taxonomy in the first instance. If it turns out to be too detailed, it’s easy to edit and ‘prune’ your taxonomy at a later date.
It’s good to start with labels in a flat hierarchy (not adding too many sub-labels). You can always restructure the taxonomy to be more hierarchical later.
Each verbatim can have multiple labels assigned to it. Make sure to apply all relevant labels, otherwise you teach the model not to associate the verbatim with the label you have omitted.
It is better to take the time to label carefully now, so that the model can rapidly and precisely predict labels in future.



Explore


The model will start to make predictions with only 3 labelled verbatims, but for it to make accurate predictions you should label at least 25 verbatims per label. Some labels will require more than this, depending on the complexity of the data, the label itself, and how consistently the labels have been applied.
When labelling a verbatim, focus on the mentality of the person who wrote it. You should label the content based on the problem they are trying to solve (what is in the verbatim), not on how you would solve it (using your own expertise). Re:infer needs to be able to infer what the label should be from what is in the verbatim.
In Explore, you should also try to find verbatims where the model has predicted a label incorrectly. Remove the incorrect labels and apply the correct ones. This helps to prevent the model from making a similar incorrect prediction in future.
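
To keep track of the 25-examples-per-label guideline above, you could count labelled verbatims per label in an export of your data. This is a minimal sketch under assumptions: the export format (a list of dicts with a "labels" key) and the function name are hypothetical and are not a Re:infer API. Only the threshold of 25 comes from the guidance above.

```python
from collections import Counter

# Threshold taken from the guidance above: at least 25 labelled verbatims per label.
MIN_EXAMPLES_PER_LABEL = 25


def labels_needing_more_examples(labelled_verbatims, minimum=MIN_EXAMPLES_PER_LABEL):
    """Return {label: count} for every label with fewer than `minimum` examples.

    `labelled_verbatims` is a hypothetical export format: a list of dicts,
    each with a "labels" key holding the label names applied to one verbatim.
    """
    counts = Counter()
    for verbatim in labelled_verbatims:
        # set() so a label listed twice on one verbatim is only counted once.
        counts.update(set(verbatim["labels"]))
    return {label: n for label, n in counts.items() if n < minimum}


# Made-up sample data, for illustration only.
sample = (
    [{"text": "The room was tiny", "labels": ["Room > Size"]}] * 30
    + [{"text": "Spotless bathroom", "labels": ["Room > Cleanliness"]}] * 4
)

print(labels_needing_more_examples(sample))
# prints: {'Room > Cleanliness': 4}
```

Note that this only reports labels that appear at least once in the data, so a label with no examples at all will not show up. Reaching 25 examples is also only a starting point, since some labels will need more.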



Refine


Move, rename and merge labels to ensure you’re happy with the hierarchical structure. To do this, use the label edit modal in Explore.
If there are multiple different ways of saying the same thing for a label (e.g. A, B or C), make sure that you give Re:infer training examples for each way of saying it. If you give it 30 examples of A and only a few of B and C, the model will struggle to pick up future examples of B or C for that label.
Adding a new label to a mature taxonomy may mean it has not been applied to previously reviewed verbatims. You then need to go back and teach the model about the new label using the 'Teach' function (see here for how).