Once your data is in the platform, Re:infer will group and display 30 clusters of communications (verbatims) that it believes share concepts or similar intents. The aim of this part of the training process is to go through each of these clusters and label the data presented in each of them.
This process makes training the model easier and faster to begin with, as you can add labels to multiple similar verbatims at once, and still add or remove labels on individual verbatims as required.
Helpful tips for labelling clusters:
- Don’t spend too long thinking about the name of the label. You can rename a label at any point during the training process.
- Be as specific as possible when naming a label, and keep the taxonomy as flat as possible initially (don’t add too many child labels). You can always restructure the taxonomy into a more hierarchical structure later.
- Add as many relevant labels as you can to each verbatim at this stage. Going back and deleting labels later is quicker and easier than expanding an existing label.
- Remember that it’s often easier to create a more specific, finer-grained taxonomy in the first instance, so add more labels rather than fewer. If the taxonomy turns out to be too detailed, it’s easy to edit and ‘prune’ it later.
- Each verbatim can have multiple labels assigned to it – make sure to apply all relevant labels, otherwise you teach the model not to associate it with the label that you have omitted
- It is better to take the time to carefully label now, so that the machine can rapidly and precisely predict labels in future
- Not all clusters will have obviously similar intents, and it’s OK to move on if the verbatims in a cluster are all different
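To see why applying every relevant label matters, here is a minimal Python sketch of the general multi-label idea. It is an illustration only, not Re:infer's actual implementation: the function `label_targets` and the example label names are hypothetical. It assumes that, for a reviewed verbatim, every label you did not apply is treated as a negative example.

```python
from typing import Dict, List, Set


def label_targets(taxonomy: List[str], applied: Set[str]) -> Dict[str, int]:
    """Turn the labels applied to one reviewed verbatim into per-label targets.

    1 means a positive example for that label, 0 means a negative example.
    (Hypothetical helper, for illustration only.)
    """
    unknown = applied - set(taxonomy)
    if unknown:
        raise ValueError(f"labels not in taxonomy: {sorted(unknown)}")
    return {label: int(label in applied) for label in taxonomy}


# Hypothetical flat taxonomy.
taxonomy = ["Cancellation", "Refund request", "Complaint"]

# The verbatim asks to cancel AND asks for a refund.
partial = label_targets(taxonomy, {"Cancellation"})
complete = label_targets(taxonomy, {"Cancellation", "Refund request"})

print(partial)   # "Refund request" is recorded as 0, a negative example
print(complete)  # "Refund request" is recorded as 1, a positive example
```

In the `partial` case, the omitted label becomes a signal that the verbatim is not a refund request, which is the opposite of what you want the model to learn.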
Help, my Discover is empty!
When you first create a new Dataset you may find that Discover is empty as shown below. Don’t worry, this is simply because Re:infer’s algorithms are busy working in the background to group your verbatims into clusters. Depending on the number of verbatims in the data source this could take up to a few hours to process.
Empty Discover page whilst clusters are being generated
Discover page in 'cluster' mode
- The darker lines indicate more important parts of the span (this is explained when you hover over it)
- The lighter coloured lines indicate parts that make a medium or slightly weaker contribution to the cluster
Please Note: The following guide describes the process for labelling a dataset that does not have sentiment analysis enabled. If you do have sentiment analysis enabled, the process is very similar: you also select a positive or negative sentiment when applying each label, and you can use neutral label names, where the sentiment denotes whether it’s the positive or negative version of that concept. See here for more details on labelling with sentiment analysis.
- Review each verbatim in the cluster
- If you think there is a label that applies to all verbatims on the page, select ‘Add label’
- Type in the name of the label and hit enter or click the pin button that appears
- Please Note: this does not apply the label yet
- You can add several labels at once this way: just type in another label and click the pin again
- There are two ways to apply the label(s) you have just added:
- Double-click the ‘Apply labels’ button to assign them to the Verbatims
- Alternatively you can add a label to individual Verbatims by clicking the ‘Add label +’ button highlighted underneath it
- As before, click the pin or hit enter and it will apply the label
- If you want to add a label to a group of verbatims on the page, but wish to exclude one or several, you can de-select them using the toggle button highlighted
- You can then invert the selection or de-select / reselect all using the buttons highlighted at the top
- You can view different pages of the same cluster and adjust the number of Verbatims per page using the buttons highlighted
- After you’ve applied the relevant labels to a Verbatim, you’ll notice that the colour will change to a darker shade and an identifier will appear marking it as ‘REVIEWED’.
- Select the drop down menu or click the arrows to move to the next cluster once you’ve assigned enough labels to the first cluster
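The steps above can be pictured as a small selection model. The Python sketch below is purely illustrative: the classes `Verbatim` and `ClusterPage` and their methods are hypothetical names, not part of any Re:infer API. It assumes that every verbatim on a page starts selected, that pinning a label only queues it, and that applying labels marks the selected verbatims as reviewed.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Verbatim:
    text: str
    labels: Set[str] = field(default_factory=set)
    selected: bool = True   # assumption: everything on the page starts selected
    reviewed: bool = False


@dataclass
class ClusterPage:
    verbatims: List[Verbatim]
    pinned: List[str] = field(default_factory=list)

    def pin(self, label: str) -> None:
        """Queue a label. Nothing is applied to any verbatim yet."""
        if label not in self.pinned:
            self.pinned.append(label)

    def toggle(self, index: int) -> None:
        """De-select (or re-select) a single verbatim."""
        v = self.verbatims[index]
        v.selected = not v.selected

    def invert_selection(self) -> None:
        """Flip the selection state of every verbatim on the page."""
        for v in self.verbatims:
            v.selected = not v.selected

    def apply_labels(self) -> int:
        """Apply all pinned labels to the selected verbatims.

        Returns the number of verbatims that were updated.
        """
        if not self.pinned:
            return 0
        updated = 0
        for v in self.verbatims:
            if v.selected:
                v.labels.update(self.pinned)
                v.reviewed = True
                updated += 1
        self.pinned.clear()
        return updated


page = ClusterPage([
    Verbatim("Please cancel my policy"),
    Verbatim("Cancel my policy and refund me"),
    Verbatim("What are your opening hours?"),
])
page.pin("Cancellation")      # queued, not yet applied
page.toggle(2)                # exclude the verbatim that does not fit
updated = page.apply_labels()
print(updated)                # number of verbatims labelled
```

The excluded verbatim keeps no labels and is not marked as reviewed, so you can still label it individually afterwards.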
The model will present you with 30 clusters and it’s important to work your way through each of them using the process described in this article to really help create a solid basis for your taxonomy.
If a cluster isn’t relevant to you, however, just skip over it.
Please Note: Discover retrains once a significant amount of training has been completed. After 180 verbatims have been labelled (half of the clusters), Discover will retrain and update the clusters. Don't be put off: just carry on working through them until you've reviewed at least 30 clusters.
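The numbers in this note can be checked with some quick arithmetic. The short Python sketch below uses only the two figures stated above (30 clusters, retrain after 180 labelled verbatims) and assumes that every cluster holds the same number of verbatims. That assumption, and the helper `retrain_due`, are illustrative and not a description of how Discover actually decides when to retrain.

```python
CLUSTERS = 30             # stated in this article
RETRAIN_THRESHOLD = 180   # labelled verbatims, stated in this article

# "Half of the clusters" corresponds to the retrain threshold.
clusters_at_retrain = CLUSTERS // 2

# Assumption: clusters are all the same size.
verbatims_per_cluster = RETRAIN_THRESHOLD // clusters_at_retrain
total_verbatims = CLUSTERS * verbatims_per_cluster


def retrain_due(labelled: int) -> bool:
    """Hypothetical check: has the stated threshold been reached?"""
    return labelled >= RETRAIN_THRESHOLD


print(clusters_at_retrain, verbatims_per_cluster, total_verbatims)
print(retrain_due(179), retrain_due(180))
```

Under that assumption, each cluster holds 12 verbatims and the full set of 30 clusters holds 360, so the retrain happens roughly halfway through your first pass.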