Once your data is in the platform, Discover groups it into 30 clusters of communications (verbatims) that the platform believes share concepts or similar intents. The aim of this part of the training process is to go through each cluster and label the verbatims it contains.
Labelling clusters makes the early stages of training the model easier and faster. You can add labels to multiple similar verbatims at once, and you can still add or remove labels on individual verbatims as required.
Helpful tips for labelling clusters:
- Don’t spend too long thinking about the name of the label. You can rename a label at any point during the training process.
- Be as specific as possible when naming a label. It’s often easier to start with a specific, finer-grained taxonomy: if it turns out to be too detailed, you can easily edit and ‘prune’ it later. In other words, add more labels rather than fewer.
- Keep the taxonomy as flat as possible initially (don’t add too many child labels). You can always restructure it into a more hierarchical structure later.
- Each verbatim can have multiple labels assigned to it. Make sure to apply all relevant labels; otherwise you teach the model not to associate the verbatim with any label you have omitted. At this stage, err on the side of adding labels: deleting them later is quicker and easier than expanding an existing label.
- Take the time to label carefully now, so that the model can predict labels quickly and precisely in future.
- Not all clusters will contain verbatims with obviously similar intents. If the verbatims in a cluster are all different, it’s fine to move on.
Help, my Discover is empty!
When you first create a new Dataset, you may find that Discover is empty, as shown below. Don’t worry: this is simply because the platform's algorithms are busy working in the background to group your verbatims into clusters. Depending on the number of verbatims in the data source, this can take up to a few hours.
Empty Discover page whilst clusters are being generated
Discover page in 'cluster' mode
Discover highlighting common themes
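In cluster mode, Discover highlights the parts of each verbatim that relate to the cluster's common themes:
- Darker lines indicate the parts of the text that are most important to the cluster (hover over a highlighted span for an explanation).
- Lighter coloured lines indicate a medium or weaker contribution to the cluster.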
Please Note: The following guide describes the process for labelling a dataset that does not have sentiment analysis enabled. If you do have sentiment analysis enabled, the process is very similar: you also select a positive or negative sentiment when applying each label. This means you can use neutral label names, with the sentiment denoting whether it's the positive or negative version of that concept. See the documentation on labelling with sentiment analysis for more details.
1. Review each verbatim in the cluster.
2. If you think a label applies to all verbatims on the page, select ‘Add label’.
3. Type the name of the label and press Enter, or click the pin button that appears. You can add several labels at once this way: type another label name and click the pin button again.
Please Note: this does not apply the label(s) yet.
4. Click the ‘Apply labels’ button to assign the label(s) to the verbatims. The assigned labels will now appear underneath every verbatim on the page.
Alternatively, you can add a label to an individual verbatim by clicking the ‘Add label +’ button highlighted underneath it.
Adding labels to individual verbatims in Discover
If you want to add a label to a group of verbatims on the page but exclude one or more of them, deselect those verbatims using the highlighted toggle button (A). You can also invert the selection, or deselect or reselect all verbatims, using the buttons highlighted at the top (B).
Excluding individual verbatims from bulk selection in Discover
You can view different pages of the same cluster (A) and adjust the number of verbatims per page (B) using the highlighted buttons. Once the cluster is labelled, you can move on to a new cluster using the drop-down list below (C).
Discover presents you with 30 clusters, and it’s important to work your way through them to create a solid basis for the Explore phase. If a cluster isn’t relevant to you, however, just skip it.
Navigating between clusters and cluster pages in Discover
Please Note: Discover retrains once a significant amount of training has been completed. After 180 verbatims have been labelled (half of the clusters), Discover retrains and updates the clusters. Don't be put off; just carry on working through them until you've reviewed at least 30 clusters.