Training using 'Clusters'

User permissions required: ‘View Sources’ AND ‘Review and label’

Please Note: Users with the ‘View Sources’ permission can see verbatims in Discover, and users with the ‘View labels’ permission can also see labels, but the ‘Review and label’ permission is required to actually apply labels in Discover.

What's in this article?

Overview
Help, my Discover is empty!
Layout
Discover highlights common themes
Key steps

Overview

Once your data is in the platform, it groups your communications (verbatims) into 30 clusters that it believes share concepts or similar intents, and displays them in Discover. The aim of this part of the training process is to go through each of these clusters and label the data presented in each of them.

This process makes training the model easier and faster to begin with, as you can add labels to multiple similar verbatims at once, and still add or remove labels on individual verbatims as required.

Helpful tips for labelling clusters:

Don’t spend too long thinking about the name of the label. You can rename a label at any point during the training process.
Be as specific as possible when naming a label, and keep the taxonomy as flat as possible initially (don’t add too many child labels). You can always change and restructure the hierarchy later. At this stage, add as many labels as possible to a verbatim; going back to delete them later is quicker and easier than expanding an existing label.
Remember that it’s often easier to create a more specific, finer-grained taxonomy in the first instance. If it turns out to be too detailed, it’s easy to edit and ‘prune’ it later. In practice, this means adding more labels and sub-labels rather than fewer.
It’s good to start with labels in a flat hierarchy (not adding too many sub-labels). You can always restructure the taxonomy into a more hierarchical structure later.
Each verbatim can have multiple labels assigned to it. Make sure to apply all relevant labels; otherwise, you teach the model not to associate the verbatim with the label that you have omitted.
It is better to take the time to label carefully now, so that the model can predict labels rapidly and precisely in future.
Not all clusters will have obviously similar intents, and it’s OK to move on if the verbatims in a cluster are all different.
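
The tip about applying every relevant label can be illustrated with a small sketch. It assumes, purely for illustration, that each reviewed verbatim becomes one yes/no training target per label, which is a common way to set up multi-label training; this article does not describe how the platform actually does it. The label names and the `to_targets` helper are hypothetical.

```python
# Hypothetical taxonomy for a hotel-reviews dataset (illustrative names only).
TAXONOMY = ["Bed > Comfort", "Room > Cleanliness", "Staff > Friendliness"]


def to_targets(applied_labels, taxonomy=TAXONOMY):
    """Map the labels applied to one reviewed verbatim to yes/no targets.

    Assumption: any label in the taxonomy that was NOT applied is treated
    as a negative example for that label.
    """
    applied = set(applied_labels)
    return {label: label in applied for label in taxonomy}


# The verbatim mentions both the bed and the staff, but only one label was applied.
incomplete = to_targets(["Bed > Comfort"])
complete = to_targets(["Bed > Comfort", "Staff > Friendliness"])

print(incomplete["Staff > Friendliness"])  # False: recorded as "not about staff"
print(complete["Staff > Friendliness"])    # True
```

Under this assumption, the omitted label is not neutral: it is recorded as a negative example, which is why leaving out a relevant label can mislead the model.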

Help, my Discover is empty!

When you first create a new dataset, you may find that Discover is empty, as shown below. Don’t worry, this is simply because the platform's algorithms are busy working in the background to group your verbatims into clusters. Depending on the number of verbatims in the data source, this could take up to a few hours.

Empty Discover page whilst clusters are being generated

Layout

The layout of Discover and an example cluster are shown below. In this example, the platform has detected that these comments share the common theme of the comfort of the hotel beds:

Discover page in 'cluster' mode

Layout explained:

Toggle button to switch between 'Cluster' and 'Search' mode
Dropdown menu that lets you switch between different clusters
Button to apply a label to all of the verbatims shown on the page
One of six verbatims shown from cluster #7 (each cluster contains 12 verbatims)
Button to apply a label to an individual verbatim
Dropdown menu to adjust the number of verbatims shown on the page (between 6 and 12)
Buttons to adjust and invert the selection of verbatims on the page
Button to de-select a verbatim to exclude it from labels added in bulk

Discover highlights common themes

As shown in the image below, Discover highlights the parts of a verbatim that contribute most to its inclusion in the cluster, helping you identify common themes more quickly:

Discover highlighting common themes

The darker lines indicate the more important parts of the span (this is explained when you hover over a highlight).
The lighter lines indicate a medium or slightly weaker contribution to the cluster.

Key steps

Please Note: The following guide describes the process for labelling a dataset that does not have sentiment analysis enabled. If you do have sentiment analysis enabled, the process is very similar: you also select a positive or negative sentiment when applying each label, and you can use neutral label names, because the sentiment denotes whether it's the positive or negative version of that concept. See here for more details on labelling with sentiment analysis.

1. Review each verbatim in the cluster

2. If you think there is a label that applies to all verbatims on the page, select ‘Add label’

3. Type in the name of the label and press Enter or click the pin button that appears. You can add several labels at once this way: type in another label and click the pin button again.

Please Note: This does not apply the label yet.

4. Click the ‘Apply labels’ button to assign the label(s) to the verbatims. The assigned labels will now appear underneath every verbatim on the page.

Alternatively, you can add a label to an individual verbatim by clicking the ‘Add label +’ button highlighted underneath it.

Adding labels to individual verbatims in Discover

If you want to add a label to a group of verbatims on the page, but wish to exclude one or several, you can de-select them using the toggle button highlighted (A). You can then invert the selection or de-select / reselect all using the buttons highlighted at the top (B).

Excluding individual verbatims from bulk selection in Discover
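
The bulk-labelling behaviour described above (pin labels, de-select the verbatims you want to exclude, then apply) can be sketched as a small model. This is only an illustration of the logic, not the platform's implementation: the `Verbatim` class, the `invert_selection` and `apply_labels` helpers, and the example texts and label name are all hypothetical, and the sketch assumes every verbatim on the page starts out selected.

```python
from dataclasses import dataclass, field


@dataclass
class Verbatim:
    text: str
    labels: set = field(default_factory=set)
    selected: bool = True  # assumption: every verbatim starts selected


def invert_selection(page):
    """Flip the selected state of every verbatim on the page."""
    for verbatim in page:
        verbatim.selected = not verbatim.selected


def apply_labels(page, pinned_labels):
    """Add the pinned labels to selected verbatims; return how many were labelled."""
    labelled = 0
    for verbatim in page:
        if verbatim.selected:
            verbatim.labels.update(pinned_labels)
            labelled += 1
    return labelled


page = [
    Verbatim("The bed was very comfortable"),
    Verbatim("Lovely soft mattress"),
    Verbatim("Breakfast was cold"),
]

page[2].selected = False  # exclude the off-topic verbatim from the bulk label
count = apply_labels(page, {"Bed > Comfort"})
print(count)  # 2
```

Calling `invert_selection(page)` afterwards would leave only the excluded verbatim selected, which mirrors using the invert button to label the remainder separately.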

You can view different pages of the same cluster (A) and adjust the number of verbatims per page (B) using the buttons highlighted. Once the cluster is labelled, you can move on to a new cluster using the dropdown list below (C).

The model will present you with 30 clusters and it’s important to work your way through them to create a solid basis for the Explore phase. If a cluster isn’t relevant to you, however, just skip over it.

Navigating between clusters and cluster pages in Discover

Please Note: Discover begins to retrain after a significant amount of training is completed. After 180 verbatims have been labelled (the equivalent of half of the 30 clusters, at 12 verbatims each), Discover will retrain and update the clusters. Don't be put off; just carry on working through them until you've reviewed at least 30 clusters.
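
The 180 figure follows from the numbers given earlier in this article. The short calculation below is only a consistency check on those figures; the variable names are illustrative, and it assumes every verbatim in a cluster is labelled before you move on to the next one.

```python
CLUSTERS = 30               # clusters presented in Discover
VERBATIMS_PER_CLUSTER = 12  # verbatims in each cluster

total_verbatims = CLUSTERS * VERBATIMS_PER_CLUSTER
retrain_after = total_verbatims // 2  # "half of the clusters"
clusters_before_retrain = retrain_after // VERBATIMS_PER_CLUSTER

print(total_verbatims)          # 360
print(retrain_after)            # 180
print(clusters_before_retrain)  # 15
```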
