User permissions required: ‘View Sources’ AND ‘Review and label’

 

Please Note: Users with the ‘View Sources’ permission can see verbatims in Discover, and users with the ‘View labels’ permission can also see labels, but the ‘Review and label’ permission is required to actually apply labels in Discover.
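The permission rules above can be summarised as a small lookup. This is only an illustrative sketch: the function `discover_capabilities` and the capability names are hypothetical and are not part of the platform's API; only the three permission names come from this article.

```python
# Permission names as quoted in this article. Everything else in this
# sketch (function name, capability keys) is a hypothetical illustration.
VIEW_SOURCES = "View Sources"
VIEW_LABELS = "View labels"
REVIEW_AND_LABEL = "Review and label"


def discover_capabilities(permissions):
    """Return what a user can do in Discover for a given set of permissions."""
    perms = set(permissions)
    return {
        "see_verbatims": VIEW_SOURCES in perms,
        "see_labels": VIEW_LABELS in perms,
        # Applying labels needs both permissions listed at the top of the page.
        "apply_labels": VIEW_SOURCES in perms and REVIEW_AND_LABEL in perms,
    }


viewer = discover_capabilities([VIEW_SOURCES, VIEW_LABELS])
labeller = discover_capabilities([VIEW_SOURCES, REVIEW_AND_LABEL])
```

Here `viewer` can read verbatims and labels but cannot apply labels, while `labeller` can apply them.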


Video guide: Training using clusters

The following video guides you through labelling clusters in Discover, which is typically the very first part of the model training process:


 

Please Note: We regularly update and improve the platform, so if anything in the video looks slightly different or out of date, don't worry: we'll update it in due course!



Overview

Once your data is in the platform, Re:infer will group and display 30 clusters of communications (verbatims) that it believes share concepts or similar intents. The aim of this part of the training process is to go through each of these clusters and label the data presented in each of them. 


This process makes the early stages of training the model easier and faster, as you can add labels to multiple similar verbatims at once, while still adding labels to, or removing them from, individual verbatims as required.


Helpful tips for labelling clusters:

  • Don’t spend too long thinking about the name of the label. You can rename a label at any point during the training process. 
  • Be as specific as possible when naming a label. It is better to be specific at the outset, as you can always rename labels and restructure the hierarchy later. At this stage, add as many relevant labels as possible to each verbatim: going back and deleting labels later is quicker and easier than expanding an existing label.
  • Remember that it’s often easier to create a more specific, finer-grained taxonomy in the first instance, which means adding more labels and sub-labels rather than fewer. If it turns out to be too detailed, it’s easy to edit and ‘prune’ your taxonomy later.
  • It’s good to start with labels in a flat hierarchy (not adding too many child labels) – you can always restructure the taxonomy into a more hierarchical structure later
  • Each verbatim can have multiple labels assigned to it – make sure to apply all relevant labels, otherwise you teach the model not to associate it with the label that you have omitted
  • It is better to take the time to carefully label now, so that the machine can rapidly and precisely predict labels in future
  • Not all clusters will have obviously similar intents, and it’s OK to move on if the verbatims in a cluster are all different
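The tip about applying all relevant labels can be pictured with a simplified mental model: once a verbatim is reviewed, every label you applied counts as a "yes" and every label you left off counts as a "no". The sketch below only illustrates that idea; it does not describe how Re:infer trains internally, and the function `training_signal` and the label names are hypothetical.

```python
def training_signal(applied_labels, taxonomy):
    """Map every label in the taxonomy to True (applied) or False (omitted)."""
    applied = set(applied_labels)
    unknown = applied - set(taxonomy)
    if unknown:
        raise ValueError(f"labels not in taxonomy: {sorted(unknown)}")
    # Omitted labels are treated as negative examples for this verbatim.
    return {label: label in applied for label in taxonomy}


# Hypothetical taxonomy for a hotel-feedback dataset.
taxonomy = ["Bed comfort", "Room cleanliness", "Check-in"]

# A verbatim about a comfortable bed in a spotless room, labelled with
# only one of the two relevant labels.
signal = training_signal(["Bed comfort"], taxonomy)
```

In this example `signal["Room cleanliness"]` is `False`, so under this model the verbatim would count against 'Room cleanliness' even though the label was relevant and simply forgotten.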

 


 

Help, my Discover is empty!


When you first create a new dataset, you may find that Discover is empty, as shown below. Don’t worry: this is simply because Re:infer’s algorithms are busy working in the background to group your verbatims into clusters. Depending on the number of verbatims in the data source, this can take up to a few hours.

 

 

 Empty Discover page whilst clusters are being generated 

 


Layout

The layout of Discover and an example cluster are shown below. In this example, Re:infer has detected that these comments share the common theme of the comfort of the hotel beds:



Discover page in 'cluster' mode


Layout explained:

  1. Toggle button to switch between 'Cluster' and 'Search' modes

  2. Dropdown menu that lets you switch between different clusters

  3. Button to apply a label to all of the verbatims shown on the page

  4. One of six verbatims shown from cluster #7 (each cluster contains 12 verbatims)

  5. Button to apply a label to an individual verbatim

  6. Dropdown menu to adjust the number of verbatims shown on the page (between 6 and 12)

  7. Buttons to adjust and invert the selection of verbatims on the page

  8. Button to de-select a verbatim to exclude it from labels added in bulk


Discover highlights common themes

As shown in the image below, Discover highlights the parts of a verbatim that contribute most to its inclusion in the cluster, helping you identify the common themes more quickly:

 


  1. The darker lines indicate the more important parts of the span (this is explained when you hover over them)
  2. The lighter lines indicate a medium or slightly weaker contribution to the cluster

 


Key steps 


Please Note: The following guide describes the process for labelling a dataset that does not have sentiment analysis enabled. If you do have sentiment analysis enabled, the process is very similar: you also select a positive or negative sentiment when applying each label, and you can use neutral label names, with the sentiment denoting whether it’s the positive or negative version of that concept. See here for more details on labelling with sentiment analysis.



  1. Review each comment in the cluster 
  2. If you think there is a label that applies to all comments on the page, select ‘Add label’
  3. Type in the name of the label and hit enter or click the pin button that appears 
    • Please Note: this does not apply the label yet
  4. You can add several labels at once this way: just type in another label and click the pin again

 



  5. There are two ways to apply the label(s) you have just added:
    • Double-click the ‘Apply labels’ button to assign them to the verbatims
    • Alternatively, you can add a label to an individual verbatim by clicking the ‘Add label +’ button highlighted underneath it
  6. As before, click the pin or hit enter and it will apply the label
  7. If you want to add a label to a group of verbatims on the page, but wish to exclude one or several, you can de-select them using the toggle button highlighted
  8. You can then invert the selection or de-select / reselect all using the buttons highlighted at the top
  9. You can view different pages of the same cluster and adjust the number of verbatims per page using the buttons highlighted

 


 

 

  10. After you’ve applied the relevant labels to a verbatim, you’ll notice that its colour changes to a darker shade and an identifier appears marking it as ‘REVIEWED’.
  11. Select the dropdown menu or click the arrows to move to the next cluster once you’ve assigned enough labels to the current cluster
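The bulk-labelling steps above (pin labels, de-select any exceptions, then apply) can be sketched as follows. This is a toy model of the page behaviour for illustration only, not platform code: the `Verbatim` class, both functions and the label name are hypothetical, and it assumes that verbatims on a page start out selected.

```python
from dataclasses import dataclass, field


@dataclass
class Verbatim:
    text: str
    labels: set = field(default_factory=set)
    selected: bool = True   # assumption: verbatims start selected
    reviewed: bool = False


def invert_selection(page):
    """Flip the selection state of every verbatim on the page."""
    for verbatim in page:
        verbatim.selected = not verbatim.selected


def apply_labels(page, pinned_labels):
    """Apply every pinned label to each selected verbatim and mark it reviewed."""
    for verbatim in page:
        if verbatim.selected:
            verbatim.labels.update(pinned_labels)
            verbatim.reviewed = True


page = [
    Verbatim("The bed was very comfortable"),
    Verbatim("Best mattress I have slept on"),
    Verbatim("Check-in took far too long"),
]

page[2].selected = False            # exclude the verbatim that does not fit
apply_labels(page, {"Bed comfort"})  # hypothetical label name
```

After this runs, the first two verbatims carry the label and are marked as reviewed, while the de-selected third verbatim is left untouched and can be labelled individually.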


The model will present you with 30 clusters and it’s important to work your way through each of them using the process described in this article to really help create a solid basis for your taxonomy.


If a cluster isn’t relevant to you, however, just skip over it.


Please Note: Discover retrains once a significant amount of training has been completed. After 180 verbatims have been labelled (the equivalent of half of the clusters), Discover will retrain and update the clusters. Don't be put off: just carry on working through them until you've labelled at least 30 clusters.
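The 180 figure follows from the numbers given earlier in this article (30 clusters of 12 verbatims each). A quick check of the arithmetic, assuming every verbatim in a cluster gets labelled (the variable names are just for illustration):

```python
CLUSTERS = 30                # clusters presented in Discover
VERBATIMS_PER_CLUSTER = 12   # verbatims in each cluster

total_verbatims = CLUSTERS * VERBATIMS_PER_CLUSTER
retrain_threshold = total_verbatims // 2   # "half of the clusters"
clusters_to_threshold = retrain_threshold // VERBATIMS_PER_CLUSTER

print(total_verbatims, retrain_threshold, clusters_to_threshold)
```

In other words, fully labelling 15 of the 30 clusters reaches the 180-verbatim point at which Discover retrains.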

 

