User permissions required: ‘View Sources’ AND ‘Review and label’
Please Note: Users with the ‘View Sources’ permission can see Verbatims in Discover, and users who also have ‘View Labels’ can see Labels, but the ‘Review and label’ permission is required to actually apply Labels in Discover.
Training using Clusters
The following video goes through Step 1 of the Discover training process, which is Clusters.
Using Discover is generally the first step in the training process. This is where Re:infer uses unsupervised learning to read and interpret all of your data and displays 30 clusters of Verbatims that it believes share similar concepts or intents. You can begin building your taxonomy by reviewing and labelling the data presented in these clusters.
This process makes training the Model easier and faster to begin with, as you can add Labels to multiple similar Verbatims at once, as well as add Labels to, or remove them from, individual Verbatims as required. You can also use Discover to search by key terms, which is useful if you know a relevant common term or expression that has not appeared in any of the clusters.
After a significant period of training or influx of data, Discover will then search for new clusters to present to you, and in this way, it acts as a useful way for you to continue finding interesting things within your data. This is particularly true if you have a live Data integration set up, as new Verbatims will continually be added to the Dataset and may contain new intents and concepts.
1. Training using Clusters
The Model groups Verbatims into clusters that it believes share intents or concepts. When you navigate to the Discover page, you’ll be presented with cluster number one.
- You can add a label that you think applies to all of the Verbatims on the page using the button shown. Just type it in and click the pin button that appears.
- You can apply several Labels at once this way: just type in another Label and click the pin again.
Add bulk Label
- After adding all the relevant labels that apply, double-click the button to assign them to the Verbatims
- Alternatively, you can add a label to individual Verbatims by clicking the button below the Verbatim itself
- If you want to add a label to a group of Verbatims on the page, but you want to exclude one or several, you can de-select them using the toggle button
- You can then invert the selection or unselect / reselect all using these buttons respectively:
- You can view different pages of the same cluster and adjust the number of Verbatims per page using the buttons shown below:
Cluster navigation menu
- In Discover, you should typically only label ca. 15 Verbatims per cluster. Over-reliance on Discover does not create good Models: because the Verbatims within a cluster are so similar, labelling too many of them can cause the Model to ‘overfit’. You need to use Explore and Teach later on to give the Model more varied examples.
- After you’ve applied the relevant labels to a Verbatim, you’ll notice that the colour will change to a darker shade and an identifier will appear marking it as ‘REVIEWED’
- Please Note: It’s important when you label a Verbatim that you add ALL of the Labels that are relevant to it. If you label a Verbatim and do not apply certain relevant Labels, you send a signal to the Model that the Labels you have omitted should not be associated with this Verbatim.
- Once you’ve assigned enough labels to the first cluster, you can then select the next cluster using the dropdown menu or the arrow buttons, as shown below:
Change Cluster menu
- The model will present you with 30 clusters and it’s important to work your way through each of them using the process described to really help create a solid basis for your Taxonomy. That being said, if a cluster isn’t relevant to you, just skip over it
2. Using Search
- Another useful way of using Discover is to switch from Cluster mode to Search, and to search for key terms or phrases that you know indicate a certain label should apply whenever they appear within a Verbatim
- Re:infer will search within the Dataset for Verbatims containing the search terms and present them on the page, as it does with a cluster
- As with clusters, you can then add labels to multiple Verbatims at once where this applies
Change Discover function menu
3. Discover stays useful and keeps looking for new things
Discover is designed to keep searching for and presenting new clusters to users after a significant period of training has occurred or a significant amount of data has been added to the Dataset.
Discover will retrain and present new clusters when any of the following conditions are met:
- 1000 Verbatims added or updated (either new Verbatims added, or the text of existing ones edited via the API), and at least 1 label added to a Verbatim
- An additional 1% of the Verbatims in the Dataset added or updated (same definition as above; applies if 1% is less than 1000), and at least 1 label added to a Verbatim
- 180 Verbatims with updated Labels or Entities
- 1% of the Verbatims in the Dataset with updated Labels or Entities (applies if 1% is less than 180)
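The four conditions above reduce to two thresholds, each capped at 1% of the Dataset. The following Python sketch is only an illustration of that reading and is not Re:infer's implementation. The class and function names are hypothetical, and it assumes that the 1% is measured against the Dataset size at the last Discover run and rounded up to a whole Verbatim.

```python
from dataclasses import dataclass


@dataclass
class ChangesSinceLastRun:
    """Hypothetical counters tracked since Discover last trained."""
    dataset_size: int                        # assumed: Verbatims in the Dataset at the last run
    verbatims_added_or_updated: int          # new Verbatims, or text edited via the API
    labels_added: int                        # Labels applied to any Verbatim
    verbatims_with_updated_annotations: int  # Verbatims whose Labels or Entities changed


def one_percent(dataset_size: int) -> int:
    # Assumption: round 1% up to a whole Verbatim, and never go below 1.
    return max(1, (dataset_size + 99) // 100)


def discover_should_retrain(changes: ChangesSinceLastRun) -> bool:
    # Data conditions: 1000 Verbatims, or 1% of the Dataset if that is smaller,
    # combined with at least one Label added.
    data_threshold = min(1000, one_percent(changes.dataset_size))
    data_trigger = (
        changes.verbatims_added_or_updated >= data_threshold
        and changes.labels_added >= 1
    )

    # Annotation conditions: 180 Verbatims, or 1% of the Dataset if that is smaller.
    annotation_threshold = min(180, one_percent(changes.dataset_size))
    annotation_trigger = (
        changes.verbatims_with_updated_annotations >= annotation_threshold
    )

    return data_trigger or annotation_trigger
```

Under these assumptions, a Dataset of 10,000 Verbatims would retrain once 100 Verbatims have updated Labels or Entities, because 1% (100) is less than 180, while a Dataset of 500,000 Verbatims would use the fixed thresholds of 1000 and 180.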
When Discover retrains and presents you with new clusters, it takes into account the existing Taxonomy and the training that you've completed so far, as well as looking for semantic similarities in the Verbatims.
It then tries to find clusters which are not well covered by the existing Taxonomy (i.e. Verbatims that don't have lots of strong predictions), in order to try and show you new and interesting things you might want to capture with a new or existing Label.
It's worth caveating that there will always be a balancing act between finding strong Clusters with clear connections between the Verbatims, and finding ones that aren't well covered by the Taxonomy. You'll therefore likely still see some Clusters with strong predictions, particularly because the more extensively you train the Dataset, the harder it becomes for Discover to find Clusters that don't have strong predictions.
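To make this balancing act concrete, here is a toy Python sketch. It is not Re:infer's algorithm: the `Cluster` fields, the 0.8 "strong prediction" cut-off, and the scoring formula are all assumptions chosen purely for illustration. It ranks clusters higher when their Verbatims are closely related (high cohesion) and few of them already have a strong prediction (low coverage).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    """Hypothetical summary of a candidate cluster."""
    name: str
    cohesion: float               # 0..1, how closely related the Verbatims are
    top_confidences: List[float]  # highest predicted Label confidence per Verbatim


def coverage(cluster: Cluster, strong: float = 0.8) -> float:
    """Fraction of Verbatims that already have a strong prediction."""
    if not cluster.top_confidences:
        return 0.0
    strong_count = sum(1 for c in cluster.top_confidences if c >= strong)
    return strong_count / len(cluster.top_confidences)


def rank_clusters(clusters: List[Cluster], strong: float = 0.8) -> List[Cluster]:
    """Order clusters by an illustrative score: cohesion * (1 - coverage)."""
    return sorted(
        clusters,
        key=lambda c: c.cohesion * (1 - coverage(c, strong)),
        reverse=True,
    )


candidates = [
    Cluster("A", cohesion=0.9, top_confidences=[0.9, 0.95, 0.85, 0.9]),
    Cluster("B", cohesion=0.6, top_confidences=[0.1, 0.2, 0.9, 0.3]),
    Cluster("C", cohesion=0.9, top_confidences=[0.1, 0.2, 0.3, 0.9]),
]
ranked_names = [c.name for c in rank_clusters(candidates)]
```

In this toy example, cluster A is tightly connected but fully covered by existing predictions, so it ranks last. As more of the Dataset is trained, fewer clusters resemble B or C, which is why well-covered clusters can still appear.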
- Remember it’s often easier to create a more specific, finer-grained Taxonomy in the first instance. If it’s too detailed it’s easy to edit and ‘prune’ your Taxonomy at a later date
- It’s good to start with labels in a flat hierarchy (not adding too many sub-labels) – you can always restructure the Taxonomy to be more hierarchical later
- Each Verbatim can have multiple labels assigned to it – make sure to apply all relevant labels, otherwise you teach the model not to associate it with the label that you have omitted
- It is better to take the time to carefully label now, so that the machine can rapidly and precisely predict Labels in future
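The point about applying all relevant labels can be illustrated with a small Python sketch. It is a simplification rather than a description of how the Model is actually trained, and the function and Label names are hypothetical. The idea is that a reviewed Verbatim provides a signal for every Label in the Taxonomy: applied Labels count as positive examples, and omitted Labels count as negative ones.

```python
from typing import Dict, Iterable, Set


def training_signals(taxonomy: Iterable[str], applied_labels: Set[str]) -> Dict[str, bool]:
    """Return the signal a single reviewed Verbatim sends for each Label.

    True  -> the Label was applied (positive example)
    False -> the Label was omitted (treated as a negative example)
    """
    taxonomy = list(taxonomy)
    unknown = applied_labels - set(taxonomy)
    if unknown:
        raise ValueError(f"Labels not in the Taxonomy: {sorted(unknown)}")
    return {label: label in applied_labels for label in taxonomy}


# Hypothetical Taxonomy and a Verbatim that is really about both a
# cancellation and a refund, but was only labelled as a cancellation.
taxonomy = ["Cancellation", "Complaint", "Refund request"]
signals = training_signals(taxonomy, {"Cancellation"})
```

Here `signals["Refund request"]` is `False`, so in this simplified view the Verbatim would count as evidence against the Label it should have received.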
Help, my Discover is empty!
When you first create a new Dataset, you may find that Discover is empty. Don’t worry, this is simply because Re:infer’s algorithms are busy grouping your Verbatims into themes. Depending on the number of Verbatims in the Data Source this could take up to a few hours to process.