Why can't I see anything in Discover after uploading data to the platform?


As soon as data is uploaded to the platform, Re:infer begins an unsupervised learning process that groups verbatims into clusters with similar semantic intent. Depending on the size of the dataset, this can take up to a couple of hours, and clusters will appear once it is complete.


Please note: Discover requires a minimum of 2048 verbatims in a dataset to create clusters.



Why are the navigation options greyed out when I log into the platform?


The dataset navigation options are greyed out because no dataset has been selected yet. To select one, open the Datasets page from the global action drop-down menu, then either scroll down to the dataset you want or search for it in the search bar. Once you have found it, click the Dashboard, Explore, or Reports option to select the dataset. See our guide to navigation and the Datasets page for detailed instructions.



How do I know how well the platform/model is performing?


Please check the Validation page in the platform, which reports several performance measures. This page updates after every training event, and you can use it to identify where the model may need more training examples or some label corrections to ensure consistency.

 

Please see our Knowledge Base for guidance on getting the most out of the Validation page, and for full explanations of the performance measures it displays.

 


Why are only 30 clusters available, and can we set them individually?

Clusters are a helpful way to build an initial taxonomy quickly, but users will spend most of their training time in Explore rather than Discover.


If users spend too much time labelling via clusters, there's a risk of overfitting the model so that, when making predictions, it looks only for verbatims that fit these clusters. The more varied the examples for each label, the better the model will be at recognising the different ways of expressing the same intent or concept. This is one of the main reasons why we only show 30 clusters at a time.

 

However, Discover does retrain once enough training has been completed or a significant volume of new data has been added to the platform (see here). When it retrains, it takes the existing training into account and tries to present new clusters that are not well covered by the current taxonomy.



How many verbatims are in each cluster?


There are 30 clusters in total, each containing 12 verbatims. In the GUI, you can adjust the number of verbatims shown per page to between 6 and 12.

 

When training a model on a new dataset, we recommend labelling no more than 12 verbatims per cluster, so as not to overfit the model to look for only those specific examples of a label. Please see our training best practice guide for further tips.



How many labels should you apply to a verbatim?


You should apply every label that you are interested in tracking and that can logically be inferred from the text of the communication.

 

It is very important to consistently apply all of the labels that apply to a verbatim, because every time you label a verbatim you teach the model not only which labels apply, but also which labels do not. If you leave out labels, you send mixed training signals to the model, which will impact its performance.
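To make the "which labels do not apply" point concrete, here is a minimal Python sketch, assuming a simple multi-label setup in which each labelled verbatim provides a yes/no target for every label in the taxonomy. The label names and the `training_targets` function are hypothetical illustrations, not how Re:infer stores or trains on your data.

```python
# Hypothetical sketch: one labelled verbatim becomes a yes (1) / no (0)
# target for EVERY label in the taxonomy. Not Re:infer's actual implementation.

TAXONOMY = ["Billing", "Cancellation", "Complaint", "Address Change"]  # hypothetical labels


def training_targets(applied_labels: set[str], taxonomy: list[str]) -> dict[str, int]:
    """Return 1 for each label applied to the verbatim and 0 for every other label."""
    return {label: int(label in applied_labels) for label in taxonomy}


# A verbatim that is both a complaint and about billing, labelled fully.
complete = training_targets({"Billing", "Complaint"}, TAXONOMY)
# The same verbatim with "Complaint" accidentally left out.
incomplete = training_targets({"Billing"}, TAXONOMY)

# Labels whose target flipped: the missed label is now taught as "does not apply".
flipped = [label for label in TAXONOMY if complete[label] != incomplete[label]]

print(complete)    # {'Billing': 1, 'Cancellation': 0, 'Complaint': 1, 'Address Change': 0}
print(incomplete)  # {'Billing': 1, 'Cancellation': 0, 'Complaint': 0, 'Address Change': 0}
print(flipped)     # ['Complaint']
```

Leaving out "Complaint" does not just omit a positive example; it actively teaches the model that this complaint is not a complaint, which is the mixed signal described above.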



What’s the maximum number of labels I should have in a taxonomy?


There is no maximum limit in the platform. However, as a rule of thumb, we do not encourage datasets to have more than 150 labels, as it becomes increasingly difficult for the people training the model to apply those labels consistently.



How many training examples do I need to give the platform for each label I have created?


This number varies depending on the complexity and specificity of the intent or concept you are trying to capture with the label. The model will start making predictions from as few as 10 trained examples, though we recommend providing at least 25 varied examples for each label to ensure you start getting accurate predictions.

The red circle next to a label is a low sample warning displayed on the Explore and Reports pages. It indicates labels that have fewer than 25 trained examples, and it disappears once a label has 25 or more.
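The warning rule described above can be sketched in a few lines of Python. This is a hypothetical illustration of the threshold logic only; the label names, counts, and the `labels_with_low_sample_warning` function are assumptions, not part of the platform.

```python
# Hypothetical sketch of the low sample warning rule: a label is flagged
# while it has fewer than 25 trained examples.

LOW_SAMPLE_THRESHOLD = 25


def labels_with_low_sample_warning(example_counts: dict[str, int]) -> list[str]:
    """Return, in alphabetical order, the labels that would show the red circle."""
    return sorted(
        label for label, count in example_counts.items() if count < LOW_SAMPLE_THRESHOLD
    )


# Hypothetical trained-example counts per label.
counts = {"Billing": 40, "Cancellation": 25, "Complaint": 12, "Address Change": 3}
print(labels_with_low_sample_warning(counts))  # ['Address Change', 'Complaint']
```

Note that a label with exactly 25 examples ("Cancellation" here) is no longer flagged.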



When can I start seeing predictions?


You need to label a small set of varied examples across multiple labels before the model starts to make predictions.



Does it matter what I actually name the labels?

 

No. The model is agnostic to the name you give a label, so you can use any name that is informative to you, including language that is unique or specific to your organisation.

 


What do precision and recall mean?


Precision and recall are metrics used to measure the performance of a machine learning model. A detailed description of each can be found under the Using Validation section of our how-to guides.
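As a quick illustration, the Python sketch below computes the standard textbook definitions for a single label: precision is the share of predicted verbatims that truly have the label, and recall is the share of verbatims that truly have the label which the model found. The function name and example data are hypothetical, and the Validation page may summarise these metrics differently, so refer to the Using Validation guide for how the platform reports them.

```python
# Standard single-label precision and recall (hypothetical example data).


def precision_recall(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Compute precision and recall for one label across a set of verbatims."""
    tp = sum(p and a for p, a in zip(predicted, actual))      # predicted and correct
    fp = sum(p and not a for p, a in zip(predicted, actual))  # predicted but wrong
    fn = sum(a and not p for p, a in zip(predicted, actual))  # label present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Six verbatims: did the model predict the label, and does the label truly apply?
predicted = [True, True, True, True, False, False]
actual = [True, True, True, False, True, True]
print(precision_recall(predicted, actual))  # (0.75, 0.6)
```

Here 3 of the 4 predictions are correct (precision 0.75), and the model found 3 of the 5 verbatims that truly carry the label (recall 0.6).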



Can I change the name of a label later on?

 

Yes. You can rename a label at any point from its settings. You can see how to do it here.

 


How do I find out the number of verbatims I have labelled?

 

Information about your dataset, including how many verbatims have been labelled, is displayed on the Datasets Settings page. To see how to access it, click here.



One of my labels is performing poorly. What can I do to improve it?

 

If you can see in the Validation page that your label is performing poorly, there are various ways to improve its performance. See the Using Validation scores guide to help you understand how.


