A label is a structured summary of an intent or concept expressed within a verbatim. A verbatim is often summarised by multiple labels; that is, labels are not mutually exclusive classifications of a verbatim.
For example, in a dataset that monitors customer experience, we might create a label called ‘Incorrect Invoice Notification’ to capture cases where a customer informs the business that they have received what they believe is an incorrect invoice.
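As a rough illustration of the multi-label idea, a verbatim can be modelled as holding a set of label names rather than a single class. This is a minimal sketch, not Re:infer's data model: the `Verbatim` class and the second label, 'Change of Address', are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Verbatim:
    """Hypothetical model of a verbatim (e.g. an email) and the labels that summarise it."""
    text: str
    # A set rather than a single value: labels are not mutually exclusive.
    labels: set[str] = field(default_factory=set)


email = Verbatim(
    text="We've received invoice #123 but the amount looks wrong. Please also update our address.",
)
email.labels.add("Incorrect Invoice Notification")
email.labels.add("Change of Address")  # hypothetical second label on the same verbatim

print(sorted(email.labels))
```

Because `labels` is a set, adding one label never removes another, which mirrors the point that a verbatim can carry several labels at once.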
Pinned vs. Predicted
Labels are first created by users, who apply them to relevant verbatims. Users can keep applying a label to build up training examples for the model, after which Re:infer starts to predict the label automatically wherever it is relevant across the dataset.
A label that a user has applied to a verbatim is considered 'pinned', whereas labels that Re:infer assigns to verbatims are known as label predictions. To learn more about reviewed and unreviewed verbatims, see here.
When Re:infer predicts whether a label applies to a verbatim that has not been reviewed by a user, it provides a confidence level (%) for that label prediction. The higher the confidence level, the more confident Re:infer is that the label applies.
An email from an operations team of a financial services company with multiple labels predicted
Predicted labels are shaded according to Re:infer's confidence in them: the more opaque the label, the more confident Re:infer is that it applies.
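The distinction between pinned labels and confidence-scored predictions, and the idea of shading a prediction by its confidence, can be sketched as below. This is a hypothetical model rather than Re:infer's API: `LabelAssignment`, storing pinned labels with `confidence=None`, and the linear `label_opacity` mapping with a `min_opacity` floor of 0.2 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabelAssignment:
    """One label on one verbatim (hypothetical, not the Re:infer API)."""
    name: str
    pinned: bool                 # True if a user applied the label
    confidence: Optional[float]  # model confidence in [0, 1]; None for pinned labels (assumption)

    def describe(self) -> str:
        if self.pinned:
            return f"{self.name}: pinned"
        return f"{self.name}: predicted ({self.confidence:.0%})"


def label_opacity(confidence: float, min_opacity: float = 0.2) -> float:
    """Assumed linear map from confidence in [0, 1] to opacity in [min_opacity, 1]."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return min_opacity + (1.0 - min_opacity) * confidence


assignments = [
    LabelAssignment("Incorrect Invoice Notification", pinned=True, confidence=None),
    LabelAssignment("Margin Call > Confirmation", pinned=False, confidence=0.87),
    LabelAssignment("Margin Call", pinned=False, confidence=0.95),
]

for a in assignments:
    line = a.describe()
    if not a.pinned:
        # Only predictions carry a confidence, so only they are shaded.
        line += f", opacity {label_opacity(a.confidence):.2f}"
    print(line)
```

The floor keeps low-confidence predictions faintly visible instead of fully transparent; any monotonic mapping would preserve the rule that higher confidence means a more opaque label.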
Labels can be arranged in a hierarchy, which helps you organise them and train new concepts more quickly.
This hierarchy takes a format like this: [Parent label] > [Branch label 1] > [Branch label n] > [Child label]
A label can be a standalone parent label, or it can have branch and child labels (separated by '>'), each of which forms a subset of the level above it in the hierarchy.
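Since the levels of a hierarchical label name are separated by '>', a name can be split back into its levels with a simple parser. The sketch below is illustrative only: `parse_label` is a hypothetical helper, and it assumes individual level names never contain '>' themselves.

```python
def parse_label(name: str) -> list[str]:
    """Split a name like 'Parent > Branch > Child' into its levels, trimming whitespace."""
    levels = [part.strip() for part in name.split(">")]
    if any(not level for level in levels):
        raise ValueError(f"empty level in label name: {name!r}")
    return levels


print(parse_label("Property > Location"))
print(parse_label("Margin Call"))
```

A standalone parent label parses to a single level, while each extra '>' adds one more, narrower level.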
Whenever a child or branch label is pinned or predicted, the model treats every level above it in the hierarchy as pinned or predicted too. Predictions for parent labels typically have higher confidence than those for lower levels of the hierarchy, as parent labels are often easier to identify.
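The rule that pinning or predicting a lower level implies every level above it can be sketched as a set expansion. This is a hypothetical helper for illustration, not how Re:infer implements the rule internally, and it assumes level names do not contain '>'.

```python
def with_ancestors(label_names: set[str]) -> set[str]:
    """Expand a set of (possibly hierarchical) labels so every ancestor level is included."""
    expanded: set[str] = set()
    for name in label_names:
        levels = [part.strip() for part in name.split(">")]
        # Add each prefix: 'A', 'A > B', 'A > B > C', ...
        for depth in range(1, len(levels) + 1):
            expanded.add(" > ".join(levels[:depth]))
    return expanded


pinned = {"Margin Call > Confirmation", "Property > Location"}
print(sorted(with_ancestors(pinned)))
```

Pinning only the two child labels therefore also marks 'Margin Call' and 'Property' as applying.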
In both the images above and below, hierarchical labels have been used, e.g. 'Margin Call > Confirmation' and ‘Property > Location’.
For more about label hierarchies, see here.
For datasets with sentiment analysis enabled, every label (both pinned and predicted) has an associated positive or negative sentiment indicated by a green or red colour (such as the positive sentiment predictions below).
Different levels of a label hierarchy can have different sentiment predictions. For example, a review could be positive about a 'Property' overall but negative about its 'Property > Location'.
A hotel review showing multiple label predictions that have positive sentiment
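Because each level of a hierarchy carries its own sentiment, a parent and its child can disagree. The sketch below models the hotel review example with hypothetical data and flags such disagreements; the `Sentiment` enum, the colour values and the `sentiment_conflicts` helper are assumptions for illustration, not part of Re:infer.

```python
from enum import Enum


class Sentiment(Enum):
    POSITIVE = "green"
    NEGATIVE = "red"


# Hypothetical predictions for one hotel review: each hierarchy level
# has its own sentiment, so parent and child can differ.
review_predictions = {
    "Property": Sentiment.POSITIVE,
    "Property > Location": Sentiment.NEGATIVE,
}


def sentiment_conflicts(preds: dict[str, Sentiment]) -> list[tuple[str, str]]:
    """Return (parent, child) pairs whose predicted sentiments differ."""
    conflicts = []
    for name, sentiment in preds.items():
        levels = [part.strip() for part in name.split(">")]
        if len(levels) < 2:
            continue  # a top-level parent label has no parent to compare with
        parent = " > ".join(levels[:-1])
        if parent in preds and preds[parent] is not sentiment:
            conflicts.append((parent, name))
    return conflicts


for label, sentiment in review_predictions.items():
    print(f"{label}: {sentiment.name.lower()} ({sentiment.value})")
print(sentiment_conflicts(review_predictions))
```

Here the review is positive about the property overall, negative about its location, and the helper reports that one parent/child pair.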