Using Discover is typically the first step in the training process. Here the platform uses unsupervised learning to read and interpret all of your data, then displays 30 clusters of verbatims that it believes share similar themes, concepts or intents. Unsupervised learning means that no human input is required to create these clusters: the platform builds them automatically and independently when the dataset is created.
You can begin building your taxonomy by reviewing and labelling the verbatims presented in these clusters. This makes the early stages of training easier and faster: Discover finds natural groups of verbatims that are likely to share labels and lets you label several verbatims at once, while still allowing you to add labels to individual verbatims as required.
Example cluster in Discover from an insurance underwriting dataset
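The platform's own clustering model is not described in this article. To illustrate what "unsupervised" means in practice, here is a minimal Python sketch that uses a deliberately simple, swapped-in technique: greedy clustering by Jaccard overlap of word sets. The `cluster` function, the `threshold` value of 0.3 and the sample verbatims are all hypothetical. No labels are supplied, so the groups emerge only from similarity between the texts themselves.

```python
import re


def tokens(text: str) -> set[str]:
    # Lowercase and keep runs of letters; a crude stand-in for real text features.
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    # Overlap between two word sets, from 0.0 (disjoint) to 1.0 (identical).
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(verbatims: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Hypothetical illustration only, not the platform's algorithm.

    Each verbatim joins the first cluster whose seed (its first member) is
    similar enough; otherwise it starts a new cluster. No labels are used.
    """
    clusters: list[tuple[set[str], list[str]]] = []
    for text in verbatims:
        words = tokens(text)
        for seed, members in clusters:
            if jaccard(words, seed) >= threshold:
                members.append(text)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append((words, [text]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    sample = [
        "Please renew my policy",
        "I want to renew my policy",
        "What is my claim status",
        "Claim status update",
    ]
    for group in cluster(sample):
        print(group)
```

With the sample above, the two renewal requests end up in one group and the two claim-status queries in another, without anyone having told the code what "renewal" or "claim" means. A production system would use much richer text representations, but the principle of grouping by similarity alone is the same.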
Alternative search terms
Discover can also be used to search for verbatims containing specific keywords or phrases. This is useful when you know a common term or expression that has not appeared in any of the clusters but would indicate that a certain label should apply.
Discover stays useful
After a significant amount of training has been completed, or after an influx of new data, Discover searches for new clusters to present to you. In this way it remains a useful tool for continuing to find interesting things in your data. This is particularly true if you have a live data integration set up, as new verbatims are continually added to the dataset and may contain new intents and concepts.