To avoid biasing the model towards one way of expressing a concept, it is important to provide it with as many different examples of that concept as possible. Let’s look at an example. The cluster below is clearly about the location of the hotel, and a ‘Location’ label has been added. If we relied only on this term, the model could become biased towards phrases containing the word ‘Location’ or similar, so we should use the Search feature to find alternative ways of expressing the same thing:
- Hotel position
- Location to transport
- Transport links
- Close to shops
- Tourist attractions
- Close to transport
- Close to airport
- Near the airport
Searching for different terms
The examples below show how searching for alternative terms for ‘location’ highlights Verbatims that relate to the location of the hotel but express it differently. Applying the ‘Location’ label to these as well gives the model varied examples of the same concept, i.e. the location of a hotel.
Search results in Discover
- ‘Attractions’ highlights where the reviewer likes how close the hotel is to local attractions
- ‘Hotel Position’ is another term that describes the user’s like or dislike of the location
- ‘Transport’ shows where the reviewer likes that the hotel is close to transport links
- ‘Close to shops’ similarly highlights where the reviewer likes that the hotel is close to shops
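The same idea can be illustrated outside the product with a minimal Python sketch. The sample reviews, the term list and the `find_matches` helper are hypothetical and exist only for illustration; this is not how the platform's search works internally.

```python
# Hypothetical sample Verbatims; not taken from any real dataset.
reviews = [
    "Great hotel position, right in the old town.",
    "Loved being close to shops and restaurants.",
    "The room was clean but the breakfast was poor.",
    "Excellent transport links to the city centre.",
    "Lots of tourist attractions within walking distance.",
]

# Alternative ways of expressing 'Location', taken from the search terms above.
search_terms = ["hotel position", "transport", "close to shops", "attractions"]


def find_matches(texts, terms):
    """Map each search term to the texts that contain it (case-insensitive)."""
    results = {}
    for term in terms:
        needle = term.lower()
        results[term] = [t for t in texts if needle in t.lower()]
    return results


matches = find_matches(reviews, search_terms)

# Any review matched by at least one term is a candidate for the 'Location' label.
candidates = [r for r in reviews if any(r in hits for hits in matches.values())]
for review in candidates:
    print("Location candidate:", review)
```

None of the matched reviews contains the word ‘location’, which is why searching on several terms surfaces examples that a single term would miss.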
Applying labels to search results
Search results in Discover that have been labelled
- Select ‘Search’ from the ‘Cluster’ drop-down menu in the Discover tab
- Enter your search term and press Enter or click the icon
- Matching search terms will appear highlighted in yellow. Reinfer will show full matches followed by partial matches.
- Add all labels that should apply. In the example above, the ‘Location > Shops’ labels have been added, along with all other relevant ones.
Repeat this process for as many search terms as you feel could describe the label you are interested in, which is ‘Location’ in this example.
You should repeat this for every label whose topic is known to be expressed in different ways.
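The search steps above note that full matches are shown before partial matches. As a rough sketch of that ordering, the Python below treats a "full match" as a Verbatim containing the whole search phrase and a "partial match" as one containing only some of its words. Both definitions, the `rank_matches` helper and the sample texts are assumptions made for illustration; the product's actual matching rules are not described here.

```python
import re


def rank_matches(texts, query):
    """Return texts containing the whole phrase first, then texts that
    contain only some of its words. Texts with no matching word are dropped."""
    phrase = query.lower()
    query_words = set(re.findall(r"[a-z']+", phrase))
    full, partial = [], []
    for text in texts:
        lowered = text.lower()
        if phrase in lowered:
            full.append(text)  # assumed definition of a full match
        elif query_words & set(re.findall(r"[a-z']+", lowered)):
            partial.append(text)  # assumed definition of a partial match
    return full + partial


# Hypothetical sample Verbatims.
sample = [
    "Handy for the airport shuttle.",
    "The pool was cold.",
    "Very close to airport, ideal for early flights.",
]
ranked = rank_matches(sample, "close to airport")
for text in ranked:
    print(text)
```

In this sketch a common word such as "to" counts towards a partial match, so a real implementation would probably weight or filter such words.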