User permissions required: ‘Datasets admin’


To create a new dataset:

Navigate to the Datasets page and click 'New Dataset', which opens a modal for creating the dataset.

 

New Dataset modal

 

Complete the form with all the relevant information, clicking 'Continue' to progress through each step:

  1. Give your dataset a name
    • Use the dropdown menu (which shows 'demo' in this example) to select the owner (organisation), which controls who has access to the dataset. You can assign the dataset to any organisation that you are a member of
    • Give the dataset a useful, descriptive name, using hyphens instead of spaces (e.g. zendesk-cs-chats)
  2. Provide some additional information about your dataset
    • Use the title and description boxes to provide more information on the dataset you’re creating. It’s good practice to reference the data sources and the purpose of the analysis
    • These fields are not mandatory, but they make your dataset easier to identify
  3. Copy in an existing taxonomy, customise sources and entities
    • Choose whether to copy an existing taxonomy from another dataset (this auto-selects the same sources, entities, and sentiment selection as that dataset)
    • Select all the (additional) sources that you wish to connect to the dataset
    • Select any entities that you wish to enable (you do not have to enable any, and you can always enable them later in the dataset settings page)
  4. Set the sentiment and language of the dataset
    • Enable or disable sentiment analysis. With sentiment analysis enabled, every label in the taxonomy has an associated positive or negative sentiment. To understand why you would or wouldn't enable it, see here
    • Confirm the model family (language)

Lastly, click 'Create Dataset'.


Please Note:

  • You can add up to 20 individual sources to a dataset in the GUI
  • Sources can sit in a different organisation to a dataset. As long as users have the appropriate permissions in each organisation, they will be able to view the verbatims and label them as usual
  • If there are multiple sources in a dataset, they should share a similar intended purpose for your analysis
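
If you prepare dataset details in a script before entering them in the form, the naming convention from step 1 and the 20-source limit above can be checked locally. The following is a minimal sketch only: the function names and the constant are hypothetical and not part of the platform, and lowercasing is an assumption based on the zendesk-cs-chats example rather than a stated rule.

```python
import re

# Limit on individual sources per dataset in the GUI, as stated in the notes.
MAX_SOURCES = 20


def to_dataset_name(raw):
    """Replace each run of whitespace with one hyphen.

    Lowercasing is an assumption that mirrors the 'zendesk-cs-chats' example.
    """
    return re.sub(r"\s+", "-", raw.strip()).lower()


def check_source_count(sources):
    """Raise ValueError if more sources are listed than the GUI allows."""
    if len(sources) > MAX_SOURCES:
        raise ValueError(
            f"{len(sources)} sources listed; the GUI allows at most {MAX_SOURCES}"
        )


print(to_dataset_name("Zendesk CS chats"))  # zendesk-cs-chats
```

The helpers only validate values on your side; the dataset itself is still created through the 'New Dataset' modal described above.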
