PLEASE NOTE: UiPath Communications Mining's Knowledge Base has been fully migrated to UiPath Docs. Please navigate to equivalent articles in UiPath Docs (here) for up to date guidance, as this site will no longer be updated and maintained.


Uploading a CSV file into a source

User permissions required: 'Sources admin' AND 'Edit verbatims' 

Please Note: this article demonstrates how to upload data from a CSV file into an existing data source. To learn how to create a data source via the GUI first, see the 'Create a data source in the GUI' article.



Key steps

 

Please Note: When updating existing comments in a source, changing any comment property other than user properties (e.g. the comment text, the sent_at timestamp, or the 'to' or 'from' fields) will cause entity annotations in associated datasets to be lost. It's highly recommended that you pin the latest model version in associated datasets before doing so.


To upload data from a CSV file into a data source, navigate to the Sources page (via the admin console, accessed via the cog in the top right of your page) and locate the source you'd like to upload data into.

 

Click the upload icon in the top right-hand corner of the data source card (as shown below).


 

Data source card

 

Then click 'Select file' and choose the CSV file you wish to upload. 

 

The selected file must meet the following criteria:

 

  • The file should contain headers on the first line and be delimited by commas or tabs
  • A minimum of three columns is required: the comment text contents (the message), a timestamp, and a unique ID that identifies the comment
  • The file must be encoded as either UTF-8, UTF-16, or UTF-32 (the platform automatically detects which one)
  • The CSV file should be 64 MiB or less. If you have a larger file, you can still upload it by splitting it into multiple files, each 64 MiB or less
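If you want to check a file against these criteria before uploading it, the Python sketch below is one way to do so. It is not part of the platform and the function names are hypothetical. The encoding check only looks for a byte-order mark (falling back to UTF-8), and the comma-versus-tab choice simply counts delimiters in the header line, so the platform's own automatic detection may behave differently.

```python
import csv

MAX_BYTES = 64 * 1024 * 1024  # 64 MiB upload limit

def detect_encoding(raw: bytes) -> str:
    """Guess the encoding from a byte-order mark (BOM), defaulting to UTF-8.

    Assumption: a rough stand-in for the platform's automatic detection.
    """
    # Test the 4-byte UTF-32 BOMs first: the UTF-32 little-endian BOM
    # begins with the same two bytes as the UTF-16 little-endian BOM.
    if raw.startswith((b"\xff\xfe\x00\x00", b"\x00\x00\xfe\xff")):
        return "utf-32"
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    return "utf-8-sig"  # UTF-8, with or without a UTF-8 BOM

def check_csv_bytes(raw: bytes) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(raw) > MAX_BYTES:
        problems.append("file exceeds 64 MiB; split it into smaller files")
    encoding = detect_encoding(raw)
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError as err:
        problems.append(f"file is not valid {encoding}: {err}")
        return problems
    lines = text.splitlines()
    header_line = lines[0] if lines else ""
    # Hypothetical heuristic: pick whichever allowed delimiter is more common.
    delimiter = "\t" if header_line.count("\t") > header_line.count(",") else ","
    header = next(csv.reader([header_line], delimiter=delimiter), [])
    if len(header) < 3:
        problems.append(f"header has {len(header)} column(s); at least 3 are needed")
    return problems

def check_csv_file(path: str) -> list[str]:
    """Read a file from disk and run the checks above on its raw bytes."""
    with open(path, "rb") as f:
        return check_csv_bytes(f.read())
```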


 

CSV upload page - Step 1


If your file meets the above criteria, you can then configure and upload the comments in the next step:


 

CSV upload page - Step 2

 

Select the required columns from each of the dropdown lists containing the column headers detected within the CSV file:

 

  • ID column:
    • This must be a column containing a unique ID that can identify the comment
    • The comment IDs can only contain ASCII alphanumeric characters (A-Z a-z 0-9) and punctuation (except /)
    • Please Note: If there are existing comments in the source with the same ID, they will be updated to match the contents of the new file
  • Message column:
    • This is simply the column that contains the message text that you want to analyse in the platform
  • Timestamp column:
    • This is the column containing the date and time the comment was recorded
    • The timestamp format is flexible and will be inferred automatically by the platform
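The ID rules can also be checked ahead of time. This Python sketch flags characters other than ASCII letters, digits and punctuation (and any '/'), IDs longer than the 1024-byte limit quoted in the Troubleshooting section, and IDs repeated within the file. Reading "punctuation" as Python's `string.punctuation` is an assumption, and the function name is hypothetical.

```python
import string

# ASCII letters, digits and punctuation, minus "/" (treating "punctuation"
# as Python's string.punctuation is an assumption).
ALLOWED_ID_CHARS = set(string.ascii_letters + string.digits + string.punctuation) - {"/"}
MAX_ID_BYTES = 1024  # from the "Id Length" error in Troubleshooting

def id_problems(ids: list[str]) -> list[str]:
    """Return problems found in a column of comment IDs (empty list if none)."""
    problems = []
    seen = set()
    for row, comment_id in enumerate(ids, start=1):
        bad_chars = sorted(set(comment_id) - ALLOWED_ID_CHARS)
        if bad_chars:
            problems.append(f"row {row}: id {comment_id!r} contains {bad_chars}")
        if len(comment_id.encode("utf-8")) > MAX_ID_BYTES:
            problems.append(f"row {row}: id is longer than {MAX_ID_BYTES} bytes")
        if comment_id in seen:
            problems.append(f"row {row}: id {comment_id!r} is not unique within the file")
        seen.add(comment_id)
    return problems
```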


If you have data containing subject lines, threads, or participants (typically seen in cases or email threads), you can also upload these additional columns within your CSV file:  

 

  • Subject Column
    • Choose which column contains the comment Subject 
  • Sender Column
    • Choose which column contains the Sender 
  • To Column
    • Choose which column contains the Recipient(s). Multiple recipients should be semicolon separated. 
  • Cc Column
    • Choose which column contains the Cc'd Recipient(s). Multiple recipients should be semicolon separated
  • Thread ID Column 
    • Choose the column that contains the comment Thread ID
    • A thread ID ties together the different messages that belong to the same thread
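If you are exporting emails yourself, Python's standard `csv` module takes care of quoting commas and line breaks in the message text. The sketch below writes the three required columns plus the optional ones, joining multiple recipients into a single semicolon-separated cell. The header names and the input dictionary keys are hypothetical; since each column is mapped using the dropdowns on the upload page, the names here are only illustrative.

```python
import csv
import io

# Illustrative header names: each column is mapped via the dropdowns on the
# upload page, so these exact names are not required by the platform.
FIELDNAMES = ["Id", "Message", "Timestamp", "Subject", "Sender", "To", "Cc", "ThreadId"]

def emails_to_csv(messages: list[dict]) -> str:
    """Return CSV text for a list of email dicts (dict keys are hypothetical)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    for msg in messages:
        writer.writerow({
            "Id": msg["id"],
            "Message": msg["body"],  # csv quotes commas and line breaks for us
            "Timestamp": msg["sent_at"],
            "Subject": msg.get("subject", ""),
            "Sender": msg["sender"],  # a single sender per cell
            # Multiple recipients share one cell, separated by semicolons.
            "To": "; ".join(msg.get("to", [])),
            "Cc": "; ".join(msg.get("cc", [])),
            "ThreadId": msg.get("thread_id", ""),
        })
    return buffer.getvalue()
```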


Sender/To/CC format:

 

  • The following conditions in the sender/to/cc fields will trigger errors:   
    • Exceeds maximum number of recipients (max 2048 recipients per thread) 
    • Sender or recipient exceeds maximum character limit (max 512 characters per recipient) 
    • Two or more semicolons are found in a row (e.g. the following is incorrectly formatted: john@email.com ; ; ; beth@email.com)
  • Although the platform will strip out any white space before or after a recipient, it will not do any additional data cleansing.
  • The platform will delimit the different recipients by the semicolon (;)
  • Before uploading your data, please ensure the email addresses are formatted correctly and consistently
  • Please note that in a typical threaded use case (e.g.: emails), there should only be 1 sender in each 'sender' cell 
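The rules above translate fairly directly into a parsing function. This Python sketch (the function name is hypothetical) splits a sender/to/cc cell on semicolons, strips surrounding white space as the platform does, and raises an error for the conditions listed. Two simplifications are assumptions: any empty entry, including one left by a trailing semicolon, is rejected, and the 2048-recipient limit is checked per cell rather than per thread.

```python
MAX_RECIPIENTS = 2048       # documented as a per-thread limit
MAX_RECIPIENT_CHARS = 512   # per sender/recipient

def parse_recipients(cell: str) -> list[str]:
    """Split a sender/to/cc cell on ';', stripping surrounding white space.

    Raises ValueError for the error conditions listed above. No other
    cleansing is done, in line with the platform's documented behaviour.
    """
    if not cell.strip():
        return []  # an empty cell simply has no recipients
    recipients = [part.strip() for part in cell.split(";")]
    # Assumption: reject every empty entry, including one left by a
    # trailing ';', not just runs of two or more semicolons.
    if any(not r for r in recipients):
        raise ValueError(f"empty recipient (repeated ';'?) in {cell!r}")
    for r in recipients:
        if len(r) > MAX_RECIPIENT_CHARS:
            raise ValueError(f"recipient longer than {MAX_RECIPIENT_CHARS} characters")
    # Simplification: checked per cell, although the limit applies per thread.
    if len(recipients) > MAX_RECIPIENTS:
        raise ValueError(f"more than {MAX_RECIPIENTS} recipients")
    return recipients
```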


Timestamp format:

 

  • If your chosen timestamp format is ambiguous for the order of days / months / years (e.g. 01/02/03 10:10), you can suggest the correct interpretation:
    • 2nd of January 2003 - None
    • 1st of February 2003 - Day first
    • 3rd of February 2001 - Year first
    • 2nd of March 2001 - Day first + Year first
  • To avoid ambiguity, it is recommended to supply timestamps in the RFC 3339 format if possible (e.g. 2020-01-31T12:34:56Z for UTC, or with a timezone offset: 2020-08-31T11:20:00-08:00)
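To see how each option resolves 01/02/03 10:10, the four readings can be written as Python `strptime` formats. This mapping only illustrates the list above; it is not the platform's parser, which infers formats itself. `to_rfc3339_utc` is a hypothetical helper showing one way to emit the recommended RFC 3339 form (converted to UTC) from a timezone-aware datetime.

```python
from datetime import datetime, timezone

# One strptime format per ambiguity option listed above (illustration only).
INTERPRETATIONS = {
    "None": "%m/%d/%y %H:%M",                    # month / day / year
    "Day first": "%d/%m/%y %H:%M",               # day / month / year
    "Year first": "%y/%m/%d %H:%M",              # year / month / day
    "Day first + Year first": "%y/%d/%m %H:%M",  # year / day / month
}

def interpret(value: str) -> dict[str, datetime]:
    """Parse one ambiguous timestamp under each of the four options."""
    return {option: datetime.strptime(value, fmt) for option, fmt in INTERPRETATIONS.items()}

def to_rfc3339_utc(dt: datetime) -> str:
    """Format a timezone-aware datetime as RFC 3339 in UTC, e.g. 2020-01-31T12:34:56Z."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach the correct timezone first")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

for option, parsed in interpret("01/02/03 10:10").items():
    print(f"{option:<24} {parsed:%d %B %Y}")
```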


Then select the additional user properties you want to upload with the comments. User properties are contextual metadata associated with each verbatim that can be used to filter comments in the platform. They may also be used by the platform's machine learning models. There are two types, string and number:

 

  • String user properties are categorical metadata (typical examples include IDs, countries, counterparties, etc.)
  • Number user properties are numeric metadata (typical examples include NPS, email statistics, amounts, etc.)

Please Note: If your file contains an NPS score as a user property, this must be included as a number property and named 'NPS' only, in order to trigger native NPS charts to load in the platform.
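Before choosing a type for each user property, it can help to confirm that number columns really contain numbers and that any NPS column follows the note above. In this Python sketch, `float()` stands in for the platform's number parsing (which is only documented as accepting any format that can reasonably be decoded as a number), empty cells are treated as missing values, and the function names are hypothetical. Numeric-looking IDs are still categorical, so they fit better as string properties.

```python
import math

def _is_number(value: str) -> bool:
    """float() is used as a stand-in for the platform's number parsing."""
    try:
        return math.isfinite(float(value))
    except ValueError:
        return False

def property_problems(name: str, kind: str, values: list[str]) -> list[str]:
    """Check one user property column; returns a list of problems."""
    problems = []
    if kind not in ("string", "number"):
        problems.append(f"{name}: type must be 'string' or 'number'")
    if kind == "number":
        for row, value in enumerate(values, start=1):
            # Empty cells are treated as missing values (an assumption).
            if value.strip() and not _is_number(value):
                problems.append(f"{name}: row {row} value {value!r} is not a number")
    # Native NPS charts need a number property named exactly 'NPS'.
    if name.strip().upper() == "NPS" and (name != "NPS" or kind != "number"):
        problems.append("NPS scores must be a number property named exactly 'NPS'")
    return problems
```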


Once you've selected all of the user properties, click 'Upload'.


You'll then be prompted to inspect the uploaded comments in a dataset that contains the source you uploaded data into. If the source is not associated with any datasets yet, you can create a new one to check that the upload is as expected.


Please Note: If you made a mistake when selecting the user properties, you can re-upload the same file, and the platform will use the ID column as the identifier to overwrite the existing comments and properties (this will not affect any labels applied to existing comments).


Troubleshooting

 

Hopefully your upload will run smoothly, but you may encounter an issue during the upload process and see an error message. We've outlined some of these below, along with why they occur, to help you resolve or avoid them.

 

In the error messages below, {something} maps to contextual information about where the error occurred. Additionally, the way we refer to a position in the file is standardised as:


  • {position} expands to: record {row-number} on line {line-number} column {column-number} (byte {byte-number})


The title of the error message is displayed along with a description, as shown below: 

 

  

Here are some possible error messages users may encounter when uploading CSV files: 

 

  • Not Enough Columns
    • Error message: The CSV file only contains {number-columns} column(s), but at least 3 are needed (text, timestamp and id)
    • Description: The uploaded CSV doesn't contain at least 3 columns, or the platform has mis-detected the encoding of the file.
  • Invalid Encoding
    • Error message: The file contains invalid characters (encoding detected as {detected-encoding})
    • Description: The file is not correctly encoded as UTF-8 / UTF-16 / UTF-32 (the platform automatically detects the encoding of the file).
  • Invalid Header
    • Error message (example): 'string:ti:er' does not match '(^delimiter|id|message|timestamp |timestamp_default_utc_offset |timestamp_day_first|timestamp_year_first\\Z)|(^(?P<property_type>number|string):(?P<name>\\w(?:[\\w ]{0,30}\\w)?)\\Z)'
    • Description: If a column header is an invalid name for a user property, the platform returns its default message for a request with an invalid schema. Check that each column header is in a valid format for its purpose. The maximum length for a column header is 32 alphanumeric characters.
  • Unequal Row Lengths
    • Error message: The CSV contains unequal row lengths. Comment {position} has {number} fields, but the previous record has {number} fields.
    • Description: The CSV contains rows with different numbers of cells, or rows that are inconsistent with the number of headers.
  • Id Format
    • Error message: Invalid comment id for {record}. Ids can only consist of ASCII alphanumeric characters and punctuation (except '/'). Cell value: {cell-value}
    • Description: This error occurs when an ID field contains characters that are not allowed, as described in the error message.
  • Id Length
    • Error message: Id is too long for comment {record}. It has {number} bytes, expected at most 1024
    • Description: This error occurs when an ID field is longer than the maximum allowed length (1024 bytes).
  • Timestamp Format
    • Error message: Incorrectly formatted timestamp in comment {position}: {timestamp-error-message}. Cell value: {cell-value}
    • Description: This error occurs when a timestamp field could not be parsed.
  • Message Length
    • Error message: Message is too long for comment {position}. It has {number} bytes, expected at most 65536
    • Description: This error occurs when a message field is longer than the maximum allowed length (65536 bytes).
  • Number Property Format
    • Error message: Incorrectly formatted number in comment {position}: {number-error-message}. Cell value: {cell-value}
    • Description: This error occurs when a number user property field could not be parsed. The platform should allow any format that can reasonably be decoded as a number.
  • Property Length
    • Error message: Property is too long for comment {position}. It has {number} bytes, expected at most 4096
    • Description: This error occurs when a user property field is longer than the maximum allowed length (4096 bytes).
  • Unknown Error
    • Error message: Unknown CSV error: {underlying-error-message}
    • Description: The list above is not exhaustive. If an unknown error occurs, retry the upload.
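Several of these errors can be caught before uploading. The Python sketch below (function name and parameters hypothetical) checks that every record has as many fields as the header, and that message and user property cells stay within the byte limits listed above. It assumes the file has already been decoded to text, counts bytes as UTF-8, and reports positions in a form only loosely modelled on the platform's {position} string.

```python
import csv
import io

MESSAGE_MAX_BYTES = 65536   # "Message Length" limit
PROPERTY_MAX_BYTES = 4096   # "Property Length" limit

def preflight(text: str, message_col: str, property_cols: list[str],
              delimiter: str = ",") -> list[str]:
    """Return human-readable problems found in decoded CSV text."""
    problems = []
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    header = next(reader, [])
    index = {name: i for i, name in enumerate(header)}
    for record, row in enumerate(reader, start=1):
        # line_num is the source line on which this record ended.
        where = f"record {record} on line {reader.line_num}"
        if len(row) != len(header):
            problems.append(f"{where}: {len(row)} fields, but the header has {len(header)}")
            continue  # column positions are unreliable for this row
        message_bytes = len(row[index[message_col]].encode("utf-8"))
        if message_bytes > MESSAGE_MAX_BYTES:
            problems.append(f"{where}: message is {message_bytes} bytes")
        for name in property_cols:
            size = len(row[index[name]].encode("utf-8"))
            if size > PROPERTY_MAX_BYTES:
                problems.append(f"{where}: property {name!r} is {size} bytes")
    return problems
```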


