
Uploading a CSV file into a source

User permissions required: 'Sources admin' AND 'Edit verbatims' 

Please Note: this article demonstrates how to upload data from a CSV file into an existing data source. To learn how to create a data source via the GUI first, see the article 'Create a data source in the GUI'.


Key steps

To upload data from a CSV file into a data source, navigate to the Sources page and locate the source you'd like to upload data into.

 

Click the three dots in the top right-hand corner of the data source card and select 'Upload CSV' (as shown below).


Data source card

 

Then click 'Select file' and choose the CSV file you wish to upload. 

 

The selected file must meet the following criteria:

  • The file should contain headers on the first line and be delimited by commas or tabs
  • A minimum of three columns is required: the comment text (the message), a timestamp, and a unique ID that identifies the comment
  • The file must be encoded as either UTF-8, UTF-16, or UTF-32 (Re:infer automatically detects which one)
  • The CSV file should be 64 MiB or less. If you have a larger file, you can still upload it by splitting it into multiple files, each 64 MiB or less
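If your file is over the limit, one way to split it is sketched below in Python. This is a minimal sketch, assuming a UTF-8 file; `split_csv` and the `_partN` file naming are hypothetical and not part of Re:infer. It copies the header row into every part and never cuts a record in half, so quoted messages that contain line breaks stay intact.

```python
import csv
import io
from pathlib import Path

MAX_BYTES = 64 * 1024 * 1024  # the 64 MiB per-file limit stated above


def _serialise(row, delimiter):
    """Render one record exactly as csv.writer writes it (including the line ending)."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter).writerow(row)
    return buf.getvalue()


def split_csv(src, out_dir, max_bytes=MAX_BYTES, delimiter=","):
    """Split a UTF-8 CSV into parts of at most max_bytes, repeating the header.

    Returns the list of paths written. Raises ValueError if a single record
    (plus the header) cannot fit in one part.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    # utf-8-sig strips a byte-order mark if one is present
    with open(src, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = _serialise(next(reader), delimiter)
        header_size = len(header.encode("utf-8"))
        out, size = None, 0
        try:
            for row in reader:
                line = _serialise(row, delimiter)
                line_size = len(line.encode("utf-8"))
                if header_size + line_size > max_bytes:
                    raise ValueError(f"record ending on line {reader.line_num} is too large for one part")
                if out is None or size + line_size > max_bytes:
                    # Start a new part, beginning with a copy of the header row
                    if out is not None:
                        out.close()
                    path = out_dir / f"{Path(src).stem}_part{len(parts) + 1}.csv"
                    out = open(path, "w", newline="", encoding="utf-8")
                    out.write(header)
                    size = header_size
                    parts.append(path)
                out.write(line)
                size += line_size
        finally:
            if out is not None:
                out.close()
    return parts
```

For example, `split_csv("comments.csv", "parts")` writes `parts/comments_part1.csv`, `parts/comments_part2.csv` and so on, which you can then upload one after another. Records are rewritten with Python's default CSV quoting, so the bytes may differ slightly from the original file, but the cell values are the same.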


CSV upload page - Step 1


If your file meets the above criteria, you can then configure and upload the comments in the next step:


CSV upload page - Step 2

 

Select the required columns from each of the dropdown lists containing the column headers detected within the CSV file:

  • ID column:
    • This must be a column containing a unique ID that can identify the comment
    • The comment IDs can only contain ASCII alphanumeric characters (A-Z a-z 0-9) and punctuation (except /)
    • Please Note: If there are existing comments in the source with the same ID, they will be updated to match the contents of the new file
  • Message column:
    • This is simply the column that contains the message text that you want to analyse in the platform
  • Timestamp column:
    • This is the column containing the date and time the comment was recorded
    • The timestamp format is flexible and will be inferred automatically by the platform
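Before uploading, you may want to check your ID column against these rules. The Python sketch below is one way to do it. `id_problems` is a hypothetical helper; it assumes "punctuation" means Python's `string.punctuation` (the exact character set isn't listed here) and uses the 1024-byte limit from the 'Id length' error in the troubleshooting section. It only flags duplicates within the file itself, since IDs that already exist in the source are simply updated.

```python
import string
from collections import Counter

# ASCII letters, digits and punctuation, minus "/". Assumption: Python's
# string.punctuation matches the platform's definition of punctuation.
ALLOWED_ID_CHARS = set(string.ascii_letters + string.digits + string.punctuation) - {"/"}
MAX_ID_BYTES = 1024  # from the "Id length" error message


def id_problems(ids):
    """Return (id, reason) pairs for IDs likely to be rejected or not unique."""
    problems = []
    for comment_id, count in Counter(ids).items():
        if not set(comment_id) <= ALLOWED_ID_CHARS:
            problems.append((comment_id, "invalid characters"))
        elif len(comment_id) > MAX_ID_BYTES:  # IDs are ASCII, so characters == bytes
            problems.append((comment_id, "too long"))
        if count > 1:
            problems.append((comment_id, f"appears {count} times"))
    return problems
```

For example, `id_problems(["abc-123", "a/b", "abc-123"])` returns `[("abc-123", "appears 2 times"), ("a/b", "invalid characters")]`.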


Timestamp format:

  • If your timestamps are ambiguous about the order of days, months and years (e.g. 01/02/03 10:10), you can specify the correct interpretation:
    • 2nd of January 2003 - None
    • 1st of February 2003 - Day first
    • 3rd of February 2001 - Year first
    • 2nd of March 2001 - Day first + Year first
  • To avoid ambiguity, it is recommended to supply timestamps in the RFC 3339 format where possible (e.g. 2020-01-31T12:34:56Z for UTC, or with a timezone offset: 2020-08-31T11:20:00-08:00)
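The four interpretations above can be reproduced with explicit `strptime` formats from Python's standard library, which is also a simple way to rewrite ambiguous timestamps as RFC 3339 before uploading. This is a sketch, not the platform's own parser; `INTERPRETATIONS` and `to_rfc3339` are hypothetical names, and it assumes the source timestamps are in UTC.

```python
from datetime import datetime, timezone

# Explicit strptime formats reproducing each interpretation of "01/02/03 10:10"
INTERPRETATIONS = {
    "None": "%m/%d/%y %H:%M",                    # 2nd of January 2003
    "Day first": "%d/%m/%y %H:%M",               # 1st of February 2003
    "Year first": "%y/%m/%d %H:%M",              # 3rd of February 2001
    "Day first + Year first": "%y/%d/%m %H:%M",  # 2nd of March 2001
}


def to_rfc3339(value, fmt):
    """Parse value with an explicit format, treat it as UTC, return RFC 3339."""
    parsed = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    return parsed.isoformat().replace("+00:00", "Z")


print(to_rfc3339("01/02/03 10:10", INTERPRETATIONS["Day first"]))  # 2003-02-01T10:10:00Z
```

If your timestamps are in a different timezone, attach that timezone instead of `timezone.utc`; `isoformat()` will then emit the matching offset (for example `-08:00`).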


Then select any additional user properties you want to upload with the comments. User properties are contextual metadata associated with each verbatim that can be used as filters in the platform. They may also be used by the machine learning models in Re:infer. There are two types, string and number:

  • String user properties are categorical metadata (typical examples include IDs, countries, counterparties, etc.)
  • Number user properties are numeric metadata (typical examples include NPS, email statistics, amounts, etc.)
    • Please Note: If your file contains an NPS score as a user property, this must be included as a number property and named 'NPS' only, in order to trigger native NPS charts to load in the platform
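To check whether a column can be uploaded as a number property (such as 'NPS'), you could test that every cell parses as a number. The sketch below uses Python's `float()`, which may differ from the platform's parser in either direction (the troubleshooting section notes Re:infer should accept any format that can reasonably be decoded as a number). `non_numeric_cells` is a hypothetical helper, and skipping empty cells is an assumption.

```python
import math


def non_numeric_cells(values):
    """Return (index, value) pairs for cells that do not parse as a finite number.

    Assumption: empty cells mean "no value" and are skipped.
    """
    bad = []
    for index, value in enumerate(values):
        text = value.strip()
        if not text:
            continue
        try:
            number = float(text)
        except ValueError:
            bad.append((index, value))
            continue
        if not math.isfinite(number):  # float() accepts "nan" and "inf"
            bad.append((index, value))
    return bad
```

Note that this sketch flags values with thousands separators such as "1,000", even though the platform may accept them.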


Once you've selected all of the user properties, click 'Upload'.


You'll then be prompted to inspect the uploaded comments in a dataset that contains the source you uploaded data into. If the source is not associated with any datasets yet, you can create a new one to check that the upload is as expected.


Please Note: If you made a mistake when selecting the user properties, you can re-upload the same file, and the platform will use the ID column as the identifier to overwrite the existing comments and properties (this will not affect any labels applied to existing comments).


Troubleshooting

Hopefully your upload will run smoothly, but you may encounter an issue during the upload process and see an error message. We've outlined some of these errors below, along with why they occur, to help you resolve or avoid them.


In the error messages below, {something} maps to contextual information about where the error occurred. Additionally, the way we refer to a position in the file is standardised as:


  • {position} expands to: record {row-number} on line {line-number} column {column-number} (byte {byte-number})
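When an error message reports a position, it can help to look at the raw bytes around it. The sketch below is hypothetical: `parse_position` extracts the four numbers from the position format above, and `bytes_around` returns a window of raw bytes around the reported byte offset. Because this article doesn't say whether offsets count from 0 or 1, the window is deliberately wide rather than pointing at a single byte.

```python
import re

# Matches the position format: record R on line L column C (byte B)
POSITION = re.compile(r"record (\d+) on line (\d+) column (\d+) \(byte (\d+)\)")


def parse_position(message):
    """Return (record, line, column, byte) from an error message, or None."""
    match = POSITION.search(message)
    return tuple(int(group) for group in match.groups()) if match else None


def bytes_around(path, byte_offset, width=40):
    """Return up to 2 * width raw bytes, centred on byte_offset where possible."""
    with open(path, "rb") as f:
        f.seek(max(byte_offset - width, 0))
        return f.read(2 * width)
```

Printing the returned bytes (rather than decoded text) also makes encoding problems, such as stray non-UTF-8 bytes, visible.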


The title of the error message is displayed along with a description, as shown below: 

  

Here are some possible error messages users may encounter when uploading CSV files: 


  • Not Enough Columns
    • Error message: The CSV file only contains {number-columns} column(s), but at least 3 are needed (text, timestamp and id)
    • Description: The uploaded CSV doesn't contain at least 3 columns, or Re:infer has mis-detected the encoding of the file. If your CSV has at least three columns, raise a support ticket or contact support@reinfer.io
  • Invalid Encoding
    • Error message: The file contains invalid characters (encoding detected as {detected-encoding})
    • Description: The file is not correctly encoded as UTF-8, UTF-16 or UTF-32 (Re:infer automatically detects the encoding of the file)
  • Invalid Header
    • Error message (example): 'string:ti:er' does not match '(^delimiter|id|message|timestamp |timestamp_default_utc_offset |timestamp_day_first|timestamp_year_first\\Z)|(^(?P<property_type>number|string):(?P<name>\\w(?:[\\w ]{0,30}\\w)?)\\Z)'
    • Description: If a column header is an invalid name for a user property, Re:infer returns its default message for an invalid request schema. Check that each column header is in a valid format for its purpose. In this example, 'string:ti:er' is rejected because the property name 'ti:er' contains a colon. The maximum length of a user property name is 32 alphanumeric characters
  • Unequal Row Lengths
    • Error message: The CSV contains unequal row lengths. Comment {position} has {number} fields, but the previous record has {number} fields.
    • Description: The CSV contains rows with different numbers of cells, or rows whose number of cells doesn't match the number of headers.
  • Id format
    • Error message: Invalid comment id for {record}. Ids can only consist of ASCII alphanumeric characters and punctuation (except '/'). Cell value: {cell-value}
    • Description: An ID field contains characters that are not allowed, as described in the error message.
  • Id length
    • Error message: Id is too long for comment {record}. It has {number} bytes, expected at most 1024
    • Description: An ID field is longer than the maximum allowed length of 1024 bytes (IDs are ASCII, so this is also 1024 characters).
  • Timestamp Format
    • Error message: Incorrectly formatted timestamp in comment {position}: {timestamp-error-message}. Cell value: {cell-value}
    • Description: A timestamp field could not be parsed.
  • Message Length
    • Error message: Message is too long for comment {position}. It has {number} bytes, expected at most 65536
    • Description: A message field is longer than the maximum allowed length of 65536 bytes. Non-ASCII characters can take more than one byte each, so a message can exceed this limit with fewer than 65536 characters.
  • Number Property Format
    • Error message: Incorrectly formatted number in comment {position}: {number-error-message}. Cell value: {cell-value}
    • Description: A number user property field could not be parsed. Re:infer should allow any format that can reasonably be decoded as a number.
  • Property Length
    • Error message: Property is too long for comment {position}. It has {number} bytes, expected at most 4096
    • Description: A user property field is longer than the maximum allowed length of 4096 bytes. As with messages, non-ASCII characters can take more than one byte each.
  • Unknown Error
    • Error message: Unknown CSV error: {underlying-error-message}
    • Description: This list is not exhaustive. If an unknown error occurs, retry the upload, and if it persists, raise a support ticket or contact support@reinfer.io
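Several of these errors can be caught before uploading. The sketch below checks a file for too few columns, invalid UTF-8, unequal row lengths and over-long message or property cells, using the byte limits quoted in the error messages above. It is a hypothetical helper, not a Re:infer tool: it assumes UTF-8 (the platform also accepts UTF-16 and UTF-32), assumes byte lengths are measured in UTF-8, and numbers records from 1 starting after the header, which may not match the platform's own numbering.

```python
import csv

# Byte limits taken from the error messages above
MAX_MESSAGE_BYTES = 65536
MAX_PROPERTY_BYTES = 4096


def preflight(path, message_col, property_cols=(), delimiter=","):
    """Return a list of likely upload problems in a UTF-8 CSV file.

    message_col and property_cols are header names; a ValueError is raised
    if any of them is missing from the header row.
    """
    problems = []
    try:
        with open(path, newline="", encoding="utf-8-sig") as f:
            reader = csv.reader(f, delimiter=delimiter)
            header = next(reader, [])
            if len(header) < 3:
                return [f"only {len(header)} column(s), but at least 3 are needed"]
            message_idx = header.index(message_col)
            property_idx = [header.index(name) for name in property_cols]
            for record, row in enumerate(reader, start=1):
                where = f"record {record} (ends on line {reader.line_num})"
                if len(row) != len(header):
                    problems.append(f"{where}: {len(row)} fields, but the header has {len(header)}")
                    continue
                if len(row[message_idx].encode("utf-8")) > MAX_MESSAGE_BYTES:
                    problems.append(f"{where}: message longer than {MAX_MESSAGE_BYTES} bytes")
                for i in property_idx:
                    if len(row[i].encode("utf-8")) > MAX_PROPERTY_BYTES:
                        problems.append(f"{where}: property {header[i]!r} longer than {MAX_PROPERTY_BYTES} bytes")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8: {exc}")
    return problems
```

For a tab-delimited file, pass `delimiter="\t"`. An empty list means this sketch found nothing, not that the upload is guaranteed to succeed.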

 

If you're having persistent issues with uploading a CSV file, please raise a support ticket via this site or email support@reinfer.io.


