Having the same data repeated multiple times across the data extract
Having headers aligned to the wrong data fields
|Hanging rows or columns||Not having all the data contained in sequential rows|
Having all comments in rows 1 to 10,000, but with a stray cell containing data in row 19,999.
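A hanging row like the one in this example can be flagged before upload. The sketch below is only an illustration: it assumes the extract has already been loaded as a list of rows (each a list of cells), and the helper names `is_blank` and `find_hanging_rows` are hypothetical.

```python
def is_blank(row):
    """A row counts as blank when every cell is None or whitespace only."""
    return all(cell is None or str(cell).strip() == "" for cell in row)


def find_hanging_rows(rows):
    """Return indices of non-blank rows that appear after a blank row.

    Assumption: valid data sits in one contiguous block starting at the
    first row, so anything after a gap is treated as hanging.
    """
    hanging = []
    seen_blank = False
    for index, row in enumerate(rows):
        if is_blank(row):
            seen_blank = True
        elif seen_blank:
            hanging.append(index)
    return hanging


rows = [["comment 1"], ["comment 2"], [""], [None], ["stray value"]]
print(find_hanging_rows(rows))  # [4]
```

If the extract legitimately starts with blank rows, they would need to be trimmed first, because this sketch treats everything after the first gap as hanging.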
|Inconsistent date formatting||Different rows with inconsistent date formats|
Having some comments in US date format and others in EU date format in the same dataset, which causes problems when the dates are normalized downstream.
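One way to spot a mix of US and EU dates is to look for values that can only be read one way. The following is a minimal sketch, assuming dates are written as slash-separated strings such as `31/01/2023`; the function names and the category labels are illustrative, not part of any product API.

```python
import re
from collections import Counter

# Assumption: dates look like D/M/YYYY or M/D/YYYY with "/" separators.
DATE_RE = re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{4})$")


def classify_date(text):
    """Label a date string as day_first, month_first, ambiguous, or unparsed."""
    match = DATE_RE.match(text.strip())
    if not match:
        return "unparsed"
    first, second = int(match.group(1)), int(match.group(2))
    if first > 12 and second <= 12:
        return "day_first"      # e.g. 31/01/2023 can only be EU style
    if second > 12 and first <= 12:
        return "month_first"    # e.g. 01/31/2023 can only be US style
    if first <= 12 and second <= 12:
        return "ambiguous"      # e.g. 02/03/2023 fits both styles
    return "unparsed"           # both fields above 12 is not a valid date


def has_mixed_formats(values):
    """True when the values contain both clearly EU and clearly US dates."""
    counts = Counter(classify_date(value) for value in values)
    return counts["day_first"] > 0 and counts["month_first"] > 0


print(has_mixed_formats(["31/01/2023", "01/31/2023", "02/03/2023"]))  # True
```

Ambiguous values cannot be resolved by inspection alone, so a dataset that only contains dates with both fields at 12 or below would pass this check even if the formats are mixed.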
|Incoherent sentences||These are sentences that contain an assortment of words without a clear syntactic or semantic structure.|
'The user is requesting a new portable 28442 298 ticket to be creaportableted'
|Inconsistent spacing||When there is an irregular number of spaces between words.|
'The   policy  is set   to renew' instead of 'The policy is set to renew'
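Irregular spacing is usually straightforward to detect and repair with a regular expression. This is a minimal sketch using Python's standard `re` module; the function names are hypothetical, and it deliberately leaves line breaks untouched.

```python
import re


def has_irregular_spacing(text):
    """True when the text contains a run of two or more spaces or tabs."""
    return re.search(r"[ \t]{2,}", text) is not None


def normalize_spacing(text):
    """Collapse each run of spaces or tabs into a single space."""
    return re.sub(r"[ \t]+", " ", text).strip()


raw = "The   policy  is set   to renew"
print(has_irregular_spacing(raw))  # True
print(normalize_spacing(raw))      # The policy is set to renew
```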
|Breaks in words||When a word is split by a break in the middle, where there should not be one.|
'The po licy is set to renew' instead of 'The policy is set to renew'
|Erroneous character encoding||When text data is not properly encoded, resulting in garbled or unreadable characters.|
'ThÇ åpp is gré¶t' instead of 'The app is great.'
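One common cause of garbled characters is UTF-8 text that was decoded as Latin-1 somewhere upstream. The sketch below tries to reverse that specific mistake and leaves the text alone if the round trip fails. It is an assumption that this is the cause in any given dataset, and `try_fix_mojibake` is a hypothetical helper name; other encoding errors need different handling, ideally at the export step.

```python
def try_fix_mojibake(text):
    """Undo UTF-8 bytes that were wrongly decoded as Latin-1, if possible.

    Returns the input unchanged when the round trip is not valid, which is
    the case for text that was decoded correctly in the first place.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


garbled = "caf\u00c3\u00a9"                          # "café" decoded as Latin-1
print(try_fix_mojibake(garbled) == "caf\u00e9")      # True
print(try_fix_mojibake("caf\u00e9") == "caf\u00e9")  # True, left unchanged
```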
Communications without any content included in the subject/body
|Comments with lots of typos||Text data containing many spelling errors|
|Headers / footers||When there are headers or footers included|
Spam warnings, virus scan warnings, etc.
|Metadata included in the subject/body instead of as a metadata property||When metadata is included in the subject or body|
'[01/01/2023] I would like to renew my policy' as the body of a message, instead of 'I would like to renew my policy' as the body, with 01/01/2023 stored as the date in the metadata.
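A leading date like the one in this example can be moved out of the body during preprocessing. The following is a minimal sketch, assuming the date always appears in square brackets at the start of the body and is written day first (`%d/%m/%Y`); both the pattern and the `split_leading_date` name are illustrative assumptions.

```python
import re
from datetime import datetime

# Assumption: the embedded metadata is a bracketed DD/MM/YYYY date prefix.
LEADING_DATE = re.compile(r"^\s*\[(\d{2}/\d{2}/\d{4})\]\s*")


def split_leading_date(body, date_format="%d/%m/%Y"):
    """Return (clean_body, date) or (body, None) when no valid date prefix exists."""
    match = LEADING_DATE.match(body)
    if not match:
        return body, None
    try:
        parsed = datetime.strptime(match.group(1), date_format).date()
    except ValueError:
        # The prefix looks like a date but is not a valid one; leave it alone.
        return body, None
    return body[match.end():], parsed


text, sent_on = split_leading_date("[01/01/2023] I would like to renew my policy")
print(text)     # I would like to renew my policy
print(sent_on)  # 2023-01-01
```

The extracted date can then be supplied as a metadata property alongside the cleaned body.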
|Multiple messages combined into one message||When multiple messages that should have been separate messages in a thread are combined into a single communication.|