
The Complete CSV Cleaning Checklist Before Any Data Import

A practical checklist covering every common CSV data quality problem — inconsistent values, date formats, encoding issues, duplicates, whitespace, and more — before you import into any system.

Most data import problems come down to the same handful of issues. They’re not random — they’re predictable, and they show up every time data moves between systems.

This is a checklist of everything worth checking before you import a CSV into any CRM, database, or tool. Work through it once on a new file format, save the rules, and you won’t have to think about most of it again.

1. Inconsistent picklist values

This is the most common problem, and the hardest to spot visually. The same concept appears as multiple strings in the same column — different capitalisation, different spelling, different language, or slight variations that look identical in a spreadsheet but aren’t.

Examples that break CRM imports:

  • Technology, tech, TECH, Technologie, IT — five ways to say the same thing
  • Closed Won, closed won, ClosedWon, Won, Closed - Won
  • CEO, Chief Executive Officer, C.E.O., ceo

What to do: For each picklist column, define the canonical list of allowed values and map every variation to the right one. Every system that validates picklist values on import — HubSpot, Salesforce, Airtable — will silently blank out or reject any value that doesn’t match exactly.

The hardest part is that a single column might have 20 variations of 5 valid values spread across thousands of rows. Find & replace gets you partway there, but you need to catch every variation — including typos you haven’t seen yet.
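A lookup table is the most reliable way to handle this programmatically. Here is a minimal sketch in Python; the variation list and the helper name `normalise_picklist` are illustrative — build the real map from the distinct values actually present in your column:

```python
# Map every known variation (lowercased) to its canonical picklist value.
# Illustrative entries only -- list the distinct values in your column
# and extend this map until every one of them is covered.
CANONICAL = {
    "technology": "Technology",
    "tech": "Technology",
    "technologie": "Technology",
    "it": "Technology",
    "closed won": "Closed Won",
    "closedwon": "Closed Won",
    "closed - won": "Closed Won",
    "won": "Closed Won",
}

def normalise_picklist(value: str) -> str:
    """Return the canonical value; pass unknown values through unchanged
    so they can be reviewed and added to the map."""
    return CANONICAL.get(value.strip().lower(), value.strip())

print(normalise_picklist("TECH"))          # -> Technology
print(normalise_picklist("Closed - Won"))  # -> Closed Won
```

Passing unknown values through (rather than blanking them) makes new typos visible on the next review instead of silently losing data.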

2. Mixed date formats

Date fields are broken in almost every cross-system export, because there’s no universal standard that everyone uses in practice.

Formats commonly found in the same file:

| Format | Example |
| --- | --- |
| DD/MM/YYYY | 15/01/2024 |
| MM/DD/YYYY | 01/15/2024 |
| YYYY-MM-DD (ISO 8601) | 2024-01-15 |
| Human-readable | January 15, 2024 / Jan 15 2024 |
| ISO datetime | 2024-01-15T00:00:00Z |
| Excel serial number | 45306 |

The ambiguous ones are the worst: 01/05/2024 could be January 5th or May 1st depending on who filled in the spreadsheet. When data comes from multiple people or multiple systems, both interpretations can appear in the same column.

What to do: Convert everything to YYYY-MM-DD before importing. It’s unambiguous and accepted by virtually every system. In Excel: =TEXT(A1,"YYYY-MM-DD"). In Google Sheets: same formula works for cells it recognises as dates. For cells Excel has stored as text, you’ll need to handle each format pattern separately.

Watch out for Excel serial numbers — if someone saved the CSV from Excel without formatting the date column, you may end up with integers like 45306 instead of dates. These need to be converted back before import.
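The conversion can be scripted by trying each known pattern in turn. A sketch, with two stated assumptions: the pattern list matches your sources, and slash dates are day-first (swap the pattern if your files are MM/DD); Excel serials are converted from the standard 1899-12-30 epoch:

```python
from datetime import datetime, timedelta

# Patterns to try, in order. Assumption: slash dates are DD/MM/YYYY --
# change to %m/%d/%Y if your files use US ordering.
PATTERNS = ["%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y", "%b %d %Y",
            "%Y-%m-%dT%H:%M:%SZ"]

def to_iso(value: str) -> str:
    """Normalise one date cell to YYYY-MM-DD, or raise if unrecognised."""
    value = value.strip()
    if value.isdigit():
        # Excel serial number: days since 1899-12-30 (Excel's epoch,
        # offset for its fictitious 1900 leap day)
        day = datetime(1899, 12, 30) + timedelta(days=int(value))
        return day.strftime("%Y-%m-%d")
    for pattern in PATTERNS:
        try:
            return datetime.strptime(value, pattern).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date: {value!r}")

print(to_iso("15/01/2024"))        # -> 2024-01-15
print(to_iso("January 15, 2024"))  # -> 2024-01-15
print(to_iso("45306"))             # -> 2024-01-15
```

Raising on unrecognised values is deliberate: a date the script cannot parse should stop the import, not slip through half-converted.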

3. Column naming inconsistencies

When data comes from multiple sources, the same field has a different name in every export. COMPANY_NAME, Company, company_name, Account, Organisation — all meaning the same thing, all requiring a different mapping.

This matters because:

  • Every system you import into has its own expected field names
  • Any column the importer can’t match is either skipped or flagged for manual mapping
  • Manual mapping in import wizards usually isn’t saved — you redo it every time the same file arrives

What to do: Standardise column names to match your target system’s schema before uploading. This eliminates the manual mapping step and removes the risk of accidentally skipping a column. Keep a simple map of source column → target field name for each recurring file format.
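The source-to-target map is small enough to keep in code. A sketch using only the standard library; the column names in `COLUMN_MAP` are examples — substitute your target system’s actual field names:

```python
import csv
import io

# Source column name -> target field name. Example entries; replace
# with your target system's real schema.
COLUMN_MAP = {
    "COMPANY_NAME": "company",
    "Company": "company",
    "Organisation": "company",
    "E-mail": "email",
}

def rename_header(csv_text: str) -> str:
    """Rewrite the header row using COLUMN_MAP; unmapped names pass through."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = [COLUMN_MAP.get(name.strip(), name.strip()) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

print(rename_header("COMPANY_NAME,E-mail\nAcme,a@acme.com\n"))
# company,email
# Acme,a@acme.com
```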

4. Character encoding issues

When a CSV opens in Excel on Windows, it’s usually saved as Windows-1252 or Latin-1 encoding. When the destination system expects UTF-8, any character outside basic ASCII — accented letters (é, ü, ñ), em dashes, curly quotes, non-Latin characters — will appear as garbled symbols or be dropped entirely.

This is especially common with:

  • Names from non-English-speaking countries (Müller, García, Lefèvre)
  • Company names with special characters
  • Notes fields where users type freely

What to do: Always export and save CSVs as UTF-8. In Excel, use “Save As” → “CSV UTF-8 (Comma delimited)”. In Google Sheets, exports are UTF-8 by default. If you’ve received a file and aren’t sure of the encoding, open it in a text editor and check whether special characters look correct — if é shows up as Ã©, the bytes are UTF-8 being decoded as Latin-1; if accented characters come through as ? or replacement symbols, the file is likely Latin-1 or Windows-1252 and needs re-encoding to UTF-8 before import.
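Re-encoding can be automated with a decode-then-fallback approach. A sketch, under one stated assumption: any file that isn’t valid UTF-8 came from Excel on Windows and is therefore Windows-1252 (a superset of Latin-1’s printable range) — adjust the fallback if your sources differ:

```python
def to_utf8(path_in: str, path_out: str) -> str:
    """Re-save a file as UTF-8 and return the encoding it was read with."""
    with open(path_in, "rb") as f:
        raw = f.read()
    try:
        text, detected = raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 files are Excel-on-Windows exports
        text, detected = raw.decode("windows-1252"), "windows-1252"
    with open(path_out, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return detected
```

UTF-8 is tried first because almost any Windows-1252 byte sequence decodes “successfully” as Windows-1252, while random bytes rarely form valid UTF-8 — so a clean UTF-8 decode is strong evidence the file really is UTF-8.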

5. Leading and trailing whitespace

A cell containing "Acme Corp " (with a trailing space) is not the same as "Acme Corp" — most systems treat them as distinct values. This causes:

  • Picklist validation failures: "Open " doesn’t match "Open"
  • Deduplication failures: the same company appears twice because one record has a trailing space
  • Lookup failures in formulas and filters

Whitespace is invisible in most spreadsheet views and easy to miss.

What to do: Strip leading and trailing whitespace from every column before importing. In Excel: =TRIM(A1). In Google Sheets: same. Most data processing tools have a built-in trim step — use it on every text column.
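Outside a spreadsheet, the same trim can be applied to every cell in one pass. A minimal stdlib sketch (`trim_cells` is a hypothetical helper name):

```python
import csv
import io

def trim_cells(csv_text: str) -> str:
    """Strip leading/trailing whitespace from every cell, header included."""
    rows = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(
        [cell.strip() for cell in row] for row in rows
    )
    return out.getvalue()

print(trim_cells("name,stage\nAcme Corp ,Open \n"))
# name,stage
# Acme Corp,Open
```

Note that `str.strip()` also removes tabs and non-breaking-space-free whitespace at the ends of cells; interior spaces (as in “Acme Corp”) are untouched.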

6. Duplicate rows

Duplicates creep in when merging exports from multiple sources, or when the same export is pulled twice. A contact who appears in both a HubSpot export and a Salesforce export will import twice, creating a duplicate record in the destination system.

Deduplication logic varies by system — some merge on email, some on a primary key field — but it’s always safer to deduplicate before importing rather than relying on the destination to handle it.

What to do: After merging files, deduplicate on the field that uniquely identifies a record: email for contacts, domain for companies, a deal ID for deals. In Excel: Data → Remove Duplicates. Be intentional about which row to keep when there are conflicts — usually the most recently updated one.
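The “keep the most recently updated row” rule can be expressed directly. A sketch: the key and timestamp column names are parameters, and the `email` / `last_modified` names below are examples, not a fixed schema. It assumes the timestamp column is already in YYYY-MM-DD form (step 2), so string comparison orders dates correctly:

```python
import csv
import io

def dedupe(csv_text: str, key: str, updated: str) -> str:
    """Keep one row per key value, preferring the latest `updated` value.
    Assumes `updated` holds ISO dates, which sort correctly as strings."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    best: dict[str, dict] = {}
    for row in rows:
        k = row[key].strip().lower()  # case-insensitive match on the key
        if k not in best or row[updated] > best[k][updated]:
            best[k] = row
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=rows[0].keys(), lineterminator="\n")
    writer.writeheader()
    writer.writerows(best.values())
    return out.getvalue()

data = ("email,last_modified\n"
        "a@x.com,2024-01-01\n"
        "A@x.com,2024-03-01\n"
        "b@x.com,2024-02-01\n")
print(dedupe(data, "email", "last_modified"))
```

Lowercasing the key before comparing catches the common case where the same email was entered with different capitalisation in different systems.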

7. Number formatting

Number columns break when locale conventions conflict. European formats use a comma as the decimal separator and a period as the thousands separator (1.234,56). US/international formats do the opposite (1,234.56). When you open a European CSV in a US-locale Excel, 1.234,56 gets treated as text, not a number.

The same issue applies to currency symbols ($1,200 vs 1200), percentages (85% vs 0.85), and phone numbers that get auto-formatted by Excel into something like +3.36E+10.

What to do: Strip formatting characters (currency symbols, commas used as thousands separators) and convert to plain numbers before importing. For phone numbers specifically, format them consistently to E.164 international format (+33612345678) or whatever format your target system expects — and treat the column as text to prevent Excel from reformatting it.
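Parsing both locale conventions takes a small heuristic. A sketch: when both separators appear, whichever comes last is the decimal separator; a lone comma is treated as a thousands separator only when followed by exactly three digits. That last rule is an assumption — “1,234” is genuinely ambiguous, so check which locale your source uses. Percentages are stripped to their numeric part here; whether to divide by 100 is a per-column decision:

```python
import re

def parse_number(value: str) -> float:
    """Normalise a formatted number string ($1,200 / 1.234,56) to a float."""
    cleaned = re.sub(r"[^\d.,\-]", "", value)  # drop currency, %, spaces
    if "," in cleaned and "." in cleaned:
        if cleaned.rfind(",") > cleaned.rfind("."):
            # European: 1.234,56 -> 1234.56
            cleaned = cleaned.replace(".", "").replace(",", ".")
        else:
            # US: 1,234.56 -> 1234.56
            cleaned = cleaned.replace(",", "")
    elif "," in cleaned:
        head, _, tail = cleaned.rpartition(",")
        if len(tail) != 3:
            cleaned = head.replace(",", "") + "." + tail  # 1,5 -> 1.5
        else:
            # Assumption: comma + exactly 3 digits is a thousands separator
            cleaned = cleaned.replace(",", "")
    return float(cleaned)

print(parse_number("$1,200"))    # -> 1200.0
print(parse_number("1.234,56"))  # -> 1234.56
```

Phone numbers should bypass this entirely — they are identifiers, not quantities, and belong in a text column.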

8. Empty and null values

Different systems and different people represent “no value” differently: empty string, NULL, N/A, n/a, None, 0, -. Some destination systems treat these differently — 0 in a numeric field is not the same as blank, and N/A in a picklist field will fail validation.

What to do: Decide what “empty” means for each column and normalise it. For picklist fields, empty cells are usually safer than N/A or None — the field just won’t be set. For numeric fields, make sure zeros are intentional and not stand-ins for “unknown”.
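Normalising the null-ish tokens is a one-line lookup once the list is agreed. A sketch — the token set is illustrative, and note that 0 is deliberately not in it, per the advice above:

```python
# Tokens that typically mean "no value". Illustrative set -- extend it
# with whatever placeholders your sources actually use.
NULL_TOKENS = {"", "null", "n/a", "na", "none", "-", "nil"}

def normalise_empty(value: str) -> str:
    """Collapse null-ish tokens to an empty string. 0 is intentionally
    NOT treated as empty: a numeric zero may be a real value."""
    stripped = value.strip()
    return "" if stripped.lower() in NULL_TOKENS else stripped

print(normalise_empty("N/A"))  # -> ""
print(normalise_empty("0"))    # -> "0"
```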

Quick reference checklist

  • Picklist columns: all values normalised to the canonical allowed list
  • Date columns: all dates converted to YYYY-MM-DD
  • Column names: renamed to match the target system’s expected schema
  • Encoding: file saved as UTF-8
  • Whitespace: stripped from all text fields
  • Duplicates: removed after merging, before importing
  • Numbers: formatted as plain numerics, currency symbols removed
  • Phone numbers: consistent format, stored as text
  • Empty values: null representations standardised per column

None of these are hard to fix individually. The difficulty is doing all of them reliably, on every file, without missing anything — especially when the same file format arrives on a recurring schedule and needs to be cleaned the same way each time.

If you’re doing this manually for every import, the CSV Normalizer handles the picklist normalisation, date conversion, and column renaming steps automatically, and saves the mapping so recurring files take seconds instead of an hour.

Stop fixing the same CSV problems every week

Asphorem maps your columns, standardises picklist values, and normalises dates — so your next import works first time. Free plan included.
