
The Complete CSV Cleaning Checklist Before Any Data Import

A practical checklist covering every common CSV data quality problem — inconsistent values, date formats, encoding issues, duplicates, whitespace, and more — before you import into any system.

Most data import problems come down to the same handful of issues. They’re not random — they’re predictable, and they show up every time data moves between systems.

This is a checklist of everything worth checking before you import a CSV into any CRM, database, or tool. Work through it once on a new file format, save the rules, and you won’t have to think about most of it again.

1. Inconsistent picklist values

This is the most common problem, and the hardest to spot visually. The same concept appears as multiple strings in the same column — different capitalisation, different spelling, different language, or slight variations that look identical in a spreadsheet but aren’t.

Examples that break CRM imports:

  • Technology, tech, TECH, Technologie, IT — five ways to say the same thing
  • Closed Won, closed won, ClosedWon, Won, Closed - Won
  • CEO, Chief Executive Officer, C.E.O., ceo

What to do: For each picklist column, define the canonical list of allowed values and map every variation to the right one. Every system that validates picklist values on import — HubSpot, Salesforce, Airtable — will silently blank out or reject any value that doesn’t match exactly.

The hardest part is that a single column might have 20 variations of 5 valid values spread across thousands of rows. Find & replace gets you partway there, but you need to catch every variation — including typos you haven’t seen yet.
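A lookup table is the most reliable way to handle this programmatically. Here is a minimal sketch in Python; the variation list and the helper name `normalise_picklist` are illustrative — build the real map from the distinct values actually present in your column:

```python
# Map every known variation (lowercased) to its canonical picklist value.
# Illustrative entries only -- list the distinct values in your column
# and extend this map until every one of them is covered.
CANONICAL = {
    "technology": "Technology",
    "tech": "Technology",
    "technologie": "Technology",
    "it": "Technology",
    "closed won": "Closed Won",
    "closedwon": "Closed Won",
    "closed - won": "Closed Won",
    "won": "Closed Won",
}

def normalise_picklist(value: str) -> str:
    """Return the canonical value; pass unknown values through unchanged
    so they can be reviewed and added to the map."""
    return CANONICAL.get(value.strip().lower(), value.strip())

print(normalise_picklist("TECH"))          # -> Technology
print(normalise_picklist("Closed - Won"))  # -> Closed Won
```

Passing unknown values through (rather than blanking them) makes new typos visible on the next review instead of silently losing data.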

2. Mixed date formats

Date fields are broken in almost every cross-system export, because there’s no universal standard that everyone uses in practice.

Formats commonly found in the same file:

| Format | Example |
| --- | --- |
| DD/MM/YYYY | 15/01/2024 |
| MM/DD/YYYY | 01/15/2024 |
| YYYY-MM-DD (ISO 8601) | 2024-01-15 |
| Human-readable | January 15, 2024 / Jan 15 2024 |
| ISO datetime | 2024-01-15T00:00:00Z |
| Excel serial number | 45306 |

The ambiguous ones are the worst: 01/05/2024 could be January 5th or May 1st depending on who filled in the spreadsheet. When data comes from multiple people or multiple systems, both interpretations can appear in the same column.

What to do: Convert everything to YYYY-MM-DD before importing. It’s unambiguous and accepted by virtually every system. In Excel: =TEXT(A1,"YYYY-MM-DD"). In Google Sheets: same formula works for cells it recognises as dates. For cells Excel has stored as text, you’ll need to handle each format pattern separately.

Watch out for Excel serial numbers — if someone saved the CSV from Excel without formatting the date column, you may end up with integers like 45306 instead of dates. These need to be converted back before import.
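The conversion can be scripted by trying each known pattern in turn. A sketch, with two stated assumptions: the pattern list matches your sources, and slash dates are day-first (swap the pattern if your files are MM/DD); Excel serials are converted from the standard 1899-12-30 epoch:

```python
from datetime import datetime, timedelta

# Patterns to try, in order. Assumption: slash dates are DD/MM/YYYY --
# change to %m/%d/%Y if your files use US ordering.
PATTERNS = ["%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y", "%b %d %Y",
            "%Y-%m-%dT%H:%M:%SZ"]

def to_iso(value: str) -> str:
    """Normalise one date cell to YYYY-MM-DD, or raise if unrecognised."""
    value = value.strip()
    if value.isdigit():
        # Excel serial number: days since 1899-12-30 (Excel's epoch,
        # offset for its fictitious 1900 leap day)
        day = datetime(1899, 12, 30) + timedelta(days=int(value))
        return day.strftime("%Y-%m-%d")
    for pattern in PATTERNS:
        try:
            return datetime.strptime(value, pattern).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date: {value!r}")

print(to_iso("15/01/2024"))        # -> 2024-01-15
print(to_iso("January 15, 2024"))  # -> 2024-01-15
print(to_iso("45306"))             # -> 2024-01-15
```

Raising on unrecognised values is deliberate: a date the script cannot parse should stop the import, not slip through half-converted.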

3. Column naming inconsistencies

When data comes from multiple sources, the same field has a different name in every export. COMPANY_NAME, Company, company_name, Account, Organisation — all meaning the same thing, all requiring a different mapping.

This matters because:

  • Every system you import into has its own expected field names
  • Any column the importer can’t match is either skipped or flagged for manual mapping
  • Manual mapping in import wizards usually isn’t saved — you redo it every time the same file arrives

What to do: Standardise column names to match your target system’s schema before uploading. This eliminates the manual mapping step and removes the risk of accidentally skipping a column. Keep a simple map of source column → target field name for each recurring file format.
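The source-to-target map is small enough to keep in code. A sketch using only the standard library; the column names in `COLUMN_MAP` are examples — substitute your target system’s actual field names:

```python
import csv
import io

# Source column name -> target field name. Example entries; replace
# with your target system's real schema.
COLUMN_MAP = {
    "COMPANY_NAME": "company",
    "Company": "company",
    "Organisation": "company",
    "E-mail": "email",
}

def rename_header(csv_text: str) -> str:
    """Rewrite the header row using COLUMN_MAP; unmapped names pass through."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = [COLUMN_MAP.get(name.strip(), name.strip()) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

print(rename_header("COMPANY_NAME,E-mail\nAcme,a@acme.com\n"))
# company,email
# Acme,a@acme.com
```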

4. Character encoding issues

When a CSV opens in Excel on Windows, it’s usually saved as Windows-1252 or Latin-1 encoding. When the destination system expects UTF-8, any character outside basic ASCII — accented letters (é, ü, ñ), em dashes, curly quotes, non-Latin characters — will appear as garbled symbols or be dropped entirely.

This is especially common with:

  • Names from non-English-speaking countries (Müller, García, Lefèvre)
  • Company names with special characters
  • Notes fields where users type freely

What to do: Always export and save CSVs as UTF-8. In Excel, use “Save As” → “CSV UTF-8 (Comma delimited)”. In Google Sheets, exports are UTF-8 by default. If you’ve received a file and aren’t sure of the encoding, open it in a text editor and check whether special characters look correct — if é shows up as Ã©, the bytes are UTF-8 being decoded as Latin-1; if accented characters come through as ? or replacement symbols, the file is likely Latin-1 or Windows-1252 and needs re-encoding to UTF-8 before import.
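Re-encoding can be automated with a decode-then-fallback approach. A sketch, under one stated assumption: any file that isn’t valid UTF-8 came from Excel on Windows and is therefore Windows-1252 (a superset of Latin-1’s printable range) — adjust the fallback if your sources differ:

```python
def to_utf8(path_in: str, path_out: str) -> str:
    """Re-save a file as UTF-8 and return the encoding it was read with."""
    with open(path_in, "rb") as f:
        raw = f.read()
    try:
        text, detected = raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 files are Excel-on-Windows exports
        text, detected = raw.decode("windows-1252"), "windows-1252"
    with open(path_out, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return detected
```

UTF-8 is tried first because almost any Windows-1252 byte sequence decodes “successfully” as Windows-1252, while random bytes rarely form valid UTF-8 — so a clean UTF-8 decode is strong evidence the file really is UTF-8.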

5. Leading and trailing whitespace

A cell containing "Acme Corp " (with a trailing space) is not the same as "Acme Corp" — most systems treat them as distinct values. This causes:

  • Picklist validation failures: "Open " doesn’t match "Open"
  • Deduplication failures: the same company appears twice because one record has a trailing space
  • Lookup failures in formulas and filters

Whitespace is invisible in most spreadsheet views and easy to miss.

What to do: Strip leading and trailing whitespace from every column before importing. In Excel: =TRIM(A1). In Google Sheets: same. Most data processing tools have a built-in trim step — use it on every text column.
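Outside a spreadsheet, the same trim can be applied to every cell in one pass. A minimal stdlib sketch (`trim_cells` is a hypothetical helper name):

```python
import csv
import io

def trim_cells(csv_text: str) -> str:
    """Strip leading/trailing whitespace from every cell, header included."""
    rows = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(
        [cell.strip() for cell in row] for row in rows
    )
    return out.getvalue()

print(trim_cells("name,stage\nAcme Corp ,Open \n"))
# name,stage
# Acme Corp,Open
```

Note that `str.strip()` also removes tabs and non-breaking-space-free whitespace at the ends of cells; interior spaces (as in “Acme Corp”) are untouched.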

6. Duplicate rows

Duplicates creep in when merging exports from multiple sources, or when the same export is pulled twice. A contact who appears in both a HubSpot export and a Salesforce export will import twice, creating a duplicate record in the destination system.

Deduplication logic varies by system — some merge on email, some on a primary key field — but it’s always safer to deduplicate before importing rather than relying on the destination to handle it.

What to do: After merging files, deduplicate on the field that uniquely identifies a record: email for contacts, domain for companies, a deal ID for deals. In Excel: Data → Remove Duplicates. Be intentional about which row to keep when there are conflicts — usually the most recently updated one.
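The “keep the most recently updated row” rule can be expressed directly. A sketch: the key and timestamp column names are parameters, and the `email` / `last_modified` names below are examples, not a fixed schema. It assumes the timestamp column is already in YYYY-MM-DD form (step 2), so string comparison orders dates correctly:

```python
import csv
import io

def dedupe(csv_text: str, key: str, updated: str) -> str:
    """Keep one row per key value, preferring the latest `updated` value.
    Assumes `updated` holds ISO dates, which sort correctly as strings."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    best: dict[str, dict] = {}
    for row in rows:
        k = row[key].strip().lower()  # case-insensitive match on the key
        if k not in best or row[updated] > best[k][updated]:
            best[k] = row
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=rows[0].keys(), lineterminator="\n")
    writer.writeheader()
    writer.writerows(best.values())
    return out.getvalue()

data = ("email,last_modified\n"
        "a@x.com,2024-01-01\n"
        "A@x.com,2024-03-01\n"
        "b@x.com,2024-02-01\n")
print(dedupe(data, "email", "last_modified"))
```

Lowercasing the key before comparing catches the common case where the same email was entered with different capitalisation in different systems.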

7. Number formatting

Number columns break when locale conventions conflict. European formats use a comma as the decimal separator and a period as the thousands separator (1.234,56). US/international formats do the opposite (1,234.56). When you open a European CSV in a US-locale Excel, 1.234,56 gets treated as text, not a number.

The same issue applies to currency symbols ($1,200 vs 1200), percentages (85% vs 0.85), and phone numbers that get auto-formatted by Excel into something like +3.36E+10.

What to do: Strip formatting characters (currency symbols, commas used as thousands separators) and convert to plain numbers before importing. For phone numbers specifically, format them consistently to E.164 international format (+33612345678) or whatever format your target system expects — and treat the column as text to prevent Excel from reformatting it.
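Parsing both locale conventions takes a small heuristic. A sketch: when both separators appear, whichever comes last is the decimal separator; a lone comma is treated as a thousands separator only when followed by exactly three digits. That last rule is an assumption — “1,234” is genuinely ambiguous, so check which locale your source uses. Percentages are stripped to their numeric part here; whether to divide by 100 is a per-column decision:

```python
import re

def parse_number(value: str) -> float:
    """Normalise a formatted number string ($1,200 / 1.234,56) to a float."""
    cleaned = re.sub(r"[^\d.,\-]", "", value)  # drop currency, %, spaces
    if "," in cleaned and "." in cleaned:
        if cleaned.rfind(",") > cleaned.rfind("."):
            # European: 1.234,56 -> 1234.56
            cleaned = cleaned.replace(".", "").replace(",", ".")
        else:
            # US: 1,234.56 -> 1234.56
            cleaned = cleaned.replace(",", "")
    elif "," in cleaned:
        head, _, tail = cleaned.rpartition(",")
        if len(tail) != 3:
            cleaned = head.replace(",", "") + "." + tail  # 1,5 -> 1.5
        else:
            # Assumption: comma + exactly 3 digits is a thousands separator
            cleaned = cleaned.replace(",", "")
    return float(cleaned)

print(parse_number("$1,200"))    # -> 1200.0
print(parse_number("1.234,56"))  # -> 1234.56
```

Phone numbers should bypass this entirely — they are identifiers, not quantities, and belong in a text column.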

8. Empty and null values

Different systems and different people represent “no value” differently: empty string, NULL, N/A, n/a, None, 0, -. Some destination systems treat these differently — 0 in a numeric field is not the same as blank, and N/A in a picklist field will fail validation.

What to do: Decide what “empty” means for each column and normalise it. For picklist fields, empty cells are usually safer than N/A or None — the field just won’t be set. For numeric fields, make sure zeros are intentional and not stand-ins for “unknown”.
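Normalising the null-ish tokens is a one-line lookup once the list is agreed. A sketch — the token set is illustrative, and note that 0 is deliberately not in it, per the advice above:

```python
# Tokens that typically mean "no value". Illustrative set -- extend it
# with whatever placeholders your sources actually use.
NULL_TOKENS = {"", "null", "n/a", "na", "none", "-", "nil"}

def normalise_empty(value: str) -> str:
    """Collapse null-ish tokens to an empty string. 0 is intentionally
    NOT treated as empty: a numeric zero may be a real value."""
    stripped = value.strip()
    return "" if stripped.lower() in NULL_TOKENS else stripped

print(normalise_empty("N/A"))  # -> ""
print(normalise_empty("0"))    # -> "0"
```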

Quick reference checklist

  • Picklist columns: all values normalised to the canonical allowed list
  • Date columns: all dates converted to YYYY-MM-DD
  • Column names: renamed to match the target system’s expected schema
  • Encoding: file saved as UTF-8
  • Whitespace: stripped from all text fields
  • Duplicates: removed after merging, before importing
  • Numbers: formatted as plain numerics, currency symbols removed
  • Phone numbers: consistent format, stored as text
  • Empty values: null representations standardised per column

None of these are hard to fix individually. The difficulty is doing all of them reliably, on every file, without missing anything — especially when the same file format arrives on a recurring schedule and needs to be cleaned the same way each time.

If you’re doing this manually for every import, the CSV Normalizer handles the picklist normalisation, date conversion, and column renaming steps automatically, and saves the mapping so recurring files take seconds instead of an hour.

Stop fixing the same CSV problems every week

Asphorem maps your columns, standardises picklist values, and normalises dates — so your next import works first time. Free plan included.
