How to Clean an Event Attendees List Before Importing It Into Your CRM
Event registration exports are messy by nature — free-text fields, inconsistent company sizes, duplicate attendees, and job titles entered a hundred different ways. Here's how to clean them before they pollute your CRM.
Event registration data is some of the messiest data you’ll ever import into a CRM. Attendees fill in forms on their phones, abbreviate their job titles however they feel like it that day, type their company size as a number or a range or “large”, and sometimes register twice with different email addresses.
Import it as-is and you’ll spend the next three months correcting records manually. Clean it first and you end up with a usable set of leads with accurate segmentation.
The core problem with event registration data
Unlike a CRM export (where data was, hopefully, entered by your team under controlled conditions), event registration data is user-generated. Every field is a free-text box in the attendee’s mind, regardless of what type of input you put in the form.
A dropdown for company size gets "51-200" from one person, "~100" from another, and "medium" from a third. A job title field gets "CTO", "Chief Technology Officer", "co-founder & CTO", "Head of Engineering", and "tech guy" — sometimes all meaning the same seniority level.
The goal before import is to collapse that variation into a finite set of clean, consistent values your CRM can actually filter and segment on.
1. Deduplicate first
Before anything else, remove duplicate registrations. The same person may have registered twice: once with their work email and once with a Gmail, or once early and once after receiving a reminder.
Deduplication strategy:
- Primary key: work email address (exact match)
- Secondary: first name + last name + company (for people who registered with two different emails)
When you find a duplicate, keep the row with the most complete data, not necessarily the most recent one. Someone might have filled in their job title on the first registration but not the second.
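The two-pass strategy above can be sketched in plain Python. This is a minimal illustration, not a definitive implementation: the column names (`email`, `first_name`, `last_name`, `company`) are hypothetical, and lowercasing before comparison is an added assumption on top of the exact-match rule.

```python
def completeness(row):
    """Count the non-empty fields in a row."""
    return sum(1 for value in row.values() if value and str(value).strip())

def dedupe(rows, key_fn):
    """Keep the most complete row per key; rows without a key pass through."""
    best = {}
    passthrough = []
    for row in rows:
        key = key_fn(row)
        if key is None:
            passthrough.append(row)
        elif key not in best or completeness(row) > completeness(best[key]):
            best[key] = row
    return list(best.values()) + passthrough

def email_key(row):
    # Assumption: emails are compared case-insensitively.
    email = (row.get("email") or "").strip().lower()
    return email or None

def name_company_key(row):
    parts = [(row.get(f) or "").strip().lower()
             for f in ("first_name", "last_name", "company")]
    # Only use this key when all three parts are present.
    return tuple(parts) if all(parts) else None

def dedupe_attendees(rows):
    """Pass 1: by email. Pass 2: by first name + last name + company."""
    return dedupe(dedupe(rows, email_key), name_company_key)
```

The second pass merges automatically, which may be too aggressive for common names at large companies; writing those matches to a review file instead is a reasonable alternative.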
After deduplication, look for email variants of the same person: [email protected] and [email protected] could be the same person from the same company. These require a human decision.
2. Standardise company size ranges
Company size is almost always a disaster in event exports. Either people misread the range options or ignore them entirely, or the form didn’t validate the input properly.
What you’ll typically find in a single column:
| Raw value | What it means |
|---|---|
| 50 | probably 1–50, or exactly 50 |
| 51-200 | standard range |
| 200+ | ambiguous upper bound |
| "medium" | meaningless without context |
| "SMB" | depends on your definition |
| "enterprise" | same problem |
| "1,000-5,000" | valid, but comma is a CSV separator risk |
| "<10" | small, but which range? |
| "we're a startup" | genuinely not useful |
Define a canonical set of ranges that matches what your CRM uses: for example, what HubSpot’s Number of Employees field expects, or whatever your sales team segments on. Common ones:
1–10 / 11–50 / 51–200 / 201–500 / 501–1000 / 1001–5000 / 5000+
Then map every raw value to the right bucket. Anything that’s genuinely ambiguous ("medium", "SMB") should either be mapped to your closest range or left blank. Don’t guess when you can’t tell which range is right: a blank is easier to spot and fix later than a wrong value.
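As a rough sketch of that mapping, the function below buckets values that are a single number or a simple numeric range and returns an empty string for everything else. The bucket labels come from the ranges listed above; the parsing rules (including how "~100" and "1,000-5,000" are read) are assumptions you would adjust to your own data.

```python
import re

# Canonical ranges as (low, high, label); high=None means open-ended.
BUCKETS = [
    (1, 10, "1–10"), (11, 50, "11–50"), (51, 200, "51–200"),
    (201, 500, "201–500"), (501, 1000, "501–1000"),
    (1001, 5000, "1001–5000"), (5001, None, "5000+"),
]

def bucket_for(n):
    """Return the label of the bucket containing n, or "" if none does."""
    for low, high, label in BUCKETS:
        if n >= low and (high is None or n <= high):
            return label
    return ""

def normalise_company_size(raw):
    """Return a canonical range label, or "" when the value is ambiguous."""
    text = raw.strip().replace(",", "")  # "1,000" -> "1000"
    single = re.fullmatch(r"~?\s*(\d+)", text)  # "50", "~100"
    if single:
        return bucket_for(int(single.group(1)))
    span = re.fullmatch(r"(\d+)\s*[-–]\s*(\d+)", text)  # "51-200"
    if span:
        low, high = int(span.group(1)), int(span.group(2))
        # Accept the range only if it fits in one bucket. low + 1 tolerates
        # ranges written with a shared boundary, such as "1000-5000".
        if low <= high and bucket_for(min(low + 1, high)) == bucket_for(high):
            return bucket_for(high)
    return ""  # "medium", "200+", "<10": leave for a manual decision
```

Values such as "200+", "<10", and "SMB" deliberately come back blank, so they can be reviewed by hand rather than guessed.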
Pay attention to the comma problem: if a value such as "1,000" is written to the CSV without surrounding quotes, it splits across two columns when the file is parsed and pushes every later field in that row one column to the right. Check your import preview carefully for columns that suddenly have data in the wrong place.
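One way to catch shifted rows before the import preview is to compare each row's field count with the header's. The sketch below uses Python's standard `csv` module; the function name and the sample data are made up for illustration.

```python
import csv
import io

def find_misaligned_rows(csv_text):
    """Return (line_number, row) for rows whose field count differs from the header's."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return [(reader.line_num, row) for row in reader if len(row) != len(header)]

sample = 'name,company_size,country\nAna,1,000,US\nBen,"1,000",GB\n'
for line_number, row in find_misaligned_rows(sample):
    print(line_number, row)
```

In the sample, the unquoted "1,000" on line 2 produces four fields instead of three, while the quoted value on line 3 parses cleanly.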
3. Normalise job titles
Job titles suffer from the same free-text problem as company size, but the variation is even wider. The same role gets entered differently by every person who holds it.
Group by seniority and function, and map to a canonical title or a seniority tier:
C-suite / Founder:
CEO, Chief Executive Officer, Co-Founder, Founder & CEO, Managing Director, MD → CEO / Founder
VP level:
VP of Sales, Vice President Sales, VP Sales, Head of Sales, SVP Sales → VP / Head of Sales
Director level:
Director of Marketing, Marketing Director, Dir. Marketing → Director, Marketing
Individual contributor:
Account Executive, AE, Sales Rep, BDR, SDR → keep as-is or normalise to your taxonomy
For CRM segmentation, you usually care more about seniority than exact title. Consider adding a separate Seniority field (C-Suite, VP, Director, Manager, IC) derived from the job title, which is much easier to filter on than 400 title variations.
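A derived Seniority field can be approximated with a short list of ordered keyword rules. This is only a sketch: the keyword lists are assumptions, unmatched titles fall back to IC (which you may prefer to review instead), and abbreviations such as "MD" can mean different things in different industries.

```python
import re

# Ordered rules: the first pattern that matches wins, so senior tiers come first.
SENIORITY_RULES = [
    ("C-Suite", r"\b(ceo|cto|cfo|coo|cmo|chief|founder|managing director|md)\b"),
    ("VP", r"\b(vp|svp|evp|vice president|head of)\b"),
    ("Director", r"\b(director|dir)\b"),
    ("Manager", r"\b(manager|mgr)\b"),
]

def seniority_tier(title):
    """Map a free-text job title to C-Suite, VP, Director, Manager, or IC."""
    text = title.strip().lower()
    if not text:
        return ""  # no title given: leave the tier blank
    for tier, pattern in SENIORITY_RULES:
        if re.search(pattern, text):
            return tier
    return "IC"  # assumption: anything unmatched is an individual contributor
```

Because the rules are ordered, "Managing Director" lands in C-Suite rather than Director, matching the grouping above.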
4. Standardise industry
Industry fields have the same problem as any CRM picklist, amplified by the fact that attendees often type whatever comes to mind: "SaaS", "B2B Software", "Tech", "IT", "Technology" — all potentially mapping to the same category.
Define your industry taxonomy upfront. If you’re importing into HubSpot or Salesforce, use their industry picklist values as the canonical list, since you’ll need to match those on import anyway. Then map every variant in your export to the right one.
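A lookup table is usually enough for this step. The sketch below uses the example variants from this section as illustrative entries; the right-hand values are placeholders that would need to match your CRM's actual picklist. It also counts the values it could not map, so you know what to add to the table.

```python
from collections import Counter

# Illustrative mapping only; the targets must match your CRM's picklist values.
INDUSTRY_MAP = {
    "saas": "Technology",
    "b2b software": "Technology",
    "tech": "Technology",
    "it": "Technology",
    "technology": "Technology",
    "fintech": "Financial Services",
    "hr tech": "Human Resources",
    "martech": "Marketing",
}

def map_industries(values):
    """Return (mapped_values, unmapped_counts); unmapped inputs become ""."""
    mapped, unmapped = [], Counter()
    for raw in values:
        key = " ".join(raw.lower().split())  # lowercase, collapse whitespace
        canonical = INDUSTRY_MAP.get(key, "")
        if key and not canonical:
            unmapped[key] += 1
        mapped.append(canonical)
    return mapped, unmapped
```

Reviewing `unmapped.most_common()` after each run shows which new variants are worth adding to the mapping.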
"Fintech" → "Financial Services", "HR Tech" → "Human Resources", "Martech" → "Marketing" — your mapping depends on your taxonomy.
5. Fix email addresses
Scan the email column for:
- Missing @: antoinegmail.com (the @ was dropped)
- Double domains: [email protected]
- Personal emails at work events: if the event was B2B, a @gmail.com registration is worth flagging: it might be a competitor, a student, or someone who didn’t want to use their work email
- Role-based emails: [email protected], [email protected] (these won’t go anywhere useful in outreach sequences)
A script can detect these patterns, but it can’t reliably fix them. Flag them and decide per case.
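Flagging, as opposed to fixing, is straightforward to script. The sketch below returns a list of reasons an address needs a human look. The personal-domain and role-name lists are assumptions to extend, and the double-domain check only covers a repeated ending such as ".com.com".

```python
import re

PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}  # assumed list
ROLE_LOCAL_PARTS = {"info", "contact", "sales", "support", "admin", "hello"}  # assumed list

def email_flags(email):
    """Return the reasons this address needs manual review (empty list if none)."""
    value = email.strip().lower()
    if not value:
        return ["empty"]
    if value.count("@") != 1:
        return ["missing or repeated @"]
    local, domain = value.split("@")
    flags = []
    if not local or "." not in domain:
        flags.append("malformed")
    # A doubled ending such as ".com.com" suggests a double-domain typo.
    if re.search(r"(\.[a-z]{2,})\1$", domain):
        flags.append("double domain")
    if domain in PERSONAL_DOMAINS:
        flags.append("personal domain")
    if local in ROLE_LOCAL_PARTS:
        flags.append("role-based")
    return flags
```

Writing the flags to an extra column keeps the decision with a person while making the problem rows easy to filter.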
6. Normalise country and region
If your event had international attendees, country fields will be a mess: "US", "USA", "United States", "United States of America", "U.S.", "us". Pick a standard (ISO 3166-1 alpha-2 codes are the safest for CRM imports: US, GB, FR, DE) and map everything to it.
Same applies to any state or region fields: "CA", "Calif.", "California" should all normalise to "CA" (or "California", whichever your CRM expects).
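Country normalisation follows the same lookup pattern. The alias table below is a small, hand-written sample covering the four codes mentioned above, not a complete list; unknown values come back blank so they can be reviewed.

```python
# Sample aliases only; extend this table for the countries in your own export.
COUNTRY_ALIASES = {
    "us": "US", "usa": "US", "united states": "US",
    "united states of america": "US",
    "gb": "GB", "uk": "GB", "united kingdom": "GB", "great britain": "GB",
    "fr": "FR", "france": "FR",
    "de": "DE", "germany": "DE", "deutschland": "DE",
}

def normalise_country(raw):
    """Return an ISO 3166-1 alpha-2 code, or "" if the value is not in the table."""
    # Drop dots ("U.S." -> "us"), lowercase, and collapse whitespace.
    key = " ".join(raw.replace(".", "").lower().split())
    return COUNTRY_ALIASES.get(key, "")
```

The same function shape works for state or region fields with a different alias table.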
7. Add missing data before importing
Event exports often give you name, email, and company, but not much else. Before importing, see what you can derive or enrich:
- Company domain: extract from work email ([email protected] → company.com), useful for account matching in your CRM
- Registration timestamp: most platforms include this; import it as Event Registration Date so you know the lead’s age
- Event name / source: add a Lead Source or Campaign column so every record is tagged to the event; you’ll thank yourself later when filtering
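These three additions can be applied in one pass over the rows. In the sketch below, the input column names (`email`, `registered_at`) are hypothetical and depend on your event platform, and skipping personal mailbox domains is an added assumption, since a Gmail domain says nothing about the employer.

```python
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}  # assumed list

def enrich(row, event_name):
    """Return a copy of row with the derived columns added."""
    enriched = dict(row)
    email = (row.get("email") or "").strip().lower()
    domain = email.rpartition("@")[2] if "@" in email else ""
    # Leave the domain blank for personal mailboxes.
    enriched["Company Domain"] = "" if domain in PERSONAL_DOMAINS else domain
    # "registered_at" is a hypothetical name for the platform's timestamp column.
    enriched["Event Registration Date"] = row.get("registered_at") or ""
    enriched["Lead Source"] = event_name
    return enriched
```

Calling `enrich(row, "Spring Summit")` for every row (the event name here is a placeholder) tags the whole file with the same source.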
Quick checklist before importing
- Duplicates removed (by email, then by name + company)
- Company size ranges standardised to your canonical set
- Job titles normalised (or seniority tier added as a separate field)
- Industry values mapped to your CRM’s picklist
- Email addresses checked for obvious formatting issues
- Country and region values normalised to a consistent format
- Company domain added (extracted from email where possible)
- Lead source / event name column added to every row
- File saved as UTF-8
The picklist normalisation steps (company size, industry, job title mapping) are the ones that take the longest when done manually. The CSV Normalizer handles these automatically: define your canonical values once, and it maps every variant in the file to the right one. The mapping saves, so the next event’s list from the same platform takes minutes.
Event attendee data: frequently asked questions
How do you deduplicate event registration data?
Use email as the primary key for exact-match deduplication. Then run a secondary pass on first name + last name + company to catch people who registered with both a work and personal email. When duplicates are found, keep the row with the most complete data, not necessarily the most recent one.
How do I standardise job titles before importing leads?
Group titles by seniority and function rather than trying to normalise every variant. Map the dozens of CEO, Co-Founder, Managing Director strings to a single canonical title or a Seniority tier (C-Suite, VP, Director, Manager, IC). Filtering on tier is far more reliable than filtering on free-text titles.
What’s the best way to handle company size variations from event forms?
Define a canonical set of ranges that matches your CRM’s Number of Employees field (typically 1–10, 11–50, 51–200, etc.) and map every raw value to the right bucket. Anything genuinely ambiguous ("medium", "SMB") should map to your closest range or be left blank rather than guessed. See how to bulk replace values in a CSV column for the mapping mechanics.
Should you import personal Gmail addresses from B2B event lists?
Flag them rather than auto-importing. A @gmail.com registration at a B2B event might be a real prospect, a competitor, or a student. These need a per-row decision. Role-based addresses (info@, contact@) should also be flagged: outreach sequences sent to them rarely reach the right person.
How do I segment event attendees by industry when the field is free-text?
Define your industry taxonomy upfront (use HubSpot’s or Salesforce’s industry picklist as the canonical list) and map every variant to it. "SaaS", "B2B Software", "Tech" all map to "Technology" or whichever value your CRM uses. See how to standardise picklist values before a CRM import for the broader pattern.