How to Clean an Event Attendees List Before Importing It Into Your CRM

Event registration data is some of the messiest data you’ll ever import into a CRM. Attendees fill in forms on their phones, abbreviate their job titles however they feel like it that day, type their company size as a number or a range or “large”, and sometimes register twice with different email addresses.

Import it as-is and you’ll spend the next three months correcting records manually. Clean it first and you end up with a usable set of leads with accurate segmentation.

The core problem with event registration data

Unlike a CRM export (where data was, hopefully, entered by your team under controlled conditions), event registration data is user-generated. Every field is a free-text box in the attendee’s mind, regardless of what type of input you put in the form.

A company size field gets "51-200" from one person, "~100" from another, and "medium" from a third. A job title field gets "CTO", "Chief Technology Officer", "co-founder & CTO", "Head of Engineering", and "tech guy", sometimes all meaning the same seniority level.

The goal before import is to collapse that variation into a finite set of clean, consistent values your CRM can actually filter and segment on.

1. Deduplicate first

Before anything else, remove duplicate registrations. The same person may have registered twice: once with their work email and once with a Gmail, or once early and once after receiving a reminder.

Deduplication strategy:

Primary key: email address (exact match, after trimming whitespace and lowercasing)
Secondary: first name + last name + company (for people who registered with two different emails)

When you find a duplicate, keep the row with the most complete data, not necessarily the most recent one. Someone might have filled in their job title on the first registration but not the second.
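A minimal sketch of the two-pass strategy in Python, using only the standard library. It assumes each row is a dict (for example from `csv.DictReader`) with columns named `email`, `first_name`, `last_name`, and `company`. Those names are hypothetical, so rename them to match your export. It keeps the single most complete row rather than merging fields across duplicates.

```python
def completeness(row: dict) -> int:
    """Count the non-empty fields in a row."""
    return sum(1 for value in row.values() if value and str(value).strip())


def _email_key(row: dict):
    # Trim and lowercase so "Jo@Acme.com " and "jo@acme.com" match
    email = row.get("email", "").strip().lower()
    return email or None


def _name_company_key(row: dict):
    parts = tuple(row.get(col, "").strip().lower()
                  for col in ("first_name", "last_name", "company"))
    return parts if all(parts) else None  # incomplete keys never merge


def _collapse(rows: list[dict], key_fn) -> list[dict]:
    """Keep one row per key: the most complete one (first seen wins ties)."""
    kept: dict = {}
    for index, row in enumerate(rows):
        key = key_fn(row)
        if key is None:
            key = ("__no_key__", index)  # rows without a usable key stay unique
        if key not in kept or completeness(row) > completeness(kept[key]):
            kept[key] = row
    return list(kept.values())


def dedupe(rows: list[dict]) -> list[dict]:
    """Pass 1: exact email match. Pass 2: first name + last name + company."""
    return _collapse(_collapse(rows, _email_key), _name_company_key)
```

Running the email pass first means the name-and-company pass only has to catch people who used two different addresses.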

After deduplication, look for email variants of the same person: two addresses on the same company domain with slightly different local parts could belong to one person. These require a human decision.

2. Separate attendees from no-shows

Not everyone who registered showed up, and these two groups should not receive the same CRM treatment. A no-show who registered for a product webinar is still a warm lead, but sending them a “great to meet you at the event” follow-up sequence is wrong and damages trust.

Most platforms (Eventbrite, Hopin, Zoom Webinars, Goldcast, ON24) include an attendance status column in their exports. Before importing, check whether your file has it and make sure every row is tagged:

Attended: prioritise for direct outreach; can reference the event in messaging
Registered / No-show: different nurture track. They had intent (they registered) but didn’t show; re-engagement works better than an event follow-up
Walk-in / On-site: attended without pre-registering, often with less data filled in

Add an Event Attendance Status column to your CSV with the value for each row before importing. This gives you a CRM segment you can actually act on.

If the platform doesn’t include attendance status, check whether session join timestamps are in the export (Zoom Webinar reports include join time per registrant). A registrant with no join time is a no-show.
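If you have to derive the status yourself, the rule fits in a few lines. This sketch assumes hypothetical `join_time` and `registration_time` columns that are blank when that event didn't happen. Real export headers vary by platform, and if your report lists a person once per session join, deduplicate it first.

```python
def attendance_status(row: dict) -> str:
    """Derive a status from hypothetical 'join_time' and 'registration_time' columns."""
    joined = bool(row.get("join_time", "").strip())
    registered = bool(row.get("registration_time", "").strip())
    if joined:
        return "Attended" if registered else "Walk-in"
    if registered:
        return "No-show"
    return "Unknown"  # neither timestamp: check this row by hand


def tag_attendance(rows: list[dict]) -> list[dict]:
    """Return copies of the rows with an 'Event Attendance Status' column added."""
    return [{**row, "Event Attendance Status": attendance_status(row)} for row in rows]
```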

3. Standardise company size ranges

Company size is almost always a disaster in event exports. People either misread the range options, ignore them entirely, or the form didn’t validate the input properly.

What you’ll typically find in a single column:

Raw value	What it means
`50`	probably 1–50, or exactly 50
`51-200`	standard range
`200+`	ambiguous upper bound
`"medium"`	meaningless without context
`"SMB"`	depends on your definition
`"enterprise"`	same problem
`"1,000-5,000"`	valid, but comma is a CSV separator risk
`"<10"`	small, but which range?
`"we're a startup"`	genuinely not useful

Define a canonical set of ranges that matches what your CRM uses: for example, what HubSpot’s Number of Employees field expects, or whatever your sales team segments on. Common ones:

1–10 / 11–50 / 51–200 / 201–500 / 501–1000 / 1001–5000 / 5001+

Then map every raw value to the right bucket. Anything that’s genuinely ambiguous ("medium", "SMB") should either be mapped to your closest range or left blank. Don’t guess when the guess is likely to be wrong.
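A minimal sketch of that mapping in Python, assuming the canonical ranges listed above, with the top bucket treated as everything over 5,000. The rules are deliberately conservative. Open-ended values and values that span two buckets come back blank instead of guessed. A bare number such as `50` is treated as an exact headcount, which is itself a judgment call. `WORD_OVERRIDES` is a hypothetical hook for words your team has agreed on, such as what "SMB" means. The bucket labels are illustrative, so rename them to match your CRM.

```python
import re

# Canonical buckets as (low, high, label); high=None means open-ended.
BUCKETS = [
    (1, 10, "1-10"), (11, 50, "11-50"), (51, 200, "51-200"),
    (201, 500, "201-500"), (501, 1000, "501-1000"),
    (1001, 5000, "1001-5000"), (5001, None, "5001+"),
]

# Hypothetical team-defined meanings for words like "smb"; empty = leave blank.
WORD_OVERRIDES: dict[str, str] = {}


def bucket_for(n: int) -> str:
    """Return the label of the bucket containing n, or '' if none does."""
    for low, high, label in BUCKETS:
        if n >= low and (high is None or n <= high):
            return label
    return ""


def normalise_company_size(raw: str) -> str:
    value = raw.strip().strip('"').lower().replace(",", "")
    if value in WORD_OVERRIDES:
        return WORD_OVERRIDES[value]
    numbers = [int(x) for x in re.findall(r"\d+", value)]
    if not numbers:
        return ""  # "medium", "we're a startup": no number, leave blank
    n = numbers[0]
    if value.startswith("<"):
        # "<10" covers 1..9; accept only if that whole span sits in one bucket
        return bucket_for(1) if n > 1 and bucket_for(1) == bucket_for(n - 1) else ""
    if value.endswith("+"):
        # "200+" is open-ended; only safe if n is already in the open-ended top bucket
        return bucket_for(n) if n >= BUCKETS[-1][0] else ""
    if len(numbers) >= 2:
        high = numbers[1]
        # Tolerate round lower bounds: "1000-5000" maps to "1001-5000"
        if bucket_for(high) in (bucket_for(n), bucket_for(n + 1)):
            return bucket_for(high)
        return ""  # range spans two buckets
    return bucket_for(n)  # single number such as "50" or "~100"
```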

Pay attention to the comma problem: if a value like "1,000" ends up in the CSV without surrounding quotes, it will split across two columns when the file is parsed. Check your import preview carefully for columns that suddenly have data in the wrong place.
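To see the comma problem concretely, and to catch it before the import preview, Python's standard `csv` module is enough. The sample data is made up. `ragged_rows` is a hypothetical helper that flags any row whose field count doesn't match the header. Its row numbers line up with file lines only if no field contains an embedded line break.

```python
import csv
import io


def parse(text: str) -> list[list[str]]:
    return list(csv.reader(io.StringIO(text)))


def ragged_rows(text: str) -> list[int]:
    """Return 1-based row numbers (header = 1) whose field count differs from the header."""
    rows = parse(text)
    width = len(rows[0])
    return [n for n, row in enumerate(rows[1:], start=2) if len(row) != width]


unquoted = "name,company_size\nAda,1,000-5,000\n"
quoted = 'name,company_size\nAda,"1,000-5,000"\n'

print(parse(unquoted)[1])  # ['Ada', '1', '000-5', '000']  <- spilled into extra columns
print(parse(quoted)[1])    # ['Ada', '1,000-5,000']
print(ragged_rows(unquoted), ragged_rows(quoted))  # [2] []

# When you write a CSV yourself, csv.writer quotes any field containing the delimiter
buffer = io.StringIO()
csv.writer(buffer).writerow(["Ada", "1,000-5,000"])
print(buffer.getvalue())  # Ada,"1,000-5,000"
```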

4. Normalise job titles

Job titles suffer from the same free-text problem as company size, but the variation is even wider. The same role gets entered differently by every person who holds it.

Group by seniority and function, and map to a canonical title or a seniority tier:

C-suite / Founder: CEO, Chief Executive Officer, Co-Founder, Founder & CEO, Managing Director, MD → CEO / Founder

VP level: VP of Sales, Vice President Sales, VP Sales, Head of Sales, SVP Sales → VP / Head of Sales

Director level: Director of Marketing, Marketing Director, Dir. Marketing → Director, Marketing

Individual contributor: Account Executive, AE, Sales Rep, BDR, SDR → keep as-is or normalise to your taxonomy

LinkedIn-sourced titles: If your event used LinkedIn sign-in (LinkedIn Events, or Eventbrite / Hopin with LinkedIn auth), expect verbose strings like "Building the future of HR | Forbes 30 Under 30" or "Co-Founder @ Acme | Previously VP Sales at BigCo". No formula will reliably extract a clean role from these; flag them for manual review before importing.

For CRM segmentation, you usually care more about seniority than exact title. Consider adding a separate Seniority field (C-Suite, VP, Director, Manager, IC) derived from the job title, which is much easier to filter on than 400 title variations.
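Deriving the Seniority field can start as an ordered list of keyword rules, where the first tier that matches wins. The patterns below are illustrative assumptions, not a complete taxonomy. "President" only counts as C-Suite when it isn't preceded by "vice ". Long strings, or strings containing `|` or `@`, come back as `Review`, in line with the LinkedIn advice above. Unrecognised titles such as "tech guy" come back blank rather than defaulting to IC.

```python
import re

# Ordered rules: the first tier whose pattern matches wins.
# Patterns are illustrative and far from exhaustive; extend them from your own data.
SENIORITY_RULES = [
    ("C-Suite", r"\b(ceo|cto|cfo|coo|cmo|chief|founder|managing director|md|(?<!vice )president)\b"),
    ("VP", r"\b(vp|svp|evp|vice president|head of)\b"),
    ("Director", r"\b(director|dir\.?)\b"),
    ("Manager", r"\b(manager|mgr)\b"),
    ("IC", r"\b(account executive|ae|sales rep|bdr|sdr|engineer|analyst|specialist|associate)\b"),
]


def seniority(title: str) -> str:
    """Map a free-text job title to a seniority tier, 'Review', or '' (unrecognised)."""
    text = " ".join(title.lower().split())  # lowercase, collapse whitespace
    if not text:
        return ""
    if "|" in text or "@" in text or len(text) > 60:
        return "Review"  # LinkedIn-style headline: needs a human
    for tier, pattern in SENIORITY_RULES:
        if re.search(pattern, text):
            return tier
    return ""
```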

5. Standardise industry

Industry fields have the same problem as any CRM picklist, amplified by the fact that attendees often type whatever comes to mind: "SaaS", "B2B Software", "Tech", "IT", "Technology", all potentially mapping to the same category.

Define your industry taxonomy upfront. If you’re importing into HubSpot or Salesforce, use their industry picklist values as the canonical list, since you’ll need to match those on import anyway. Then map every variant in your export to the right one.
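The mapping itself is a lookup table. In this sketch the keys are lowercased, whitespace-collapsed raw values, and the targets reuse the example categories from this section. The targets are placeholders, so swap in your CRM's exact picklist values. `unmapped_values` surfaces anything the table doesn't cover, so you can extend the mapping instead of importing unmapped values.

```python
from typing import Optional

# Hypothetical mapping: keys are cleaned raw values; targets must be your CRM's exact picklist values.
INDUSTRY_MAP = {
    "saas": "Technology",
    "b2b software": "Technology",
    "tech": "Technology",
    "it": "Technology",
    "technology": "Technology",
    "fintech": "Financial Services",
    "hr tech": "Human Resources",
    "martech": "Marketing",
}


def _clean(raw: str) -> str:
    return " ".join(raw.lower().split())


def normalise_industry(raw: str) -> Optional[str]:
    """Return the canonical industry, or None if the value isn't mapped yet."""
    return INDUSTRY_MAP.get(_clean(raw))


def unmapped_values(values: list[str]) -> list[str]:
    """Distinct non-blank raw values the mapping doesn't cover, for review."""
    return sorted({_clean(v) for v in values if v.strip() and normalise_industry(v) is None})
```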

"Fintech" → "Financial Services", "HR Tech" → "Human Resources", "Martech" → "Marketing": your mapping depends on your taxonomy.

6. Fix email addresses

Scan the email column for:

Missing @: johngmail.com, the @ was dropped
Double domains: an address with the domain, or part of it, typed twice
Personal emails at work events: if the event was B2B, a @gmail.com registration is worth flagging; it might be a competitor, a student, or someone who didn’t want to use their work email
Role-based emails: info@ and contact@ addresses. These won’t go anywhere useful in outreach sequences

You can flag most of these with a script, but you can’t reliably fix them programmatically. Review each flagged row and decide case by case.
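Detection, unlike fixing, is easy to script. This sketch only flags addresses and never corrects them. The personal-domain and role-prefix sets are short illustrative samples. Treating a repeated label such as `.com.com` as a "double domain" is an assumption about what that typo usually looks like.

```python
# Illustrative samples; extend both sets for your audience.
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "icloud.com"}
ROLE_PREFIXES = {"info", "contact", "sales", "admin", "hello", "support"}


def email_flags(email: str) -> list[str]:
    """Return reasons to review an address; an empty list means nothing was flagged."""
    address = email.strip().lower()
    at_count = address.count("@")
    if at_count == 0:
        return ["missing_@"]
    if at_count > 1:
        return ["multiple_@"]
    local, domain = address.split("@")
    flags = []
    labels = domain.split(".")
    # Assumed "double domain" pattern: the same label twice in a row, e.g. acme.com.com
    if any(a == b for a, b in zip(labels, labels[1:])):
        flags.append("double_domain")
    if domain in PERSONAL_DOMAINS:
        flags.append("personal_domain")
    if local in ROLE_PREFIXES:
        flags.append("role_based")
    return flags
```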

7. Normalise country and region

If your event had international attendees, country fields will be a mess: "US", "USA", "United States", "United States of America", "U.S.", "us". Pick a standard (ISO 3166-1 alpha-2 codes are the safest for CRM imports: US, GB, FR, DE) and map everything to it.

Same applies to any state or region fields: "CA", "Calif.", "California" should all normalise to "CA" (or "California", whichever your CRM expects).
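A minimal sketch of the country and state mapping as alias tables. The tables are deliberately tiny and illustrative. A real version needs every spelling your attendees actually used. Note that the ISO 3166-1 alpha-2 code for the United Kingdom is GB, so "UK" maps to GB. Unrecognised values come back blank so they can be reviewed rather than imported as-is.

```python
# Small illustrative alias tables; extend them from your own export.
COUNTRY_ALIASES = {
    "us": "US", "u.s.": "US", "usa": "US", "u.s.a.": "US",
    "united states": "US", "united states of america": "US",
    "uk": "GB", "u.k.": "GB", "gb": "GB", "united kingdom": "GB", "great britain": "GB",
    "fr": "FR", "france": "FR",
    "de": "DE", "germany": "DE", "deutschland": "DE",
}
STATE_ALIASES = {"ca": "CA", "calif.": "CA", "california": "CA",
                 "ny": "NY", "n.y.": "NY", "new york": "NY"}


def _key(raw: str) -> str:
    return " ".join(raw.lower().split())


def normalise_country(raw: str) -> str:
    """ISO 3166-1 alpha-2 code, or '' if the value isn't recognised."""
    return COUNTRY_ALIASES.get(_key(raw), "")


def normalise_state(raw: str) -> str:
    """Two-letter state code, or '' if the value isn't recognised."""
    return STATE_ALIASES.get(_key(raw), "")
```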

8. Add missing data before importing

Event exports often give you name, email, and company, but not much else. Before importing, see what you can derive or enrich:

Full name split: many platforms export a single Full Name column, but most CRMs expect separate First Name and Last Name fields. In Excel or Google Sheets, =LEFT(A2, FIND(" ", A2)-1) extracts the first name and =MID(A2, FIND(" ", A2)+1, LEN(A2)) gets the remainder. Review edge cases: names with prefixes (Dr., Mr.), multi-word first names or surnames, and single-word entries (where FIND returns an error) will trip up the formula. See how to split a column in a CSV before importing to your CRM for the fuller set of edge cases and a non-formula approach.
Company domain: extract the part after the @ from the work email (for example, company.com), useful for account matching in your CRM
Registration timestamp: most platforms include this; import it as Event Registration Date so you know the lead’s age
Event name / source: add a Lead Source or Campaign column so every record is tagged to the event; you’ll thank yourself later when filtering. If your event promotion used UTM-tagged links, keep those values consistent too; see how to standardise UTM parameters across your marketing team.
Opt-in / marketing consent: check whether your registration form captured explicit marketing consent. If it did, include that in your import (e.g. a Marketing Opt-in property set to true/false). If your CRM uses a subscription status or GDPR consent model (like HubSpot’s legal basis fields), set it correctly. Importing a contact without consent configured can inadvertently enrol them in sequences they never agreed to receive.
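The name split and domain extraction can also be scripted. This sketch mirrors the spreadsheet formulas (first token, then everything after the first space), so it shares their problems with prefixes and multi-word names. Unlike the formulas, it returns an empty last name for single-word entries instead of an error. The column names (`Full Name`, `Email`, `Lead Source`) and the short personal-domain list are hypothetical.

```python
# Personal domains don't identify a company account; illustrative sample only.
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "icloud.com"}


def split_full_name(full_name: str) -> tuple[str, str]:
    """Split on the first space, like the LEFT/MID formulas."""
    parts = " ".join(full_name.split()).split(" ", 1)  # collapse stray spaces first
    first = parts[0]
    last = parts[1] if len(parts) > 1 else ""  # single-word names get an empty last name
    return first, last


def company_domain(email: str) -> str:
    """Domain after the @, or '' for malformed or personal addresses."""
    address = email.strip().lower()
    if address.count("@") != 1:
        return ""
    domain = address.split("@")[1]
    return "" if domain in PERSONAL_DOMAINS else domain


def enrich(row: dict, event_name: str) -> dict:
    """Return a copy of the row with name, domain, and source columns added."""
    first, last = split_full_name(row.get("Full Name", ""))
    return {
        **row,
        "First Name": first,
        "Last Name": last,
        "Company Domain": company_domain(row.get("Email", "")),
        "Lead Source": event_name,
    }
```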

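Quick checklist before importing

Duplicates removed (exact email match, then first name + last name + company)
Every row tagged with an Event Attendance Status
Company size mapped to your canonical ranges
Job titles mapped to a seniority tier; LinkedIn-style headlines flagged for review
Industry mapped to your CRM’s picklist values
Problem emails flagged: typos, personal addresses, role-based addresses
Country and region values converted to one standard, such as ISO 3166-1 alpha-2
Full names split, company domain extracted, Lead Source and consent fields set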

The picklist normalisation steps (company size, industry, job title mapping) are the ones that take the longest when done manually. The CSV Normalizer handles these automatically: define your canonical values once, and it maps every variant in the file to the right one. The mapping saves, so the next event’s list from the same platform takes minutes.

Event Attendee Data: Frequently Asked Questions

How do you handle no-shows in an event attendee import?

Add an Event Attendance Status column to your CSV before importing and tag each row as attended, no-show, or walk-in. Import the whole list but use the status field to route contacts into the right CRM sequences: no-shows who pre-registered have intent and shouldn’t receive “great to meet you” messaging. If your platform doesn’t export this directly, check for session join timestamps in the report (Zoom Webinar reports include join time per registrant; a missing join time means no-show).

How do you deduplicate event registration data?

Use email as the primary key for exact-match deduplication. Then run a secondary pass on first name + last name + company to catch people who registered with both a work and personal email. When duplicates are found, keep the row with the most complete data, not necessarily the most recent one.

How do I standardise job titles before importing leads?

Group titles by seniority and function rather than trying to normalise every variant. Map the dozens of CEO, Co-Founder, Managing Director strings to a single canonical title or a Seniority tier (C-Suite, VP, Director, Manager, IC). Filtering on tier is far more reliable than filtering on free-text titles.

What’s the best way to handle company size variations from event forms?

Define a canonical set of ranges that matches your CRM’s Number of Employees field (typically 1–10, 11–50, 51–200, etc.) and map every raw value to the right bucket. Anything genuinely ambiguous ("medium", "SMB") should map to your closest range or be left blank rather than guessed. See how to bulk replace values in a CSV column for the mapping mechanics.

Should you import personal Gmail addresses from B2B event lists?

Flag them rather than auto-importing. A @gmail.com registration at a B2B event might be a real prospect, a competitor, or a student. These need a per-row decision. Role-based addresses (info@, contact@) should also be flagged: outreach sequences sent to them rarely reach the right person.

How do I segment event attendees by industry when the field is free-text?

Define your industry taxonomy upfront (use HubSpot’s or Salesforce’s industry picklist as the canonical list) and map every variant to it. "SaaS", "B2B Software", "Tech" all map to "Technology" or whichever value your CRM uses. See how to standardise picklist values before a CRM import for the broader pattern.