SQL Server's IDENTITY columns and PRIMARY KEY constraints are designed to protect data integrity — but they also create two of the most common CSV import failures. The database wants to manage unique identifiers itself, and bulk loaders often collide with that expectation. Understanding which scenario you're in determines which tool resolves it.
Quick reference
| Situation | Solution |
|---|---|
| CSV has IDs I want to keep | KEEPIDENTITY hint (BULK INSERT) or -E switch (bcp) |
| CSV has no IDs — let SQL Server generate them | Format file or View to skip the IDENTITY column |
| CSV may contain duplicate primary keys | Load into staging table first — no constraints |
| Need to update existing rows and insert new ones | Staging table + MERGE statement |
| GUID primary key, CSV has no GUID column | View or column list that excludes the GUID column |
| Null values in the key column | ERRORFILE to capture and skip invalid rows |
Part 1: IDENTITY Columns
An IDENTITY column auto-generates a sequential integer for every inserted row. When a CSV arrives, there are two fundamentally different situations — and they require opposite solutions.
Scenario A: The CSV has IDs you want to preserve
This is the data migration scenario: you're moving data from another system and need Invoice #105 to remain Invoice #105. By default, SQL Server ignores the ID column in the CSV and generates its own — meaning every ID in your destination table will be wrong. The fix is to explicitly tell SQL Server to accept the external values.
How to enable identity insert — by tool
| Tool | How to enable | Notes |
|---|---|---|
| BULK INSERT (T-SQL) | Add KEEPIDENTITY to the WITH clause | Works in all SQL Server versions that support BULK INSERT |
| bcp utility | Add the -E flag to the command | Must be specified per-command; not a default setting |
| SSMS Import Wizard | Check "Enable identity insert" in the Column Mappings dialog | Easy to miss — the checkbox is unchecked by default |
| SSIS | Set the "Keep identity" property on the OLE DB Destination component | Per-component setting, not global |
Identity insert leaves a gap risk
When you use KEEPIDENTITY to load high IDs from a CSV (say, up to ID 50,000), SQL Server's internal identity counter stays at its last auto-generated value — which may be much lower. The next auto-generated row gets an ID that may conflict with a future import. After a KEEPIDENTITY load, run DBCC CHECKIDENT (TableName, RESEED, MaxID) to synchronize the counter with the actual maximum ID in the table.
Scenario B: The CSV has no IDs — let SQL Server generate them
When the CSV contains only data columns (Name, Date, Amount) and no ID, you'd expect SQL Server to auto-generate the ID and fill the rest from the file. It doesn't work that way by default. BULK INSERT maps CSV columns positionally — column 1 of the CSV goes into column 1 of the table. If column 1 of the table is the IDENTITY column, the first data field ("Name") lands in the ID column and the import fails immediately with a type error.
Two ways to skip the IDENTITY column
- Format file: A format file (
.fmtor XML) maps each CSV column explicitly to a named table column, allowing you to skip the identity column entirely. More setup, but the most precise control. - View: Create a view over the table that exposes only the non-identity columns. Point
BULK INSERTat the view instead of the table. SQL Server fills the identity column automatically. Simpler to maintain for recurring imports.
Part 2: Primary Key Violations
A PRIMARY KEY guarantees uniqueness. If your CSV contains an ID that already exists in the destination table, SQL Server rejects the row — and with BULK INSERT, that rejection doesn't just skip the bad row. It aborts the entire batch and rolls back every row already processed.
Bulk operations are all-or-nothing by default
Import 1,000,000 rows. Row 999,999 is a duplicate primary key. The entire 1,000,000-row operation fails and rolls back — including the 999,998 rows that were valid. The MAXERRORS option lets the import skip up to N rows before aborting, but it doesn't resolve duplicates — it just tolerates a small number of them.
The correct solution for CSV sources that may contain duplicates — or for recurring imports that add both new and updated records — is the Staging Table + MERGE pattern. Rather than importing directly into the production table where constraints will fire, you load into an unconstrained staging table first, then handle duplicates in SQL before they touch production. The staging architecture is covered in depth in the schema evolution guide.
The Staging + MERGE Workflow
Load into staging
Bulk-load the CSV into a staging table with no primary key, no foreign keys, and no constraints. All VARCHAR(MAX) or NVARCHAR(MAX) columns. Nothing can fail here — every row lands regardless.
Deduplicate in staging
If the CSV itself contains duplicate rows for the same key, resolve them now with a T-SQL query — keeping the latest version per key, or applying business logic to determine which row wins.
MERGE into production
The MERGE statement compares staging against the production table on the key column. Matching rows trigger UPDATE (overwrite existing data). Non-matching rows trigger INSERT. The result: new records are added, existing records are refreshed, and nothing aborts due to a key conflict.
Truncate and reuse staging
After a successful merge, truncate the staging table. It's ready for the next import cycle. Because staging has no indexes, TRUNCATE is instantaneous regardless of row count.
MERGE is an upsert — not just a duplicate handler
A CSV import that uses Staging + MERGE automatically handles three cases at once: new rows are inserted, existing rows are updated with the latest values, and rows that exist in production but not in the CSV are left untouched (unless you add a WHEN NOT MATCHED BY SOURCE DELETE clause to retire stale records). This makes MERGE the correct pattern for any recurring import, not just those with known duplicates.
Part 3: Non-Integer Keys — GUIDs and Natural Keys
Not every primary key is a simple integer. Two other key types come up frequently in CSV imports and each has a distinct failure mode.
If the table uses NEWID() as the default for a GUID primary key and the CSV has no GUID column, bulk loading fails the same way as the identity-with-no-ID scenario — column 1 of the CSV lands in the GUID column, which expects a formatted UUID string. The same solutions apply: a format file or a view that hides the GUID column from the bulk loader, letting the NEWID() default fire for each inserted row.
Natural keys use a real business value — email address, product SKU, order number — as the primary key instead of a generated integer. CSV imports can violate these just as easily as integer keys. A particular challenge: natural keys are often case-sensitive in SQL Server depending on the column collation. "user@example.com" and "User@Example.com" may or may not be treated as duplicates depending on the database collation setting. Check the collation before assuming case-insensitive deduplication in your MERGE.
Primary keys cannot be NULL. If the CSV has a missing or empty value in the key column, the row is rejected. Use the ERRORFILE argument in BULK INSERT to capture failing rows to a log file rather than aborting the entire batch. Review the error file to understand whether the nulls are a data quality problem (fix the source) or expected edge cases (handle them in the staging validation step before the MERGE).
These key-handling issues — truncation, type mismatch, collision — are the source of the most confusing error messages in CSV imports. The CSV import troubleshooting guide covers the diagnostic patterns that identify each category of failure quickly, even though it uses PostgreSQL examples — the error taxonomy maps directly across both platforms.
In this series
Handle keys automatically
Elvity detects identity columns, primary keys, and GUID defaults during schema inference — and routes each CSV row into the correct INSERT or UPDATE path without configuration.