SQL Server's IDENTITY columns and PRIMARY KEY constraints are designed to protect data integrity — but they also create two of the most common CSV import failures. The database wants to manage unique identifiers itself, and bulk loaders often collide with that expectation. Understanding which scenario you're in determines which tool resolves it.

Quick reference

Situation	Solution
CSV has IDs I want to keep	KEEPIDENTITY hint (BULK INSERT) or -E switch (bcp)
CSV has no IDs — let SQL Server generate them	Format file or View to skip the IDENTITY column
CSV may contain duplicate primary keys	Load into staging table first — no constraints
Need to update existing rows and insert new ones	Staging table + MERGE statement
GUID primary key, CSV has no GUID column	View or column list that excludes the GUID column
Null values in the key column	ERRORFILE to capture and skip invalid rows

Part 1: IDENTITY Columns

An IDENTITY column auto-generates a sequential integer for every inserted row. When a CSV arrives, there are two fundamentally different situations — and they require opposite solutions.

Scenario A: The CSV has IDs you want to preserve

This is the data migration scenario: you're moving data from another system and need Invoice #105 to remain Invoice #105. By default, SQL Server ignores the ID column in the CSV and generates its own — meaning every ID in your destination table will be wrong. The fix is to explicitly tell SQL Server to accept the external values.

How to enable identity insert — by tool

Tool	How to enable	Notes
BULK INSERT (T-SQL)	Add KEEPIDENTITY to the WITH clause	Works in all SQL Server versions that support BULK INSERT
bcp utility	Add the -E flag to the command	Must be specified per-command; not a default setting
SSMS Import Wizard	Check "Enable identity insert" in the Column Mappings dialog	Easy to miss — the checkbox is unchecked by default
SSIS	Set the "Keep identity" property on the OLE DB Destination component	Per-component setting, not global

Identity insert leaves a gap risk

When you use KEEPIDENTITY to load high IDs from a CSV (say, up to ID 50,000), SQL Server's internal identity counter stays at its last auto-generated value — which may be much lower. The next auto-generated row gets an ID that may conflict with a future import. After a KEEPIDENTITY load, run DBCC CHECKIDENT (TableName, RESEED, MaxID) to synchronize the counter with the actual maximum ID in the table.

Scenario B: The CSV has no IDs — let SQL Server generate them

When the CSV contains only data columns (Name, Date, Amount) and no ID, you'd expect SQL Server to auto-generate the ID and fill the rest from the file. It doesn't work that way by default. BULK INSERT maps CSV columns positionally — column 1 of the CSV goes into column 1 of the table. If column 1 of the table is the IDENTITY column, the first data field ("Name") lands in the ID column and the import fails immediately with a type error.

Two ways to skip the IDENTITY column

Format file: A format file (.fmt or XML) maps each CSV column explicitly to a named table column, allowing you to skip the identity column entirely. More setup, but the most precise control.
View: Create a view over the table that exposes only the non-identity columns. Point BULK INSERT at the view instead of the table. SQL Server fills the identity column automatically. Simpler to maintain for recurring imports.

Part 2: Primary Key Violations

A PRIMARY KEY guarantees uniqueness. If your CSV contains an ID that already exists in the destination table, SQL Server rejects the row — and with BULK INSERT, that rejection doesn't just skip the bad row. It aborts the entire batch and rolls back every row already processed.

Bulk operations are all-or-nothing by default

Import 1,000,000 rows. Row 999,999 is a duplicate primary key. The entire 1,000,000-row operation fails and rolls back — including the 999,998 rows that were valid. The MAXERRORS option lets the import skip up to N rows before aborting, but it doesn't resolve duplicates — it just tolerates a small number of them.

The correct solution for CSV sources that may contain duplicates — or for recurring imports that add both new and updated records — is the Staging Table + MERGE pattern. Rather than importing directly into the production table where constraints will fire, you load into an unconstrained staging table first, then handle duplicates in SQL before they touch production. The staging architecture is covered in depth in the schema evolution guide.

The Staging + MERGE Workflow

Load into staging

Bulk-load the CSV into a staging table with no primary key, no foreign keys, and no constraints. All VARCHAR(MAX) or NVARCHAR(MAX) columns. Nothing can fail here — every row lands regardless.

Deduplicate in staging

If the CSV itself contains duplicate rows for the same key, resolve them now with a T-SQL query — keeping the latest version per key, or applying business logic to determine which row wins.

MERGE into production

The MERGE statement compares staging against the production table on the key column. Matching rows trigger UPDATE (overwrite existing data). Non-matching rows trigger INSERT. The result: new records are added, existing records are refreshed, and nothing aborts due to a key conflict.

Truncate and reuse staging

After a successful merge, truncate the staging table. It's ready for the next import cycle. Because staging has no indexes, TRUNCATE is instantaneous regardless of row count.

MERGE is an upsert — not just a duplicate handler

A CSV import that uses Staging + MERGE automatically handles three cases at once: new rows are inserted, existing rows are updated with the latest values, and rows that exist in production but not in the CSV are left untouched (unless you add a WHEN NOT MATCHED BY SOURCE DELETE clause to retire stale records). This makes MERGE the correct pattern for any recurring import, not just those with known duplicates.

Part 3: Non-Integer Keys — GUIDs and Natural Keys

Not every primary key is a simple integer. Two other key types come up frequently in CSV imports and each has a distinct failure mode.

GUID (UniqueIdentifier) primary keys

If the table uses NEWID() as the default for a GUID primary key and the CSV has no GUID column, bulk loading fails the same way as the identity-with-no-ID scenario — column 1 of the CSV lands in the GUID column, which expects a formatted UUID string. The same solutions apply: a format file or a view that hides the GUID column from the bulk loader, letting the NEWID() default fire for each inserted row.

Natural keys (email, SKU, username)

Natural keys use a real business value — email address, product SKU, order number — as the primary key instead of a generated integer. CSV imports can violate these just as easily as integer keys. A particular challenge: natural keys are often case-sensitive in SQL Server depending on the column collation. "user@example.com" and "User@Example.com" may or may not be treated as duplicates depending on the database collation setting. Check the collation before assuming case-insensitive deduplication in your MERGE.

Null values in any key column

Primary keys cannot be NULL. If the CSV has a missing or empty value in the key column, the row is rejected. Use the ERRORFILE argument in BULK INSERT to capture failing rows to a log file rather than aborting the entire batch. Review the error file to understand whether the nulls are a data quality problem (fix the source) or expected edge cases (handle them in the staging validation step before the MERGE).

These key-handling issues — truncation, type mismatch, collision — are the source of the most confusing error messages in CSV imports. The CSV import troubleshooting guide covers the diagnostic patterns that identify each category of failure quickly, even though it uses PostgreSQL examples — the error taxonomy maps directly across both platforms.

Handling Identity Columns and Primary Keys During SQL Server CSV Imports