In day-to-day Postgres usage, primary keys take care of themselves. You declare a column as SERIAL or GENERATED AS IDENTITY, insert rows without specifying an ID, and the engine fills in monotonically increasing numbers. The bookkeeping happens through a hidden sequence object that ticks forward with every insert.
CSV imports break this contract in a subtle way. The COPY command can either let the sequence assign IDs (like a normal insert) or accept IDs from the file (like a forced override) — and in the second case, the sequence is not updated. The next regular insert then collides with an ID that already exists, and you discover the problem hours or days later when something completely unrelated fails. Choosing the right scenario up front and applying the right fix avoids the trap entirely.
Pick your scenario
| Your CSV | Target table state | Right approach |
|---|---|---|
| Has an id column you want to keep | Empty or non-overlapping IDs | Load IDs as-is, then reset the sequence |
| Has no id column | Anything | Use explicit column list and let the sequence fill IDs |
| Has IDs that overlap existing rows | Populated | Stage + INSERT…ON CONFLICT (upsert) |
| Has IDs but you want fresh ones assigned | Anything | Use explicit column list — ignore the CSV id column |
| Migration from another database | Newly created, empty | Load IDs as-is, reset sequence, verify max(id) |
Scenario 1: CSV Has IDs You Want to Preserve
When migrating data from another system — a legacy database, an export from a vendor, a CSV produced by another team — the existing IDs usually carry meaning. Foreign keys in related tables reference them, audit logs cite them, external systems remember them. Preserving the IDs is a hard requirement, not a preference.
You can COPY the file with the id column included and Postgres will accept the values without complaint. The problem appears only later: the sequence behind the column is never consulted during the import, so its internal counter remains at whatever value it held before the load. The very next normal insert tries to use that stale counter and produces a duplicate-key error.
The classic delayed failure
You load 50,000 rows with IDs 1 through 50,000. The sequence is still at 1. Three days later, an application tries to insert a new row, the sequence returns 2, the unique index rejects it, and you get a 3 AM alert for what looks like an unrelated bug. Always reset the sequence in the same script as the import.
The fix is a single statement, run immediately after the COPY completes, in the same transaction or migration script:
SELECT setval(
pg_get_serial_sequence('your_table', 'id'),
coalesce(max(id), 1)
) FROM your_table;pg_get_serial_sequence resolves the sequence object behind the column without you having to remember its name. setval jumps the counter to the highest ID present, so the next call to nextval returns max+1 — exactly what a normal insert would have produced. The coalesce protects against the edge case where the table is empty (no max), in which case the sequence resets to 1.
Scenario 2: CSV Has No ID Column
For fresh data — new leads, new transactions, new event records — you want Postgres to assign IDs the same way it would for any normal insert. The trick is keeping the id column out of the column list in your COPY command:
\copy target_table (customer_name, email, signup_date) FROM '/path/to/leads.csv' WITH (FORMAT CSV, HEADER);
Because id is not in the parentheses, Postgres falls back to the column's default expression — which for a SERIAL or IDENTITY column means pulling the next value from the sequence. The sequence is updated normally, one tick per row, so subsequent inserts continue without conflict. No reset needed.
This is also the cleanest approach when the CSV happens to contain an id column but you don't care about its values — simply omit it from the column list and the data flows in with freshly generated keys. For the broader rules around the column-list syntax, see the complete PostgreSQL CSV import guide.
Scenario 3: CSV Has IDs That May Overlap Existing Rows (Upsert)
The hardest case: your CSV contains a mix of new records and updates to existing records, keyed by the same ID column. A direct COPY into the target table fails the moment it hits the first existing ID, because the primary-key index rejects the duplicate. COPY does not support conflict logic.
The standard pattern is staging plus INSERT … ON CONFLICT. Load the raw CSV into a constraint-free staging table, then use a single SQL statement to merge it into the target with explicit conflict handling.
The three-step upsert pattern
- 1
Create a staging table
Same column types as the CSV but no primary key, no constraints, no indexes. It exists only to receive the raw data.
- 2
COPY into staging
Standard COPY with the full column list including id. This always succeeds because nothing in staging blocks duplicates.
- 3
INSERT … ON CONFLICT … DO UPDATE
Single statement merges staging into the target. New IDs are inserted; existing IDs trigger an UPDATE on the columns you list in DO UPDATE SET.
INSERT INTO main_table (id, name, price)
SELECT id, name, price FROM temp_staging
ON CONFLICT (id) DO UPDATE
SET name = EXCLUDED.name,
price = EXCLUDED.price;EXCLUDED refers to the row that would have been inserted, giving you access to its values inside the DO UPDATE clause. List only the columns you actually want overwritten — leaving a column out preserves the existing value in the target, which is useful for fields like created_at that should not change on an update.
After the merge completes, run the setval command from Scenario 1 against the target table. Even though the upsert went through normal insert paths, the sequence still needs syncing because some of the IDs came from the CSV rather than from nextval.
SERIAL vs GENERATED AS IDENTITY
SERIAL is the older Postgres-specific shorthand; GENERATED AS IDENTITY is the SQL-standard equivalent introduced in PostgreSQL 10. For new schemas, IDENTITY is the recommended choice — it has cleaner ownership semantics, drops cleanly with the table, and offers a meaningful behavioral switch that matters specifically for imports.
| Declaration | CSV import behavior | When to use it |
|---|---|---|
| SERIAL | COPY accepts file IDs silently; sequence does not advance | Existing schemas; backwards compatibility |
| GENERATED BY DEFAULT AS IDENTITY | COPY accepts file IDs silently; sequence does not advance | New schemas where you sometimes need to load IDs from a file |
| GENERATED ALWAYS AS IDENTITY | COPY rejects file IDs unless you specify OVERRIDING SYSTEM VALUE | Strict schemas where forcing an ID should be a deliberate, visible action |
GENERATED ALWAYS is the safest choice when you want the database to actively prevent accidental ID injection. A normal COPY that includes the id column will fail loudly instead of silently desyncing the sequence — forcing the import script to either omit the column or use OVERRIDING SYSTEM VALUE, which makes the intent explicit and reviewable.
Pre-Import Validation Checklist
ID column data type matches
CSV ID column must contain integers, not "1.0" or "ID-001". A single text value in an INT column aborts the load.
No NULL or empty IDs in the file
COPY will reject NULL into a NOT NULL primary key. Filter or pre-clean before loading.
No duplicate IDs within the CSV itself
Even before hitting the target table, COPY rejects duplicates within the same load if a unique constraint exists.
Sequence advanced past target max
Run setval after every load that includes IDs — make it part of the script, not a manual step.
Foreign keys still resolve
If the CSV references parent IDs from another table, load the parent first or temporarily defer FK checks.
The rule of thumb
If your CSV ever supplies the primary-key value, the import script must reset the sequence before it exits. If your CSV never supplies the primary-key value, use an explicit column list to keep the sequence in charge. Mix the two patterns within one schema and you will eventually fight a duplicate-key bug whose cause is hours away from its symptom.
Adjacent reading
Sequence resets, handled automatically
Elvity detects primary-key columns during ingestion, applies the right COPY strategy for each scenario, and advances sequences after every load — no setval scripts, no 3 AM duplicate-key alerts.