Home/Articles/SERIAL columns & primary keys in Postgres CSV imports

Managing SERIAL Columns and Primary Keys in PostgreSQL CSV Imports

SERIAL and GENERATED AS IDENTITY auto-fill primary keys for normal inserts — but CSV imports bypass that machinery and silently desync the sequence. Here is how to handle every common case without ending up with duplicate-key errors a week later.

8 min read·PostgreSQL · Data Integrity

In day-to-day Postgres usage, primary keys take care of themselves. You declare a column as SERIAL or GENERATED AS IDENTITY, insert rows without specifying an ID, and the engine fills in monotonically increasing numbers. The bookkeeping happens through a hidden sequence object that ticks forward with every insert.

CSV imports break this contract in a subtle way. The COPY command can either let the sequence assign IDs (like a normal insert) or accept IDs from the file (like a forced override) — and in the second case, the sequence is not updated. The next regular insert then collides with an ID that already exists, and you discover the problem hours or days later when something completely unrelated fails. Choosing the right scenario up front and applying the right fix avoids the trap entirely.

Pick your scenario

Your CSVTarget table stateRight approach
Has an id column you want to keepEmpty or non-overlapping IDsLoad IDs as-is, then reset the sequence
Has no id columnAnythingUse explicit column list and let the sequence fill IDs
Has IDs that overlap existing rowsPopulatedStage + INSERT…ON CONFLICT (upsert)
Has IDs but you want fresh ones assignedAnythingUse explicit column list — ignore the CSV id column
Migration from another databaseNewly created, emptyLoad IDs as-is, reset sequence, verify max(id)

Scenario 1: CSV Has IDs You Want to Preserve

When migrating data from another system — a legacy database, an export from a vendor, a CSV produced by another team — the existing IDs usually carry meaning. Foreign keys in related tables reference them, audit logs cite them, external systems remember them. Preserving the IDs is a hard requirement, not a preference.

You can COPY the file with the id column included and Postgres will accept the values without complaint. The problem appears only later: the sequence behind the column is never consulted during the import, so its internal counter remains at whatever value it held before the load. The very next normal insert tries to use that stale counter and produces a duplicate-key error.

The classic delayed failure

You load 50,000 rows with IDs 1 through 50,000. The sequence is still at 1. Three days later, an application tries to insert a new row, the sequence returns 2, the unique index rejects it, and you get a 3 AM alert for what looks like an unrelated bug. Always reset the sequence in the same script as the import.

The fix is a single statement, run immediately after the COPY completes, in the same transaction or migration script:

SELECT setval(
  pg_get_serial_sequence('your_table', 'id'),
  coalesce(max(id), 1)
) FROM your_table;

pg_get_serial_sequence resolves the sequence object behind the column without you having to remember its name. setval jumps the counter to the highest ID present, so the next call to nextval returns max+1 — exactly what a normal insert would have produced. The coalesce protects against the edge case where the table is empty (no max), in which case the sequence resets to 1.

Scenario 2: CSV Has No ID Column

For fresh data — new leads, new transactions, new event records — you want Postgres to assign IDs the same way it would for any normal insert. The trick is keeping the id column out of the column list in your COPY command:

\copy target_table (customer_name, email, signup_date)
  FROM '/path/to/leads.csv'
  WITH (FORMAT CSV, HEADER);

Because id is not in the parentheses, Postgres falls back to the column's default expression — which for a SERIAL or IDENTITY column means pulling the next value from the sequence. The sequence is updated normally, one tick per row, so subsequent inserts continue without conflict. No reset needed.

This is also the cleanest approach when the CSV happens to contain an id column but you don't care about its values — simply omit it from the column list and the data flows in with freshly generated keys. For the broader rules around the column-list syntax, see the complete PostgreSQL CSV import guide.

Scenario 3: CSV Has IDs That May Overlap Existing Rows (Upsert)

The hardest case: your CSV contains a mix of new records and updates to existing records, keyed by the same ID column. A direct COPY into the target table fails the moment it hits the first existing ID, because the primary-key index rejects the duplicate. COPY does not support conflict logic.

The standard pattern is staging plus INSERT … ON CONFLICT. Load the raw CSV into a constraint-free staging table, then use a single SQL statement to merge it into the target with explicit conflict handling.

The three-step upsert pattern

  1. 1

    Create a staging table

    Same column types as the CSV but no primary key, no constraints, no indexes. It exists only to receive the raw data.

  2. 2

    COPY into staging

    Standard COPY with the full column list including id. This always succeeds because nothing in staging blocks duplicates.

  3. 3

    INSERT … ON CONFLICT … DO UPDATE

    Single statement merges staging into the target. New IDs are inserted; existing IDs trigger an UPDATE on the columns you list in DO UPDATE SET.

INSERT INTO main_table (id, name, price)
SELECT id, name, price FROM temp_staging
ON CONFLICT (id) DO UPDATE
  SET name  = EXCLUDED.name,
      price = EXCLUDED.price;

EXCLUDED refers to the row that would have been inserted, giving you access to its values inside the DO UPDATE clause. List only the columns you actually want overwritten — leaving a column out preserves the existing value in the target, which is useful for fields like created_at that should not change on an update.

After the merge completes, run the setval command from Scenario 1 against the target table. Even though the upsert went through normal insert paths, the sequence still needs syncing because some of the IDs came from the CSV rather than from nextval.

SERIAL vs GENERATED AS IDENTITY

SERIAL is the older Postgres-specific shorthand; GENERATED AS IDENTITY is the SQL-standard equivalent introduced in PostgreSQL 10. For new schemas, IDENTITY is the recommended choice — it has cleaner ownership semantics, drops cleanly with the table, and offers a meaningful behavioral switch that matters specifically for imports.

DeclarationCSV import behaviorWhen to use it
SERIALCOPY accepts file IDs silently; sequence does not advanceExisting schemas; backwards compatibility
GENERATED BY DEFAULT AS IDENTITYCOPY accepts file IDs silently; sequence does not advanceNew schemas where you sometimes need to load IDs from a file
GENERATED ALWAYS AS IDENTITYCOPY rejects file IDs unless you specify OVERRIDING SYSTEM VALUEStrict schemas where forcing an ID should be a deliberate, visible action

GENERATED ALWAYS is the safest choice when you want the database to actively prevent accidental ID injection. A normal COPY that includes the id column will fail loudly instead of silently desyncing the sequence — forcing the import script to either omit the column or use OVERRIDING SYSTEM VALUE, which makes the intent explicit and reviewable.

Pre-Import Validation Checklist

ID column data type matches

CSV ID column must contain integers, not "1.0" or "ID-001". A single text value in an INT column aborts the load.

No NULL or empty IDs in the file

COPY will reject NULL into a NOT NULL primary key. Filter or pre-clean before loading.

No duplicate IDs within the CSV itself

Even before hitting the target table, COPY rejects duplicates within the same load if a unique constraint exists.

Sequence advanced past target max

Run setval after every load that includes IDs — make it part of the script, not a manual step.

Foreign keys still resolve

If the CSV references parent IDs from another table, load the parent first or temporarily defer FK checks.

The rule of thumb

If your CSV ever supplies the primary-key value, the import script must reset the sequence before it exits. If your CSV never supplies the primary-key value, use an explicit column list to keep the sequence in charge. Mix the two patterns within one schema and you will eventually fight a duplicate-key bug whose cause is hours away from its symptom.

Sequence resets, handled automatically

Elvity detects primary-key columns during ingestion, applies the right COPY strategy for each scenario, and advances sequences after every load — no setval scripts, no 3 AM duplicate-key alerts.