Home/Articles/Quotes, delimiters & NULLs in PostgreSQL COPY

The Clean Data Checklist: Quotes, Delimiters, and NULLs in PostgreSQL COPY

A single comma inside a city name, an "N/A" where Postgres expects a number, a file Excel quietly saved as Latin-1 — any one of these will abort a ten-million-row COPY halfway through. The fix is almost always a flag on the COPY command. Here is every flag that matters, when to reach for it, and the bulletproof template that handles 90% of real-world Excel exports.

9 min read·PostgreSQL · CSV Formatting

PostgreSQL's COPY command is fast precisely because it is strict. It does not guess. It does not skip bad rows. If the third field on line 7,431,219 contains a quote character it did not expect, it aborts, rolls back everything, and tells you which line broke — leaving you to figure out why. The strictness is a feature: it means imports either succeed completely or fail without partial corruption.

Most of the apparent "bugs" in COPY are really mismatches between the file's actual format and the assumptions baked into your command. The flags below cover the five places those mismatches happen — delimiters, quoting, NULLs, headers, and encoding — and the staging-table escape hatch covers the rest.

Flag cheat-sheet

FlagDefaultSet it when…
DELIMITER','Your file uses pipes, tabs, or semicolons
QUOTE'"'Your text fields use a non-standard quote character
ESCAPESame as QUOTEYour data contains literal quote characters
NULL''Missing values are written as "N/A", "NULL", or another token
HEADERfalseThe first row of the file is column names, not data
ENCODINGDatabase defaultThe file came from Excel on Windows or another non-UTF-8 source

1. Delimiter: Comma Is Not Always the Right Answer

"CSV" is comma-separated by convention, but commas are surprisingly hostile to real-world text. Addresses contain them. Product descriptions contain them. Anything written by a human eventually contains them. When the delimiter appears inside a field that is not properly quoted, COPY reads it as a column boundary and the row's field count no longer matches the table's column count.

The cleanest fix is upstream: export the file with a delimiter that does not appear in your data. Pipes (|) and tabs are the two safest choices. If you control the export, change the delimiter there and most of your quoting problems vanish at the same time.

COPY my_table FROM 'data.csv'
  WITH (FORMAT CSV, DELIMITER '|');

Tabs require a special syntax — DELIMITER E'\t' using the escape-string prefix — because a literal tab character in the command would be ambiguous. Semicolons are the European default and you will encounter them whenever a file passes through a locale where commas are decimal separators.

2. Quotes and Escapes: Protecting Text That Contains the Delimiter

When you must keep commas as the delimiter, the file should wrap text fields in double quotes. The QUOTE flag tells COPY which character serves that role — usually the default " is correct, but vendor exports occasionally use single quotes or backticks.

The trickier case is when the data itself contains quote characters — a product name like 12" Power Drill or a comment like She said "hello". Postgres needs to know the difference between "this quote ends the field" and "this quote is part of the value." The convention in standard CSV is to double the quote (""), and that is what Postgres expects by default. If your file instead uses backslash-escaping (\"), set the ESCAPE flag explicitly:

COPY my_table FROM 'data.csv'
  WITH (FORMAT CSV, QUOTE '"', ESCAPE '\');

A field can also span multiple lines if it is quoted — a long description that contains a newline becomes a single value, not multiple rows. COPY handles this automatically as long as the quoting is correct. The moment a quote opens and never closes, every subsequent newline gets absorbed into one runaway field and the error message you eventually receive points to a line far past where the real problem lives.

3. NULLs: Telling Postgres What "Missing" Looks Like

Spreadsheet users represent missing data in dozens of ways: blank cells, the literal word NULL, N/A, None, -, sometimes even 0 (which is wrong but common). Postgres has exactly one concept of "missing" — the SQL NULL — and you have to tell it which token in the file maps to that concept.

By default, an unquoted empty string is treated as NULL. A quoted empty string ("") is treated as an empty string, not a NULL — a distinction that matters when your column is NOT NULL with no default. If your file uses a custom token, declare it:

COPY my_table FROM 'data.csv'
  WITH (FORMAT CSV, NULL 'N/A');

Only one NULL token at a time

The NULL flag accepts exactly one string. If your file mixes empty cells and the word "N/A" and "None", you cannot handle all three in a single COPY. Either pre-process the file to normalize them, or use the staging-table strategy at the end of this article and clean them with SQL.

4. Header: Skip the Label Row or Watch the Import Crash

If the first row of your file contains column names like customer_id,name,price, and you do not pass the HEADER flag, COPY tries to insert the literal string price into your NUMERIC column on row one. The error is immediate and obvious — but it is also embarrassingly common, especially when re-running an import script someone else wrote.

COPY my_table FROM 'data.csv'
  WITH (FORMAT CSV, HEADER);

The HEADER flag only skips the row — it does not use the header names for column mapping. The order of columns in the file must still match the order in the table (or the order in your explicit column list). If your CSV columns are in a different order than the table, list them explicitly in the COPY command rather than relying on the header.

5. Encoding: The Invisible Failure Mode

The error message invalid byte sequence for encoding "UTF8" means your file contains bytes that are not valid UTF-8. The file itself is fine — it is just in a different encoding, almost always one of the Windows-Latin variants. Excel on Windows still defaults to WIN1252 (often called Latin-1) for CSV exports unless you explicitly choose "CSV UTF-8."

COPY can transcode on the fly if you tell it what to expect:

COPY my_table FROM 'data.csv'
  WITH (FORMAT CSV, ENCODING 'WIN1252');

The byte sequence is read in the declared encoding, converted to the database's encoding (almost always UTF-8), and stored. The file on disk is never modified. If you are unsure of the source encoding, the file command on Linux/macOS or a text editor like VS Code will report it.

The Bulletproof Template

For a standard Excel-generated CSV, the following combination handles the overwhelming majority of real-world imports. Treat it as your default and adjust only the flags that need to change for a specific file:

COPY target_table
FROM '/path/to/your_file.csv'
WITH (
  FORMAT CSV,
  HEADER,
  DELIMITER ',',
  QUOTE '"',
  NULL '',
  ENCODING 'UTF8'
);

When the File Is Truly Dirty: The TEXT Staging Pattern

Some files cannot be loaded directly no matter how you tune the flags. Mixed NULL tokens, inconsistent date formats, numeric columns sprinkled with text annotations, partial rows from a broken export — the right move is not to keep adjusting COPY flags. It is to load the data into a staging table where every column is TEXT, then clean it with SQL.

Step 1 — Create an all-TEXT staging table

Same column names as the source file, all typed TEXT, no constraints, no indexes. TEXT accepts literally anything that is valid UTF-8.

Step 2 — COPY into staging with permissive flags

No type validation happens because every column is TEXT. The load succeeds even when half the rows are malformed.

Step 3 — Clean with SQL into the final table

Use TRIM, NULLIF, REGEXP_REPLACE, and explicit CAST inside an INSERT … SELECT. You can filter, validate, and transform in a single pass.

Step 4 — Drop the staging table

Once the data is in the typed table, the staging copy is no longer needed. Drop it to reclaim space.

The staging pattern trades two-pass load time for resilience. For one-off imports of messy vendor files, it is almost always worth it. For repeated daily loads where the format is stable, tune the COPY flags once and skip staging. For broader cleaning techniques, see the dedicated data cleansing & scrubbing guide.

The mental model

Every COPY flag answers one question the parser would otherwise have to guess. The five questions are: where does a field end, how is text protected, what counts as missing, is there a label row, and what byte encoding is the file in. Answer all five up front and most import failures simply do not happen.

Flag-tuning, gone

Elvity detects delimiters, quote styles, NULL tokens, and encoding from the file itself — no COPY flag spelunking, no encoding surprises, no half-loaded tables.