Home/Articles/Bulk Loading CSV to Postgres

Bulk Loading: How to Import Millions of Rows from CSV to Postgres Fast

The "Drop-Load-Rebuild" pattern, memory tuning, WAL management, and the FREEZE option — everything you need to shrink a multi-hour bulk import down to minutes.

9 min read·Data Onboarding Fundamentals

When you are dealing with datasets in the hundreds of thousands or millions, a standard import process can turn into a bottleneck that lasts hours. To achieve a high-performance bulk import CSV to Postgres, you need to move beyond simple commands and look at how the database handles internal resources.

This guide explores the professional strategy for Postgres bulk load CSV operations, focusing on bypassing overhead, managing memory, and optimizing the Write-Ahead Log (WAL). For a foundation on the core import commands, see our Ultimate Guide to Importing CSV Files into PostgreSQL and the COPY vs. INSERT comparison.

OptimizationImpactWhen to Apply
Drop indexes before load2–5× faster insertsAlways for > 100k rows
Disable triggersEliminates per-row logic overheadWhen triggers exist on target table
Raise maintenance_work_memFaster index rebuildsAlways during bulk ops
UNLOGGED table stagingUp to 2× faster loadInitial migrations with backup
COPY with FREEZESkips future Vacuum scanFresh table or post-TRUNCATE
VACUUM ANALYZE after loadCorrect query planner statsAlways — run immediately after

1. The Strategy: Drop, Load, Rebuild

The biggest performance killers during a Postgres bulk insert CSV are not the data transfers themselves, but the auxiliary work the database does for every row: updating indexes, checking foreign key constraints, and firing triggers.

For millions of rows, the most efficient workflow is:

  1. Drop or disable all indexes and triggers.
  2. Load the data using the COPY command.
  3. Rebuild the indexes and re-enable triggers in one batch.

Disabling Triggers and Constraints

Triggers run logic for every single row, which adds massive overhead at scale. Use this command to silence them during the load:

ALTER TABLE target_table DISABLE TRIGGER ALL;

Remember to re-enable them once the import is verified:

ALTER TABLE target_table ENABLE TRIGGER ALL;

Dropping Indexes

Rebuilding an index from scratch after a full load is significantly faster than updating it row-by-row 10 million times. This is the single highest-impact optimization for large imports.

-- Before the load
DROP INDEX idx_user_email;

-- Perform the COPY load (see Section 4) --

-- After the load
CREATE INDEX idx_user_email ON target_table(email);

2. Tuning Memory for Bulk Loads

PostgreSQL has specific memory settings that dictate how fast it can process large chunks of data and rebuild indexes. These are session-level settings you can change without restarting the server.

maintenance_work_mem

This is the most critical setting for loading data into Postgres from CSV. It determines the maximum amount of memory used for maintenance operations like CREATE INDEX.

Recommended setting

Default is usually 64MB — increase to 1GB or 25% of your total RAM for the duration of the import. This alone can cut index rebuild time by 50% or more.

SET maintenance_work_mem = '1GB';

max_wal_size

During a massive load CSV into Postgres, the Write-Ahead Log (WAL) fills up quickly. If it reaches its limit, Postgres triggers a "checkpoint," which flushes data to disk and momentarily stalls the import. Increasing max_wal_size to 4GB or higher allows the database to stay in "fast load" mode longer between checkpoints.

-- In postgresql.conf or as a superuser session setting
SET max_wal_size = '4GB';

checkpoint_timeout

Even with a generous max_wal_size, Postgres still forces a checkpoint every five minutes by default. On a multi-hour bulk load that produces dozens of extra checkpoints — each one a burst of disk I/O that competes with the import for bandwidth. Stretch the interval for the duration of the migration:

-- 30 minutes between time-based checkpoints
SET checkpoint_timeout = '30min';

Pair this with the larger max_wal_size so checkpoints are driven by neither time nor WAL volume during the load. Restore both to their normal values once the import finishes — a 30-minute checkpoint interval is wrong for transactional traffic, where it would lengthen crash-recovery time after a restart.

synchronous_commit

For every committing transaction Postgres waits for the WAL record to be physically flushed to disk before reporting success. That round-trip is essential for banking-style workloads but pure overhead during a bulk load — the only thing waiting on the commit acknowledgment is your import script. Turning it off for the session lets Postgres acknowledge commits as soon as the WAL record is in memory:

SET synchronous_commit = off;
COPY my_table FROM '/path/to/large_file.csv' WITH (FORMAT CSV, HEADER);

The risk is narrowly scoped: if Postgres crashes within the in-flight window (usually under a second), the most recent committed transactions can be lost. The on-disk database itself remains consistent — this is not the same as turning off fsync, which would risk full corruption. For a CSV import where you can simply re-run on failure, the tradeoff is almost always worth it.

3. Utilizing "Unlogged" Tables

If you are performing an initial data migration where the source data is safely backed up elsewhere, consider using an UNLOGGED staging table.

Unlogged tables do not write to the WAL. This makes a Postgres load CSV operation nearly twice as fast. The pattern pairs naturally with the staging table upsert workflow — bulk load into an unlogged table, then merge into your production table using INSERT ... ON CONFLICT.

You do not have to create a new table to benefit. An existing empty (or about-to-be-truncated) table can be flipped to unlogged for the duration of the load and flipped back afterward:

ALTER TABLE target_table SET UNLOGGED;
COPY target_table FROM '/path/to/data.csv' WITH (FORMAT CSV, HEADER);
ALTER TABLE target_table SET LOGGED;

The final SET LOGGED rewrites the table and writes the full contents to the WAL in one pass — much cheaper than logging every row individually during the load. The window of vulnerability is bounded by the duration of the load itself.

CREATE UNLOGGED TABLE staging_data (
    id    INT,
    name  TEXT,
    email TEXT
);

-- Import your data at full speed --
COPY staging_data FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER);

-- Convert to a crash-safe logged table after the import --
ALTER TABLE staging_data SET LOGGED;

Crash safety warning

Unlogged tables are not crash-safe. If the server loses power during the load, the table's data will be lost. Only use this approach when the source CSV is safely stored elsewhere and can be re-loaded if needed.

4. The High-Speed COPY Command with FREEZE

The standard INSERT statement is far too slow for millions of rows. You must use the COPY command — for the full command reference including remote and AWS RDS usage, see the command line import guide.

For maximum speed, add the FREEZE option. This marks rows as "already old," preventing the need for a background Vacuum process to scan them later. It only works if the table is freshly created or truncated in the same transaction:

BEGIN;
TRUNCATE target_table;
COPY target_table
  FROM '/path/to/data.csv'
  WITH (FORMAT csv, FREEZE, HEADER);
COMMIT;

5. Post-Load Cleanup: The ANALYZE Phase

Once the Postgres bulk load is finished and your indexes are rebuilt, the database doesn't yet "know" how much data was added. The query planner might still think the table is empty, leading to poor query plans on your first searches.

Always run a manual vacuum and analyze immediately after loading:

VACUUM ANALYZE target_table;

This updates the internal statistics so the query planner can optimize future searches correctly. Before your data even reaches this stage, ensure it's clean — data validation before a bulk import prevents bad rows from corrupting an otherwise-clean load.

Summary Checklist for Maximum Speed

  1. Memory: Set maintenance_work_mem to at least 1GB.
  2. Constraints: Drop indexes and disable triggers before starting.
  3. Staging: Use an UNLOGGED table (or ALTER TABLE … SET UNLOGGED on the target) if source data is backed up.
  4. WAL tuning: Raise max_wal_size to 4GB+ and stretch checkpoint_timeout to 30 minutes for the duration of the load.
  5. Commit overhead: SET synchronous_commit = off for the import session.
  6. Command: Use COPY with the FREEZE and HEADER options inside a transaction.
  7. Restore safety: ALTER TABLE … SET LOGGED, restore synchronous_commit, max_wal_size, and checkpoint_timeout before resuming normal traffic.
  8. Statistics: Run VACUUM ANALYZE as soon as the load completes.

By following this "Drop-Load-Rebuild" pattern, you can reduce the time to load millions of rows into Postgres from hours to mere minutes.

Further Reading

Skip the tuning — let Elvity handle it automatically

Elvity's automated onboarding engine handles schema inference, validation, and high-throughput ingestion without manual index drops, memory tuning, or custom scripts.