Home/Articles/CSV to Postgres via PHP

Programmatic Data Pipelines: CSV to Postgres using PHP

Three approaches — bulk pg_copy_from(), row-by-row prepared statements, and streaming via pg_put_line() — plus the security and memory pitfalls that trip up production pipelines.

8 min read·Data Onboarding Fundamentals

In web development, one of the most frequent requirements is a feature that lets users upload a spreadsheet to populate a database. Manual tools work for admins, but a production application needs an automated, secure, and performant pipeline. This guide covers three PHP approaches in order of speed — and the security rules that apply to all of them.

Metricpg_copy_from()pg_put_line() streamPrepared INSERT loop
SpeedFastestFastModerate
Memory usageHigh — full file in RAMLow — line-by-lineLow — line-by-line
Row validationDifficultDifficultEasy
Best forSystem migrationsMulti-GB filesUser uploads / web forms
Crash safetyWrap in transactionWrap in transactionBuilt-in per BEGIN/COMMIT

Method 1: pg_copy_from() — Maximum Bulk Speed

PHP's PostgreSQL extension provides pg_copy_from(), which wraps the high-speed SQL COPY command. It is the fastest path for loading large CSVs — ideal for migrations and system log imports where validation happens before the upload. For a deep-dive on what makes COPY so much faster than INSERT, see the COPY vs. INSERT comparison.

<?php
$conn = pg_connect("host=localhost dbname=mydb user=myuser password=mypass");

// file() reads the entire CSV into an array of row strings
$rows = file('data.csv');

$result = pg_copy_from($conn, 'target_table', $rows);

if ($result) {
    echo "Bulk import successful.";
} else {
    echo pg_last_error($conn);
}
?>

Memory ceiling

file() loads the entire CSV into a PHP array — a 500 MB file can easily hit PHP's memory_limit. For files over ~100 MB, use the streaming approach (Method 3) instead.

Method 2: Prepared INSERT Loop — Row-Level Validation

When you need to validate each row before it hits the database — checking email format, verifying a product code exists, or skipping malformed records — a prepared statement loop gives you that control. It is slower than COPY but far more flexible.

<?php
$handle = fopen("data.csv", "r");
$headers = fgetcsv($handle); // Consume and discard the header row

// Wrap the loop in a transaction — reduces disk I/O from
// thousands of individual commits to exactly one
pg_query($conn, "BEGIN");

$query = "INSERT INTO users (name, email) VALUES ($1, $2)";
pg_prepare($conn, "import_query", $query);

while (($row = fgetcsv($handle, 1000, ",")) !== FALSE) {
    // Validation logic: skip empty or invalid rows
    if (empty($row[0]) || !filter_var($row[1], FILTER_VALIDATE_EMAIL)) {
        continue;
    }
    pg_execute($conn, "import_query", [$row[0], $row[1]]);
}

pg_query($conn, "COMMIT");
fclose($handle);
?>

Why wrap the loop in BEGIN / COMMIT

Without a transaction, each pg_execute() call auto-commits to disk. For 10,000 rows that means 10,000 fsync operations. Wrapping the whole loop in one transaction reduces that to a single commit — often a 5–10× speed improvement on spinning disk.

Security: CSV Data is User Input

A critical mistake when building upload features is treating a CSV as "safe" because it came from a spreadsheet. CSV files are external input just like a web form — they can carry malicious data if not handled correctly.

Security checklist for CSV upload features

  • Always use prepared statements — never concatenate CSV cell values directly into a query string.
  • Validate MIME type on upload — ensure the file is actually a CSV (text/csv), not a PHP script with a .csv extension.
  • Whitelist delimiter choices — if your UI lets users pick a delimiter, only accept , / ; / | / \t — anything else is rejected.
  • Cap file size server-side — enforce a maximum in both PHP (upload_max_filesize) and your application logic.

Method 3: Streaming with pg_put_line() — Large Files Without Memory Limits

For multi-gigabyte files, loading the whole CSV into a PHP array (as pg_copy_from() does with file()) will exhaust server RAM. The pg_put_line() function lets you pipe data line-by-line directly to Postgres's COPY protocol — without ever holding the full file in memory.

<?php
$handle = fopen("large_data.csv", "r");

// Signal to Postgres that CSV data will follow on STDIN
pg_query($conn, "COPY target_table FROM STDIN WITH (FORMAT csv, HEADER)");

// Stream line by line — memory usage stays flat regardless of file size
while ($line = fgets($handle)) {
    pg_put_line($conn, $line);
}

// Required terminator — tells Postgres the stream is complete
pg_put_line($conn, "\.
");
pg_end_copy($conn);

fclose($handle);
?>

This approach scales to arbitrarily large files. Memory usage stays constant because PHP reads and forwards one line at a time. For the Postgres-side performance tuning that pairs with this (index drops, maintenance_work_mem, WAL tuning), see the bulk loading millions-of-rows guide.

If your PHP pipeline imports data submitted by customers, the data validation problem only grows with scale — different column names, different date formats, different encodings per customer. The CSV import troubleshooting guide covers the most common type mismatches and encoding errors you'll encounter in the wild.

Replace the PHP pipeline with a hosted upload flow

Elvity gives your customers a validated, schema-aware CSV upload experience — no pg_put_line() boilerplate, no MIME-type checks, no memory limit engineering.