Home/Articles/Delimiters, Quotes, and Newlines in SQL Server CSV Imports

Messy Data: Dealing with Delimiters, Quotes, and Newlines in SQL Server CSV Imports

Commas inside addresses, double-quotes in product names, line breaks in notes fields — these are the three formatting issues that cause most CSV import failures. Here is exactly how each one manifests and how to fix it.

9 min read·SQL Server · CSV Data Quality

A CSV file is a plain text format with no schema enforcement. The rules — which character separates columns, which character wraps text that contains the separator, which character ends a row — are all conventions that the producer and consumer of the file must agree on in advance. When the file you receive was created with different conventions than what your import tool expects, the result is data shifted into the wrong columns, rows split across multiple records, or an outright import failure.

The good news is that each failure mode has a distinct symptom in SQL Server, which points to a specific cause, which has a specific fix. The table below maps symptoms to causes before the detailed sections that follow.

Symptom → cause → fix quick reference

What you see in SQL ServerCauseFix
Data shifted: "Inc." appears in the City column instead of the Company columnUnquoted comma inside a field value treated as a column separatorUse a text qualifier (double-quote) in the import tool, or switch to a pipe/tab delimiter
Import fails: "Bulk load data conversion error" on a row with a comma in a fieldNo text qualifier configured — BULK INSERT sees comma as delimiter and produces a column count mismatchAdd FORMAT = 'CSV' (SQL Server 2017+) or switch delimiter to pipe/tab
Import fails: "Unexpected end of file" or one row has hundreds of columnsStray unmatched double-quote in a field is trapping everything after it as a single fieldFind and escape or remove the stray quote; use doubled double-quotes for literal quotes inside quoted fields
Row count in SQL Server is higher than expected, some rows have only 1–2 columns of dataEmbedded newline in a field value treated as a row terminator, splitting one record into multiple rowsPre-process file to remove/replace embedded newlines before import
Garbled text: "é" appears as "é" or "é"File saved as UTF-8 but SQL Server read it as Windows-1252 (or vice versa)Match the DATAFILETYPE option to the file's actual encoding; save CSV as UTF-8 with BOM if using older tools
Import succeeds but all column data is empty or NULLRow terminator mismatch — file uses \n but tool expects \r\n (or vice versa)Specify ROWTERMINATOR explicitly using hex values: 0x0a (LF) or 0x0d0a (CRLF)

Problem 1: Delimiters Inside Field Values

A CSV file that uses commas as delimiters is unambiguous only when no field value contains a comma. The moment a company name ("Widget Corp, Inc."), an address ("123 Main St, Suite 4"), or a note contains a comma, the parser treats it as a column boundary. Every column after the embedded comma shifts one position to the right — the rest of that row lands in the wrong columns, and SQL Server either produces a column-count mismatch error or silently imports corrupted data.

There are two solutions, and they apply at different points in the workflow:

Delimiter options — tradeoffs and when to use each

DelimiterCommon inRisk of appearing in dataBULK INSERT FIELDTERMINATOR
Comma (,)Excel "CSV", most exportsHigh — addresses, numbers, lists','
Semicolon (;)European locale CSV exports (where comma is the decimal separator)Medium — formulae, programming code';'
Tab (\t)TSV exports, Access, many ETL toolsLow — almost never in text data'\t'
Pipe (|)EDI, data warehouse exports, mainframe systemsVery low — rarely typed by users'|'
Tilde (~)Legacy systems, custom exportsVery low'~'

If you control the file source, switch to pipe or tab before the file leaves the producing system. If you receive the file from an external source and can't control its format, the text qualifier approach in the next section handles commas in data without changing the delimiter.

How to spot the delimiter in an unknown file

Open the file in a plain text editor (Notepad, VS Code) rather than Excel — Excel auto-detects the delimiter and hides it from you. Look at the first data row: the character that consistently separates your expected column count is the delimiter. If you count 5 commas in a row that should have 6 columns, but the 3rd field has an extra comma inside it, the file is comma-delimited and needs text qualifiers. If you count 5 pipes and the row looks correct, it is pipe-delimited.

Problem 2: Quotes and Text Qualifiers

The standard solution for commas inside field values is to wrap fields in double-quotes — the RFC 4180 convention that Excel and most modern export tools follow. A field like Widget Corp, Inc. becomes "Widget Corp, Inc." in the CSV, and a parser that respects text qualifiers treats everything between the opening and closing quote as a single field value, ignoring any commas inside.

SQL Server's handling of text qualifiers depends on the tool and version:

BULK INSERT with FORMAT = 'CSV' (SQL Server 2017+)

The FORMAT = 'CSV' option activates RFC 4180-compliant parsing — double-quoted fields are recognized and the quotes are stripped from the imported value. This is the cleanest path for modern SQL Server instances. On SQL Server 2016 and earlier, FORMAT = 'CSV' is not available; the workaround is to switch to a pipe or tab delimiter.

SSMS Import and Export Wizard

On the "Choose a Data Source" screen, after selecting Flat File Source, find the Text qualifier field in the General tab. Enter a double-quote ("). This tells the wizard to respect quoted fields. If you leave it blank, the wizard treats every double-quote as a literal character and will not handle commas inside quoted fields correctly.

SSIS Flat File Connection Manager

In the Advanced tab of the Flat File Connection Manager editor, there is a Text qualifier property. Set it to ". The connection manager then strips the qualifier characters and delivers clean field values to the Data Flow.

Literal double-quotes inside quoted fields must be escaped

RFC 4180 specifies that a literal double-quote character inside a quoted field must be represented as two consecutive double-quotes: "He said ""hello""" imports as He said "hello". A stray single double-quote that isn't part of a valid opening/closing pair — for example a value like 12" wrench — will cause the parser to treat everything after it as the continuation of an unclosed quoted field, trapping the rest of the file into one giant field value. Find these before importing using Find in a text editor: search for [^"]"[^"] (a quote not preceded or followed by another quote) to locate unescaped quotes.

Problem 3: Embedded Newlines

A row terminator tells SQL Server where one record ends and the next begins. In a standard CSV, each row ends with a newline character. The problem arises when a text field contains a literal newline — someone pressed Enter inside a "Notes" cell in Excel before saving. The resulting CSV has a line break in the middle of a field, and the import tool sees the second line as a new record.

The symptom is row count inflation: a file with 1,000 actual records imports as 1,400 rows, with many of those extra rows containing only 1 or 2 columns of data (the remainder of the split record). No import-time setting resolves this — the file itself must be pre-processed.

1

Identify that embedded newlines are the issue

Open the file in a plain text editor and look for lines that have fewer fields than the expected column count. If the row count in the text editor matches the file's stated record count but many lines are short, you have embedded newlines. A quick check: count the maximum columns per line and flag any line that has fewer than (expected column count - 1) as a continuation of the previous row.

2

Remove embedded newlines in Excel before saving

Open the file in Excel (before the CSV export), select the affected column(s), use Find & Replace (Ctrl+H), click in the Find field and press Ctrl+J (this inserts a newline character that is invisible but present), and leave the Replace field with a space or remove it entirely. This flattens multiline cells to a single line. Then re-export as CSV.

3

Remove embedded newlines in a text editor for large files

For large files that can't be opened in Excel without truncation, use a text editor with regex replace support (VS Code, Notepad++). Search for a newline that isn't followed by the expected number of delimiters for the start of a new record. VS Code regex: search for \n(?![^,]*(,[^,]*){N-1}$) where N is your expected column count. Replace with a space. For very large files, a PowerShell one-liner using Get-Content and regex is faster.

Problem 4: Row Terminator Variants

Even in files without embedded newlines, the row terminator character varies by operating system and producing tool. SQL Server is precise about this, and a mismatch between the file's actual line endings and the tool's expected terminator produces either a full import failure or — the harder-to-diagnose case — an apparent success where every row in the table has all its data crammed into the first column.

Line endingNameProduced byBULK INSERT ROWTERMINATOR
\r\nCRLF (Carriage Return + Line Feed)Windows, Excel, most Windows tools'0x0d0a' (recommended over '\r\n')
\nLF (Line Feed)Linux, macOS, Unix tools, Python csv module'0x0a'
\rCR onlyOld Mac OS (pre-OS X), some legacy systems'0x0d'

Use hex values for ROWTERMINATOR, not escape sequences

The string literal '\n' in T-SQL does not represent a LF character — it's a backslash followed by the letter n. SQL Server interprets it literally when used in ROWTERMINATOR, causing a mismatch on any file with actual LF line endings. Use the hex representation instead: '0x0a' for LF and '0x0d0a' for CRLF. The hex values work reliably across all SQL Server versions. See the full BULK INSERT option reference for all ROWTERMINATOR values.

Problem 5: Encoding

Character encoding determines how non-ASCII characters — accented letters, currency symbols, emoji, Chinese/Japanese/Korean characters — are stored as bytes in the file. When the encoding the file was saved in doesn't match the encoding the import tool expects to read, those characters appear as garbled sequences: an "é" becomes "é", or a degree symbol (°) becomes "°".

EncodingWhat it isSQL Server DATAFILETYPENotes
UTF-8 (no BOM)The modern standard — all Unicode characters, variable byte width'char' + code page 65001Most files from Linux tools, Python, APIs
UTF-8 with BOMUTF-8 with a 3-byte marker at the start of the file that identifies the encodingSSMS Wizard handles automatically; BULK INSERT: use 'char' + code page 65001 and strip BOM firstExcel "CSV UTF-8" saves this format
UTF-16 LEWindows native Unicode — 2 bytes per character'widechar'Common in Windows system exports, some SQL Server exports
Windows-1252Western European single-byte encoding — limited character set'char' (default)Legacy files from older Windows tools; fails on Chinese/Japanese/Korean characters

The safest workflow: always save exported CSVs as UTF-8. In Excel, use "CSV UTF-8 (comma delimited)" from the Save As format dropdown — not plain "CSV," which saves as Windows-1252. When importing into SQL Server, use NVARCHAR columns for any field that may contain non-ASCII characters, and specify DATAFILETYPE = 'widechar' if the file is UTF-16, or ensure the correct code page is active for UTF-8 files.

Import Wizard Settings Location

If you're using the SSMS Import and Export Wizard rather than writing T-SQL, all the settings described above map to specific fields in the wizard's first screen after selecting Flat File Source:

Wizard tabField nameWhat to set
GeneralCode pageMatch to the file encoding — 65001 for UTF-8, 1252 for Windows legacy
GeneralText qualifierEnter " if the file uses double-quotes to wrap fields
GeneralHeader row delimiterUsually {CR}{LF} for Windows files; {LF} for files from Linux/Mac
GeneralHeader rows to skipSet to 1 if the first row is a column header
ColumnsColumn delimiterMatch to the actual delimiter in the file — comma, pipe, tab, etc.
AdvancedOutputColumnWidth per columnSet to a generous value (4000) for text columns — the default 50 truncates long values silently

The Import Wizard's 200-row type sampling causes silent truncation

The SSMS Import Wizard samples the first 200 rows to determine column widths. If the widest value in a text column appears after row 200, the wizard sets OutputColumnWidth too small and silently truncates every value that exceeds it. Check the Advanced tab and set all text columns to 4000 manually before running the import. This is the same issue as the SSIS 100-row sampling limitation — both tools have the same root cause.

Stop diagnosing delimiter and encoding issues manually

Elvity detects the delimiter, text qualifier, line ending, and encoding of incoming CSV files automatically — and configures the import pipeline to match.