Home/Articles/CSV Ingestion with SSIS

Building a CSV Ingestion Engine with SQL Server Integration Services (SSIS)

SSIS is the right tool when you need data cleaning, error routing, and folder-level automation in a single visual pipeline — not just a faster version of the Import Wizard.

10 min read·SQL Server Automation

SQL Server offers three distinct approaches to importing CSV data, and understanding when to use each is more important than knowing how to use any one of them. BULK INSERT gives you maximum raw throughput — it's the fastest path from file to table when the data is clean and the schema is fixed. PowerShell with SQL Server Agent gives you scheduling, file system operations, and failure alerting without requiring additional tooling. SSIS solves a different problem: it handles the transformation that happens between the file and the table — type conversion, column mapping, row-level error routing, and multi-file folder processing — in a visual, reusable pipeline.

Choosing the right SQL Server import tool

RequirementBULK INSERTPowerShell + AgentSSIS
Maximum import speed✓ Best✓ Via BULK INSERT✓ Good (not fastest)
Scheduled recurring importManual or via Agent✓ Native✓ Via Agent job
Type conversion during load✗ No✗ No✓ Yes — Data Conversion task
Row-level error routing✗ Limited (ERRORFILE)✗ Limited✓ Red arrow redirection
Process entire folder of CSVs✗ One file per statement✓ Via Get-ChildItem loop✓ Foreach Loop Container
Visual pipeline editor✗ No✗ No✓ Yes
Reusable without re-configuring✗ Script-dependent✓ Script file reuse✓ .dtsx package
Required toolingSQL Server onlySSMS + SqlServer moduleSQL Server Data Tools (SSDT)

If type conversion and row-level error routing don't apply to your files, PowerShell with SQL Server Agent is simpler to deploy and maintain. SSIS pays off when the incoming data requires transformation before it can be safely written to a typed SQL Server table.

What SSIS Is (and Isn't)

SSIS is a visual ETL (Extract, Transform, Load) tool built into the SQL Server ecosystem. You design pipelines by dragging components onto a canvas and connecting them with arrows — no T-SQL or scripting required for common patterns. A completed SSIS pipeline is saved as a .dtsx package file that can be deployed to an SSIS Catalog on SQL Server and executed on a schedule via SQL Server Agent.

SSIS is not a replacement for BULK INSERT at the engine level. Internally, the OLE DB Destination component in an SSIS package executes bulk insert operations — but with the overhead of the SSIS runtime, connection management, and buffer processing layered on top. For pure throughput on clean data, a T-SQL BULK INSERT with TABLOCK and minimal logging will outperform SSIS. SSIS wins when the data isn't clean.

Installing SQL Server Data Tools (SSDT)

SSIS packages are designed in SQL Server Data Tools, a Visual Studio extension. In recent versions, SSDT is installed as the "SQL Server Integration Services Projects" extension from the Visual Studio Marketplace — search for it in Visual Studio → Extensions → Manage Extensions. SSDT is not included with SSMS. You need Visual Studio (Community edition is free) to design packages; you only need SSMS to deploy and schedule them.

Step 1: Connection Managers

Every SSIS package starts with Connection Managers — named, reusable definitions of where data comes from and where it goes. For a CSV-to-SQL-Server pipeline, you need exactly two: a Flat File connection for the source CSV and an OLE DB connection for the destination database.

Connection Manager configuration

Connection typeSettingWhat to enter
Flat File (CSV source)File namePath to a representative sample CSV — SSIS uses this file to detect column names and data types
Flat File (CSV source)Text qualifierDouble-quote (") — ensures quoted fields containing commas are treated as single values
Flat File (CSV source)Header row delimiter{CR}{LF} for Windows line endings, {LF} for Linux/Mac-sourced files
Flat File (CSV source)Column delimiterComma — or Tab/Pipe for non-standard CSVs
Flat File (CSV source)Column names in first rowChecked — SSIS reads the header row and uses column names in the mapping step
Flat File → Advanced tabData type per columnReview and override — SSIS samples rows to guess types but can guess wrong for sparse or mixed columns
OLE DB (SQL destination)Server nameSQL Server instance name or IP address
OLE DB (SQL destination)Database nameThe database containing the destination table
OLE DB (SQL destination)AuthenticationWindows Authentication or SQL Server login — the account needs INSERT on the destination table

SSIS samples only the first 100–200 rows to detect types

Type detection in the Flat File Connection Manager Advanced tab is based on a sample of the first rows in the file. If a column is mostly empty in the sample but contains long strings later in the file, SSIS will infer DT_STR(50) and truncate everything longer than 50 characters during the load. Check every column width in the Advanced tab before building the Data Flow. When unsure, set string columns to DT_STR(4000) and let the destination table enforce the actual constraint.

Step 2: The Data Flow Task

The Control Flow is where you sequence tasks — "do A, then B, then C." The Data Flow is where data actually moves. In SSIS, you drag a Data Flow Task onto the Control Flow canvas, then double-click it to enter the Data Flow editor, which has its own set of components.

Core Data Flow components for CSV ingestion

ComponentRoleWhen to include
Flat File SourceReads rows from the CSV using the Flat File Connection ManagerAlways — this is the starting point
Data ConversionConverts SSIS stream types to SQL Server-compatible types (e.g., DT_STR → DT_WSTR, string → date)When source column types don't match the destination table schema
Derived ColumnCalculates new columns from expressions — trim whitespace, concatenate, replace nulls, parse substringsWhen the CSV has raw values that need formatting before landing in SQL
Conditional SplitRoutes rows to different outputs based on a condition — like a WHERE clause applied during the loadWhen some rows go to one table and others go to another, or when filtering bad rows out early
LookupJoins each incoming row against a reference table in SQL Server — returns a match or a no-match outputWhen you need to validate or enrich rows against existing database data during the load
OLE DB DestinationWrites rows into the SQL Server destination tableAlways — this is the ending point
Flat File Destination (error)Writes rejected rows to a separate CSV for reviewWhen you connect the red error output of any component to this destination

The arrow between components is not just a connector — it's a typed pipeline buffer. SSIS streams data through memory in fixed-size buffers, and the columns available to each downstream component depend on what the upstream components expose. If a component produces a new column (like Derived Column), the next component can use it. If a component converts a column type (like Data Conversion), the converted version appears under a new name and the original remains available too. Map the new names — not the originals — to the OLE DB Destination.

Step 3: Data Conversion

CSV files carry all values as strings. SQL Server tables carry typed values — integers, decimals, dates, bit flags. The Data Conversion component sits between the Flat File Source and the OLE DB Destination and converts string columns to their target types before the row is written.

Common CSV-to-SQL-Server type conversions

CSV value exampleSSIS source typeTarget SSIS typeSQL Server type
"42"DT_STRDT_I4INT
"19.99"DT_STRDT_NUMERIC (18, 2)DECIMAL(18,2)
"2024-10-27"DT_STRDT_DBDATEDATE
"2024-10-27 08:30:00"DT_STRDT_DBTIMESTAMPDATETIME2
"true" / "false"DT_STRDT_BOOLBIT
"Hello World"DT_STR (ASCII)DT_WSTR (Unicode)NVARCHAR(N)
"" (empty string)DT_STRDerived Column → NULL replacementNULL in nullable column

Data Conversion creates new columns — map those, not the originals

The Data Conversion component doesn't replace columns — it outputs a new column with a generated name (e.g., Copy of OrderDate) alongside the original. In the OLE DB Destination's column mapping, select the converted copy for each transformed column, not the original string column. Mapping the original against a typed SQL column causes a type mismatch error at runtime.

Step 4: Error Row Redirection

Every SSIS Data Flow component has at least one output path. Most have two: a green success arrow (the main pipeline) and a red error arrow. By default, the red error output is disabled — any row that fails type conversion or a constraint check fails the entire package. Connecting the red arrow to a Flat File Destination changes this: failed rows are written to a separate file and the successful rows continue to the destination table.

Green arrow (output path)

The main data pipeline. Rows that pass all component conditions travel down this path to the next component. This is the only arrow that exists by default.

Red arrow (error output path)

Carries rows that produced an error in the upstream component — a type mismatch, a truncation, a NULL in a non-nullable column. To activate it: right-click the component → Edit → Error Output tab → set the action for each column's error and truncation to "Redirect Row" instead of "Fail Component." Then drag the red arrow to a Flat File Destination configured to write to an error log CSV.

Flat File Destination (error log)

The end point for redirected bad rows. Configure it with a separate Flat File Connection Manager pointing to an error log path (e.g., C:\Logs\ImportErrors.csv). SSIS appends two extra columns to each redirected row: ErrorCode (an integer identifying the error type) and ErrorColumn (the column ID where the error occurred). Use a Derived Column component upstream to add a timestamp before the error destination to make the log more readable.

The staging table pattern works well alongside SSIS error redirection: import all rows that pass the SSIS type checks into a staging table (with no constraints), then run validation queries to catch business-rule violations before promoting data to the production table. The staging table and schema evolution guide covers this pattern.

Step 5: Processing an Entire Folder with Foreach Loop

The Foreach Loop Container is a Control Flow component (not a Data Flow component) that iterates over a collection — a folder of files, a list of values, a recordset — and runs whatever tasks are inside it once for each item. For CSV ingestion, you configure it to enumerate all *.csv files in a folder, store the current file path in a package variable, and have the Data Flow Task's Flat File Connection Manager read its path from that variable instead of a hard-coded string.

1

Add a package variable

In SSDT, go to SSIS menu → Variables → New Variable. Name it CurrentFilePath, set the type to String, and leave the value blank. This variable will hold the path of the current CSV file on each iteration of the loop.

2

Configure the Foreach Loop Container

Drag a Foreach Loop Container onto the Control Flow canvas. Double-click it → Collection tab → Enumerator: Foreach File Enumerator. Set the Folder to your input directory and the Files field to *.csv. On the Variable Mappings tab, map the result (index 0) to the CurrentFilePath variable. The loop now sets CurrentFilePath to each file path before running its contents.

3

Update the Flat File Connection Manager to use the variable

Right-click the Flat File Connection Manager → Properties → Expressions → click "..." next to Expressions → add a new expression. Set the property to ConnectionString and the expression to @[User::CurrentFilePath]. Now on each loop iteration, the connection manager resolves to the current file path rather than the original sample file path.

4

Move the Data Flow Task inside the loop

Drag the Data Flow Task into the Foreach Loop Container (the container expands to contain it). The loop now runs the full Data Flow pipeline — Flat File Source → Data Conversion → OLE DB Destination — once for every CSV found in the folder. Add a File System Task after the Data Flow Task inside the loop to move each processed file to an archive folder, exactly as the PowerShell pipeline does.

Column headers must be consistent across all files in the folder

The Flat File Connection Manager reads its column definitions from the sample file you pointed it to during setup. If a different CSV in the folder has a different column order, different column names, or extra columns, SSIS will either silently map data to the wrong columns or fail with a column mismatch error. The Foreach Loop pattern works reliably when all files share a known, fixed schema. For files with varying schemas, use a Conditional Split to route files by type, or pre-process the headers in PowerShell before SSIS runs.

Deploying and Scheduling the Package

A .dtsx package file created in SSDT can be deployed to the SSIS Catalog (SSISDB) on SQL Server and then scheduled via a SQL Server Agent job step of type "SQL Server Integration Services Package." This means the same Agent infrastructure used for PowerShell automation also schedules SSIS packages — the scheduling interface is identical.

Package deployment requires the SSIS Catalog feature to be enabled on the SQL Server instance (SSMS → Integration Services Catalogs → SSISDB — if it doesn't exist, right-click to create it). Once deployed, package parameters (file paths, server names, database names) can be overridden at execution time or stored in SQL Server Agent job step configurations, making it straightforward to run the same package against a development or production folder without modifying the package itself.

When SSIS is worth the setup cost

SSIS has a steeper initial setup than a PowerShell script — Visual Studio, SSDT extension, package design, SSIS Catalog deployment. That setup pays off when: the incoming data requires type conversion or enrichment (Derived Column, Lookup); bad rows must be isolated without stopping the import (error redirection); or the pipeline will be maintained by someone without a scripting background (visual editor, no code). For clean data on a fixed schema with no transformation needs, PowerShell + BULK INSERT is simpler to maintain long-term.

Skip the pipeline design, keep the reliability

Elvity handles type detection, error routing, staging, and scheduling automatically — without SSDT, SSIS packages, or manual connection manager configuration.