Name	Name	Last commit message	Last commit date
parent directory ..
SqlPipeline	SqlPipeline
Tests	Tests
Readme.md	Readme.md

Introduction

Wrapper for SimplySql and DuckDB to allow pipeline input and set the parameters automatically and it accepts also PSCustomObject input. It supports all the supported databases from SimplySql, but examples here are made with SQLite.

Note: To use DuckDB, run Install-SqlPipeline first to install the required dependencies.

DuckDB Integration

The module includes a native DuckDB pipeline that works independently of SimplySql. An in-memory DuckDB database is initialized automatically when the module is imported — no setup required for quick analysis or temporary storage.

Current functionality:

DuckDB connection management
Automatic table creation from PSObject data
Schema evolution (new fields via ALTER TABLE)
Normalization of missing fields (columns no longer provided)
Bulk insert via Appender (fast) and temporary CSV import (even faster)
UPSERT via staging table + INSERT ON CONFLICT
Metadata management (last load timestamp per table)

Installation

Install the required DuckDB.NET NuGet packages into a ./lib subfolder:

# PowerShell 7+ (latest DuckDB.NET)
Install-SqlPipeline

# Windows PowerShell 5.1 (pinned compatible versions)
Install-SqlPipeline -WindowsPowerShell

PowerShell Version Compatibility

Feature	PowerShell 7+	Windows PowerShell 5.1
DuckDB.NET version	latest	1.4.4 (maximum)
Extra dependencies	none	`System.Memory` 4.6.0

Windows PowerShell 5.1 runs on .NET Framework 4.x, which is missing some APIs that newer DuckDB.NET versions require. Use the -WindowsPowerShell switch when installing on Windows PowerShell 5.1:

Install-SqlPipeline -WindowsPowerShell

This installs:

DuckDB.NET.Bindings.Full 1.4.4
DuckDB.NET.Data.Full 1.4.4
System.Memory 4.6.0 (required polyfill not included in .NET Framework)

Quick Start (in-memory, no setup needed)

Import-Module SqlPipeline

# Data is written to an in-memory DuckDB database automatically
Import-Csv '.\orders.csv' | Add-RowsToDuckDB -TableName 'orders' -PKColumns 'order_id' -Verbose

# Query back the data
Get-DuckDBData -Query "SELECT COUNT(*) AS total FROM orders"

File-based Database

Call Initialize-SQLPipeline to switch from the default in-memory database to a persistent file. All subsequent DuckDB functions will use the file-based connection automatically.

Import-Module SqlPipeline

Initialize-SQLPipeline -DbPath '.\pipeline.db'

Import-Csv '.\orders.csv' | Add-RowsToDuckDB -TableName 'orders' -PKColumns 'order_id' -UseTransaction -Verbose

Close-SqlPipeline

Public Functions

`Initialize-SQLPipeline`

Opens a persistent file-based DuckDB database and sets it as the default connection. Only needed when you want to persist data to a file — the in-memory database is initialized automatically on module import.

# Switch to a file-based database
Initialize-SQLPipeline -DbPath '.\pipeline.db'

# With AES-256 encryption (requires DuckDB 1.4.0+)
$conn = Initialize-SQLPipeline -DbPath '.\pipeline.db' -EncryptionKey 'my-secret-key'

Parameter	Type	Description
`DbPath`	string (mandatory)	Path to the `.db` file. Created if it does not exist.
`EncryptionKey`	string	Optional AES-256 encryption key. Encryption is applied via `ATTACH ... (ENCRYPTION_KEY '...')`.
`EncryptionCipher`	string	`GCM` (default, authenticated) or `CTR` (faster, no integrity check). Only used when `EncryptionKey` is set.

Returns the DuckDBConnection object. Also sets $Script:DefaultConnection so all functions work without -Connection.

`Add-RowsToDuckDB`

Inserts PSObjects into a DuckDB table via the PowerShell pipeline. Compatible with the Add-RowsToSql interface. Creates the table automatically on the first insert and evolves the schema when new columns appear.

# Plain INSERT (in-memory, no connection needed)
Import-Csv '.\orders.csv' | Add-RowsToDuckDB -TableName 'orders'

# UPSERT with primary key
Import-Csv '.\orders.csv' | Add-RowsToDuckDB -TableName 'orders' -PKColumns 'order_id'

# Transaction-style batching for large datasets
Import-Csv '.\orders.csv' | Add-RowsToDuckDB -TableName 'orders' -PKColumns 'order_id' -UseTransaction

# Explicit connection
Import-Csv '.\orders.csv' | Add-RowsToDuckDB -Connection $conn -TableName 'orders'

Parameter	Type	Description
`InputObject`	PSObject (pipeline)	Row to insert.
`Connection`	DuckDBConnection	Connection to use. Defaults to the module's default connection.
`TableName`	string (mandatory)	Target table name. Created automatically if it does not exist.
`PKColumns`	string[]	Primary key columns for UPSERT. Empty = plain INSERT.
`UseTransaction`	switch	Buffers all rows and writes via staging table (safer for large loads).
`BatchSize`	int	Rows per batch when not using `-UseTransaction`. Default: 10000.

`Get-DuckDBData`

Executes a SELECT query and returns a DataTable.

# Using the default connection
Get-DuckDBData -Query "SELECT * FROM orders WHERE status = 'open'"

# With explicit connection
Get-DuckDBData -Connection $conn -Query "SELECT COUNT(*) AS cnt FROM orders"

Parameter	Type	Description
`Connection`	DuckDBConnection	Connection to use. Defaults to the module's default connection.
`Query`	string (mandatory)	SQL SELECT statement to execute.

`Invoke-DuckDBQuery`

Executes a non-query SQL statement (CREATE, INSERT, ALTER, DROP, ...).

Invoke-DuckDBQuery -Query "DELETE FROM orders WHERE status = 'cancelled'"
Invoke-DuckDBQuery -Query "ALTER TABLE orders ADD COLUMN discount DOUBLE"

Parameter	Type	Description
`Connection`	DuckDBConnection	Connection to use. Defaults to the module's default connection.
`Query`	string (mandatory)	SQL statement to execute.

`Get-LastLoadTimestamp`

Returns the timestamp of the last successful load for a table. Returns 2000-01-01 if no previous load is recorded (signals a full load). Used for incremental loading patterns.

$since = Get-LastLoadTimestamp -TableName 'orders'
$newOrders = Get-ApiData -ModifiedAfter $since
$newOrders | Add-RowsToDuckDB -TableName 'orders' -PKColumns 'order_id'
Set-LoadMetadata -TableName 'orders' -RowsLoaded $newOrders.Count

Parameter	Type	Description
`Connection`	DuckDBConnection	Connection to use. Defaults to the module's default connection.
`TableName`	string (mandatory)	Table name to look up.

`Set-LoadMetadata`

Stores the timestamp, row count, and status of a completed load in the _load_metadata table. Used together with Get-LastLoadTimestamp to implement incremental loads.

Set-LoadMetadata -TableName 'orders' -RowsLoaded 1500
Set-LoadMetadata -TableName 'orders' -RowsLoaded 0 -Status 'error' -ErrorMessage 'API timeout'

Parameter	Type	Description
`Connection`	DuckDBConnection	Connection to use. Defaults to the module's default connection.
`TableName`	string (mandatory)	Name of the loaded table.
`RowsLoaded`	int (mandatory)	Number of rows loaded.
`Status`	string	`'success'` (default) or `'error'`.
`ErrorMessage`	string	Optional error description when `Status = 'error'`.

`Export-DuckDBToParquet`

Exports a DuckDB table to a Parquet file. The output directory is created automatically if it does not exist.

Export-DuckDBToParquet -TableName 'orders' -OutputPath '.\export\orders.parquet'
Export-DuckDBToParquet -TableName 'orders' -OutputPath '.\export\orders.parquet' -Compression ZSTD

Parameter	Type	Description
`Connection`	DuckDBConnection	Connection to use. Defaults to the module's default connection.
`TableName`	string (mandatory)	Table to export.
`OutputPath`	string (mandatory)	Path for the output `.parquet` file.
`Compression`	string	`SNAPPY`, `ZSTD` (default), `GZIP`, or `NONE`.

`Close-SqlPipeline`

Closes a DuckDB connection cleanly. Use this when working with file-based databases opened via Initialize-SQLPipeline.

# Close an explicit connection
Close-SqlPipeline -Connection $conn

# Close the default connection
Close-SqlPipeline

Parameter	Type	Description
`Connection`	DuckDBConnection (mandatory)	The connection to close.

Full Incremental Load Example

Import-Module SqlPipeline

# Switch to a persistent database
Initialize-SQLPipeline -DbPath '.\crm.db'

# Find out when we last loaded successfully
$since = Get-LastLoadTimestamp -TableName 'contacts'

# Fetch only records modified since the last load
$contacts = Invoke-RestMethod "https://api.crm.example/contacts?modifiedAfter=$since"

# Upsert into DuckDB
$contacts.items | Add-RowsToDuckDB -TableName 'contacts' -PKColumns 'id' -UseTransaction -Verbose

# Record success
Set-LoadMetadata -TableName 'contacts' -RowsLoaded $contacts.items.Count

# Export for downstream consumption
Export-DuckDBToParquet -TableName 'contacts' -OutputPath '.\export\contacts.parquet'

Close-SqlPipeline

Database Encryption

SqlPipeline supports AES-256 encrypted DuckDB databases (requires DuckDB 1.4.0 or later). Pass -EncryptionKey to Initialize-SQLPipeline — everything else works identically to an unencrypted database.

Import-Module SqlPipeline

# Open (or create) an encrypted file-based database
Initialize-SQLPipeline -DbPath '.\pipeline.db' -EncryptionKey 'my-secret-key'

Import-Csv '.\orders.csv' | Add-RowsToDuckDB -TableName 'orders' -PKColumns 'order_id'

Close-SqlPipeline

By default AES-GCM-256 is used (authenticated encryption). To use AES-CTR-256 instead (faster, no integrity check):

Initialize-SQLPipeline -DbPath '.\pipeline.db' -EncryptionKey 'my-secret-key' -EncryptionCipher CTR

Note: DuckDB 1.4.1+ requires the httpfs extension (OpenSSL) for writes to encrypted databases. SqlPipeline installs and loads it automatically when -EncryptionKey is provided.

Migrating an Existing Unencrypted Database to an Encrypted One

The module's default in-memory connection can be used as a bridge to attach both the source and destination databases simultaneously and copy all tables in one step.

Import-Module SqlPipeline

# DuckDB 1.4.1+ requires httpfs (OpenSSL) for writes to encrypted databases.
Invoke-DuckDBQuery -Query "INSTALL httpfs"
Invoke-DuckDBQuery -Query "LOAD httpfs"

# Attach the existing unencrypted database as the source.
Invoke-DuckDBQuery -Query "ATTACH '.\pipeline.db' AS src"

# Attach the new encrypted database as the destination (created automatically).
Invoke-DuckDBQuery -Query "ATTACH '.\pipeline_encrypted.db' AS dst (ENCRYPTION_KEY 'my-secret-key', ENCRYPTION_CIPHER 'GCM')"

# Copy all tables and their data from source to destination in one step.
Invoke-DuckDBQuery -Query "COPY FROM DATABASE src TO dst"

# Detach both databases cleanly.
Invoke-DuckDBQuery -Query "DETACH src"
Invoke-DuckDBQuery -Query "DETACH dst"

After verifying the encrypted database works correctly, replace the original file:

Remove-Item      '.\pipeline.db'
Rename-Item      '.\pipeline_encrypted.db' '.\pipeline.db'

From this point on open the database with -EncryptionKey:

Initialize-SQLPipeline -DbPath '.\pipeline.db' -EncryptionKey 'my-secret-key'

SimplySQL Integration

To use the SimplySQL integration that allows connections to SQLServer, sqlite, postgresql and more, follow these steps.

Performance

Please think about using the -UseTransaction flag. E.g. for a Sqlite import and a file of around 45k rows you can see the difference here. I have measured it like in this command:

Import-Module SqlPipeline, SimplySql
Open-SQLiteConnection -DataSource ".\db.sqlite"
Measure-Command {
    import-csv -Path '.ac_adressen.csv' -Encoding UTF8 -Delimiter "," | Add-RowsToSql -TableName "addresses" -UseTransaction -Verbose
}
Close-SqlConnection

And you can see the difference, it is around 10x times faster to use a transaction when inserting data

Type	Time
Without transaction	596s
With transaction	20s

You can influence the number of records per transaction via the -CommitEvery parameter where the default is 10.000.

Btw. opening a temporary in-memory database like

Open-SQLiteConnection -DataSource ":memory:"

made my import then in 18 seconds, which is still a nice improvement!

I have imported another file with the size of 1.5GB and 300k rows in 216 seconds. Every line consists of columns with a total line length of (including tabs as delimiter).

Examples

All the examples are done with SQLite, but are also valid for all other databases that are supported in SimplySql.

Close the connection after everything is done

With the -CloseConnection you can close the connection automatically after the import is done:

Open-SQLiteConnection -DataSource ":memory:"
import-csv -Path '.ac_adressen.csv' -Encoding UTF8 -Delimiter "," | Select-Object -first 100 | Add-RowsToSql -verbose -TableName "addresses" -UseTransaction -CloseConnection

So then Test-SqlConnection should be false

Use the input data from csv and in further steps in pipeline

Use the -PassThru parameter switch to forward the input object to the next pipeline step, so this could be something like showing the first 100 records and output it in a table

Open-SQLiteConnection -DataSource ".\addresses.sqlite"
import-csv -Path '.ac_adressen.csv' -Encoding UTF8 -Delimiter "," | Select-Object -first 100 | Add-RowsToSql -TableName "addresses" -PassThru | Out-GridView

Using in-memory database

This is also supported with SQLite, which can be done through SimplySql. So just open the database connection like

Open-SQLiteConnection -DataSource ":memory:"

and it will be automatically used

Input objects directly

So you can create objects like you want and store them in the database like in here. This example shows a lot of different use cases, so it can look like a bit mixed up, but shows you the flexibility.

$psCustoms1 = @(
    [PSCustomObject]@{
        "firstname" = "Florian"
        "lastname" = "von Bracht"
        "score" = 10
        "object" = [PSCustomObject]@{
            
        }
    }
    [PSCustomObject]@{
        "firstname" = "Florian"
        #"lastname" = "von Bracht"
        "score" = 10
        "object" = [Hashtable]@{
            
        }
    }
)

$psCustoms2 = @(
    [Hashtable]@{
        "first name" = "Bat"
        "lastname" = "Man"
        "score" = 11
        "object" = [PSCustomObject]@{
            "street" = "Kaiserstrasse 35"
            "city" = "Frankfurt"
        }
        "active" = "true" # test $true
    }
)

Import-Module SqlPipeline
Open-SQLiteConnection -DataSource ":memory:"
Add-RowsToSql -InputObjects $psCustoms1 -TableName pscustoms -UseTransaction -FormatObjectAsJson -verbose
$psCustoms2 | Add-RowsToSql -TableName pscustoms -UseTransaction -FormatObjectAsJson -verbose -CreateColumnsInExistingTable
Invoke-SqlQuery -Query "Select * from pscustoms" | Format-Table
Close-SqlConnection

Input other objects than Hashtable or PSCustomObject

You can deactivate the input validation with the -IgnoreInputValidation flag, e.g. if you are reading some systeminfo. But please be beware, this can cause that something isn't working as expected.

Get-ChildItem "*.*" | Add-RowsToSql -TableName "childitem" -UseTransaction -IgnoreInputValidation -verbose

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Readme.md

Introduction

DuckDB Integration

Installation

PowerShell Version Compatibility

Quick Start (in-memory, no setup needed)

File-based Database

Public Functions

`Initialize-SQLPipeline`

`Add-RowsToDuckDB`

`Get-DuckDBData`

`Invoke-DuckDBQuery`

`Get-LastLoadTimestamp`

`Set-LoadMetadata`

`Export-DuckDBToParquet`

`Close-SqlPipeline`

Full Incremental Load Example

Database Encryption

Migrating an Existing Unencrypted Database to an Encrypted One

SimplySQL Integration

Performance

Examples

Close the connection after everything is done

Use the input data from csv and in further steps in pipeline

Using in-memory database

Input objects directly

Input other objects than Hashtable or PSCustomObject

FilesExpand file tree

SqlPipeline

Directory actions

More options

Directory actions

More options

Latest commit

History

SqlPipeline

Folders and files

parent directory

Readme.md

Introduction

DuckDB Integration

Installation

PowerShell Version Compatibility

Quick Start (in-memory, no setup needed)

File-based Database

Public Functions

Initialize-SQLPipeline

Add-RowsToDuckDB

Get-DuckDBData

Invoke-DuckDBQuery

Get-LastLoadTimestamp

Set-LoadMetadata

Export-DuckDBToParquet

Close-SqlPipeline

Full Incremental Load Example

Database Encryption

Migrating an Existing Unencrypted Database to an Encrypted One

SimplySQL Integration

Performance

Examples

Close the connection after everything is done

Use the input data from csv and in further steps in pipeline

Using in-memory database

Input objects directly

Input other objects than Hashtable or PSCustomObject

`Initialize-SQLPipeline`

`Add-RowsToDuckDB`

`Get-DuckDBData`

`Invoke-DuckDBQuery`

`Get-LastLoadTimestamp`

`Set-LoadMetadata`

`Export-DuckDBToParquet`

`Close-SqlPipeline`