Merged
66 changes: 65 additions & 1 deletion SqlPipeline/Readme.md
@@ -92,7 +92,8 @@ $conn = Initialize-SQLPipeline -DbPath '.\pipeline.db' -EncryptionKey 'my-secret
| Parameter | Type | Description |
|---|---|---|
| `DbPath` | string (mandatory) | Path to the `.db` file. Created if it does not exist. |
| `EncryptionKey` | string | Optional AES-256 encryption key. |
| `EncryptionKey` | string | Optional AES-256 encryption key. Encryption is applied via `ATTACH ... (ENCRYPTION_KEY '...')`. |
| `EncryptionCipher` | string | `GCM` (default, authenticated) or `CTR` (faster, no integrity check). Only used when `EncryptionKey` is set. |

Returns the `DuckDBConnection` object. Also sets `$Script:DefaultConnection` so all functions work without `-Connection`.
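
For example, the returned connection can be passed explicitly or omitted to fall back on the default — a sketch assuming the `-Connection` parameter behaves as described above:

```PowerShell
Import-Module SqlPipeline

# Capture the returned DuckDBConnection for explicit use
$conn = Initialize-SQLPipeline -DbPath '.\pipeline.db'

# Explicit connection (assumed -Connection parameter, as described above)
Invoke-DuckDBQuery -Connection $conn -Query 'SELECT 42 AS answer'

# Equivalent call relying on $Script:DefaultConnection
Invoke-DuckDBQuery -Query 'SELECT 42 AS answer'

Close-SqlPipeline
```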

@@ -261,6 +262,69 @@ Export-DuckDBToParquet -TableName 'contacts' -OutputPath '.\export\contacts.parq
Close-SqlPipeline
```

## Database Encryption

SqlPipeline supports AES-256 encrypted DuckDB databases (requires DuckDB 1.4.0 or later). Pass `-EncryptionKey` to `Initialize-SQLPipeline` — everything else works identically to an unencrypted database.

```PowerShell
Import-Module SqlPipeline

# Open (or create) an encrypted file-based database
Initialize-SQLPipeline -DbPath '.\pipeline.db' -EncryptionKey 'my-secret-key'

Import-Csv '.\orders.csv' | Add-RowsToDuckDB -TableName 'orders' -PKColumns 'order_id'

Close-SqlPipeline
```

By default, AES-GCM-256 (authenticated encryption) is used. To use AES-CTR-256 instead (faster, but without integrity checking):

```PowerShell
Initialize-SQLPipeline -DbPath '.\pipeline.db' -EncryptionKey 'my-secret-key' -EncryptionCipher CTR
```

> **Note:** DuckDB 1.4.1+ requires the `httpfs` extension (OpenSSL) for writes to encrypted databases. SqlPipeline installs and loads it automatically when `-EncryptionKey` is provided.

### Migrating an Existing Unencrypted Database to an Encrypted One

The module's default in-memory connection can be used as a bridge to attach both the source and destination databases simultaneously and copy all tables in one step.

```PowerShell
Import-Module SqlPipeline

# DuckDB 1.4.1+ requires httpfs (OpenSSL) for writes to encrypted databases.
Invoke-DuckDBQuery -Query "INSTALL httpfs"
Invoke-DuckDBQuery -Query "LOAD httpfs"

# Attach the existing unencrypted database as the source.
Invoke-DuckDBQuery -Query "ATTACH '.\pipeline.db' AS src"

# Attach the new encrypted database as the destination (created automatically).
Invoke-DuckDBQuery -Query "ATTACH '.\pipeline_encrypted.db' AS dst (ENCRYPTION_KEY 'my-secret-key', ENCRYPTION_CIPHER 'GCM')"

# Copy all tables and their data from source to destination in one step.
Invoke-DuckDBQuery -Query "COPY FROM DATABASE src TO dst"

# Detach both databases cleanly.
Invoke-DuckDBQuery -Query "DETACH src"
Invoke-DuckDBQuery -Query "DETACH dst"
```

After verifying the encrypted database works correctly, replace the original file:

```PowerShell
Remove-Item '.\pipeline.db'
Rename-Item '.\pipeline_encrypted.db' '.\pipeline.db'
```

From this point on, open the database with `-EncryptionKey`:

```PowerShell
Initialize-SQLPipeline -DbPath '.\pipeline.db' -EncryptionKey 'my-secret-key'
```
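
To confirm the migration, a quick read-back sketch (the `orders` table name is an assumption borrowed from the earlier import example; substitute one of your own tables):

```PowerShell
Initialize-SQLPipeline -DbPath '.\pipeline.db' -EncryptionKey 'my-secret-key'

# The row count should match the original unencrypted database
Invoke-DuckDBQuery -Query 'SELECT COUNT(*) AS n FROM orders'

Close-SqlPipeline
```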

---

# SimplySQL Integration

To use the SimplySQL integration, which allows connections to SQL Server, SQLite, PostgreSQL, and more, follow these steps.
102 changes: 69 additions & 33 deletions SqlPipeline/SqlPipeline/Private/duckdb/Write-DuckDBAppender.ps1
@@ -23,20 +23,68 @@ function Write-DuckDBAppender {
$schemaCmd.Dispose()

$appender = $Connection.CreateAppender($TableName)
$propNames = $null # cached once from first row
$propNames = $null # column names, cached from first row
$colAction = $null # [int[]] per-column action: 0=passthrough 1=float-col 2=int-col 3=varchar-col
$complexCols = $null # HashSet of columns that hold complex types (null when SimpleTypesOnly)

$i = 0
foreach ($row in $Data) {
$i++

# Cache property names from first row only
# On the first row: cache propNames, pre-compute per-column schema actions,
# and (unless SimpleTypesOnly) scan for complex-typed columns.
# This moves all hashtable lookups and string comparisons out of the hot path.
if ($null -eq $propNames) {
$propNames = @($row.PSObject.Properties.Name)

# Encode schema coercion rules as integers so the inner loop only needs
# an array index + switch(int) — no hashtable lookups or string compares.
# 0 = no schema entry / BOOLEAN / other → passthrough
# 1 = float column (DOUBLE / FLOAT / REAL / FLOAT4 / FLOAT8)
# 2 = int column (BIGINT / INTEGER / HUGEINT / INT8 / INT4)
# 3 = varchar column (VARCHAR)
$colAction = [int[]]::new($propNames.Count)
for ($ci = 0; $ci -lt $propNames.Count; $ci++) {
$ct = $columnTypes[$propNames[$ci]]
if ($null -ne $ct) {
if ($ct -eq 'DOUBLE' -or $ct -eq 'FLOAT' -or $ct -eq 'REAL' -or $ct -eq 'FLOAT4' -or $ct -eq 'FLOAT8') {
$colAction[$ci] = 1
} elseif ($ct -eq 'BIGINT' -or $ct -eq 'INTEGER' -or $ct -eq 'HUGEINT' -or $ct -eq 'INT8' -or $ct -eq 'INT4') {
$colAction[$ci] = 2
} elseif ($ct -eq 'VARCHAR') {
$colAction[$ci] = 3
}
}
}

# For non-SimpleTypesOnly: record which columns carry complex objects on
# the first row so subsequent rows only call -is/ConvertTo-Json on those.
if (-not $SimpleTypesOnly) {
$complexCols = [System.Collections.Generic.HashSet[string]]::new()
foreach ($name in $propNames) {
$v = $row.$name
if ($null -ne $v -and (
$v -is [System.Collections.IList] -or
$v -is [PSCustomObject] -or
$v -is [System.Collections.IDictionary])) {
[void]$complexCols.Add($name)
}
}
}
}

$appenderRow = $appender.CreateRow()
foreach ($name in $propNames) {
$val = $row.$name
for ($ci = 0; $ci -lt $propNames.Count; $ci++) {
$val = $row.($propNames[$ci])

if ($null -eq $val) {
# AppendValue([DBNull]::Value) has wrong overload resolution on typed
# columns (e.g. resolves to AppendValue(bool) for DOUBLE). Use the
# dedicated AppendNullValue() method instead.
[void]$appenderRow.AppendNullValue()
continue
}

# Normalize integer subtypes to Int64 before any other check,
# because DuckDB.NET appender has no Int32 overload and PowerShell
# would otherwise fall back to AppendValue(string).
@@ -51,44 +99,32 @@
# [long] → DOUBLE reinterprets raw bytes (15 becomes 7.4e-323)
# [bool] → BIGINT throws "Cannot write Boolean to BigInt column"
# [long] → VARCHAR throws "Cannot write Int64 to Varchar column"
if ($null -ne $val -and $columnTypes.ContainsKey($name)) {
$colType = $columnTypes[$name]
$isFloat = $colType -eq 'DOUBLE' -or $colType -eq 'FLOAT' -or
$colType -eq 'REAL' -or $colType -eq 'FLOAT4' -or $colType -eq 'FLOAT8'
$isInt = $colType -eq 'BIGINT' -or $colType -eq 'INTEGER' -or
$colType -eq 'HUGEINT' -or $colType -eq 'INT8' -or $colType -eq 'INT4'

if ($val -is [bool] -and $colType -ne 'BOOLEAN') {
# bool cannot be appended to non-BOOLEAN columns
if ($isFloat) { $val = [double][int]$val }
elseif ($isInt) { $val = [long][int]$val }
else { $val = [string]$val }
} elseif ($val -is [long] -and $isFloat) {
$val = [double]$val
} elseif ($val -is [double] -and $isInt) {
$val = [long]$val
} elseif ($colType -eq 'VARCHAR' -and ($val -is [long] -or $val -is [double])) {
$val = [string]$val
switch ($colAction[$ci]) {
1 { # float column
if ($val -is [bool]) { $val = [double][int]$val }
elseif ($val -is [long]) { $val = [double]$val }
}
2 { # int column
if ($val -is [bool]) { $val = [long][int]$val }
elseif ($val -is [double]) { $val = [long]$val }
}
3 { # varchar column
if ($val -is [long] -or $val -is [double]) { $val = [string]$val }
}
}
# Inlined ConvertTo-DuckDBValue
if ($null -eq $val) {
# AppendValue([DBNull]::Value) has wrong overload resolution on typed
# columns (e.g. resolves to AppendValue(bool) for DOUBLE). Use the
# dedicated AppendNullValue() method instead.
[void]$appenderRow.AppendNullValue()
} elseif (-not $SimpleTypesOnly -and (
$val -is [System.Collections.IList] -or
$val -is [PSCustomObject] -or
$val -is [System.Collections.IDictionary])) {

if ($null -ne $complexCols -and $complexCols.Contains($propNames[$ci]) -and (
$val -is [System.Collections.IList] -or
$val -is [PSCustomObject] -or
$val -is [System.Collections.IDictionary])) {
[void]$appenderRow.AppendValue((ConvertTo-Json -InputObject $val -Compress -Depth 10))
} else {
[void]$appenderRow.AppendValue($val)
}
}
$appenderRow.EndRow()

If ( $i % 10000 -eq 0 ) {
if ($i % 100 -eq 0) {
Write-Verbose "[$TableName] Appender: Row $i written."
}
}
65 changes: 42 additions & 23 deletions SqlPipeline/SqlPipeline/Private/duckdb/Write-DuckDBCsv.ps1
@@ -22,31 +22,50 @@ function Write-DuckDBCsv {
[switch]$SimpleTypesOnly = $false
)

# Pre-serialize complex objects to JSON so Export-Csv writes them as plain strings
$propNames = $null
$i = 0
$preparedData = foreach ($row in $Data) {
if ($null -eq $propNames) {
$propNames = @($row.PSObject.Properties.Name)
}
$ht = [ordered]@{}
foreach ($name in $propNames) {
$val = $row.$name
$ht[$name] = if ($null -eq $val) {
$null
} elseif (-not $SimpleTypesOnly -and (
$val -is [System.Collections.IList] -or
$val -is [PSCustomObject] -or
$val -is [System.Collections.IDictionary])) {
ConvertTo-Json -InputObject $val -Compress -Depth 10
# Pre-serialize complex objects to JSON so Export-Csv writes them as plain strings.
# SimpleTypesOnly: skip the loop entirely — no transformation needed.
# Otherwise: analyse the first row to find which columns hold complex types, then
# only run type checks + ConvertTo-Json on those columns for every subsequent row.
if ($SimpleTypesOnly) {
$preparedData = $Data
} else {
$propNames = $null
$complexCols = $null # HashSet of column names that need JSON serialisation
$i = 0
$preparedData = foreach ($row in $Data) {
if ($null -eq $propNames) {
$propNames = @($row.PSObject.Properties.Name)
$complexCols = [System.Collections.Generic.HashSet[string]]::new()
foreach ($name in $propNames) {
$val = $row.$name
if ($null -ne $val -and (
$val -is [System.Collections.IList] -or
$val -is [PSCustomObject] -or
$val -is [System.Collections.IDictionary])) {
[void]$complexCols.Add($name)
}
}
}

if ($complexCols.Count -eq 0) {
# No complex columns — emit the row as-is, no copy needed
$row
} else {
$val
$ht = [ordered]@{}
foreach ($name in $propNames) {
$val = $row.$name
$ht[$name] = if ($null -ne $val -and $complexCols.Contains($name)) {
ConvertTo-Json -InputObject $val -Compress -Depth 10
} else {
$val
}
}
[PSCustomObject]$ht
}
$i++
if ($i % 100 -eq 0) {
Write-Verbose "[$TableName] Appender: Row $i written."
}
}
[PSCustomObject]$ht
$i++
If ( $i % 100 -eq 0 ) {
Write-Verbose "[$TableName] Appender: Row $i written."
}
}

3 changes: 2 additions & 1 deletion SqlPipeline/SqlPipeline/SqlPipeline.psd1
@@ -5,7 +5,7 @@
RootModule = 'SqlPipeline.psm1'

# Die Versionsnummer dieses Moduls
ModuleVersion = '0.3.8'
ModuleVersion = '0.3.9'

# Unterstützte PSEditions
# CompatiblePSEditions = @()
@@ -126,6 +126,7 @@ PrivateData = @{

# 'ReleaseNotes' des Moduls
ReleaseNotes = '
0.3.9 Performance improvement when appending rows with complex datatypes (e.g. arrays, objects): only the first row is scanned for complex columns, after which values are serialized directly without per-value type checks
0.3.8 Fixed the encryption for DuckDB connections
0.3.7 DuckDB: multi-row type inference & appender fixes with numeric and boolean types
0.3.6 Adding functionality to count updates and inserts when executing the MERGE