Feature description
Problem Description
The JDBC plugin currently requires strictly typed inputs and outputs, which leads to recurring usability issues and inconsistent behavior across databases and tasks.
1. Input Side (e.g., Batch Tasks)
For Batch tasks, the plugin internally relies on prepared statements, meaning it prepares a SQL statement on the database and expects parameters with explicit types.
Example:
```java
ps.setTimestamp(index, Timestamp.valueOf(ldt));
```
While the internal timestamp is constructed from a Unix epoch, the plugin only accepts a strongly typed Timestamp from input files.
This means:
- Users cannot provide a Unix epoch value (e.g., when the data comes from a .csv or .json file)
- Users cannot provide an ISO-8601 timestamp string
- Similar issues occur with other types (e.g., Postgres JSONB, arrays, etc.)
This makes common workflows unnecessarily brittle and hard to debug.
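
A more lenient binding layer could accept all of these representations and coerce them before binding. Below is a minimal sketch of that idea, not the plugin's actual code; the helper name `bindTimestamp` and the assumption that numeric epochs are in milliseconds are illustrative:

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;
import java.time.OffsetDateTime;

// Hypothetical coercion helper (requires Java 16+ for pattern matching):
// accepts the representations users typically have in .csv/.json files
// and converts them to the java.sql.Timestamp the prepared statement expects.
final class TimestampCoercion {
    static void bindTimestamp(PreparedStatement ps, int index, Object value) throws SQLException {
        if (value instanceof Timestamp ts) {
            // Already strongly typed: pass through unchanged.
            ps.setTimestamp(index, ts);
        } else if (value instanceof Number epoch) {
            // Numeric Unix epoch; milliseconds assumed here for illustration.
            ps.setTimestamp(index, Timestamp.from(Instant.ofEpochMilli(epoch.longValue())));
        } else if (value instanceof String iso) {
            // ISO-8601 string such as "2024-01-15T10:30:00Z".
            ps.setTimestamp(index, Timestamp.from(OffsetDateTime.parse(iso).toInstant()));
        } else {
            throw new SQLException("Cannot coerce " + value.getClass().getName() + " to TIMESTAMP");
        }
    }
}
```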
2. Output Side (Fetch Task)
When fetching data:
- We try to map DB-specific column types to Java types
- Then serialize these into Ion files with Ion type annotations (not using native Ion types directly)
This approach causes multiple issues:
- Users cannot easily see whether type annotations were applied unless they download and inspect the Ion file
- Several type conversions are missing or incomplete (Example: DuckDB STRUCT output currently fails)
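
To make the distinction concrete, the following sketch uses the ion-java library to write the same value both ways; the annotation name `timestamp` is illustrative, not necessarily what the plugin emits today:

```java
import com.amazon.ion.IonWriter;
import com.amazon.ion.Timestamp;
import com.amazon.ion.system.IonTextWriterBuilder;

public class IonTypeExample {
    public static void main(String[] args) throws Exception {
        StringBuilder out = new StringBuilder();
        try (IonWriter writer = IonTextWriterBuilder.standard().build(out)) {
            // Annotation-based encoding: the value stays a string,
            // and the type information rides along as an annotation.
            writer.setTypeAnnotations("timestamp");
            writer.writeString("2024-01-15T10:30:00Z");

            // Native Ion encoding: the value itself carries the type.
            writer.writeTimestamp(Timestamp.forMillis(1705314600000L, 0));
        }
        // Prints something like: timestamp::"2024-01-15T10:30:00Z" 2024-01-15T10:30:00.000Z
        System.out.println(out);
    }
}
```

With native Ion types, a user inspecting the file (or a downstream task reading it) gets an unambiguous timestamp instead of a string that must be re-parsed.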
Why This Is a Problem
Different JDBC drivers behave differently, so the plugin has accumulated per-database patches, an approach that does not scale. Users frequently hit surprising type-mismatch errors, both when writing and reading data.
Possible improvements:
- Allow safe type casting for common inputs (ISO timestamp strings, numeric epochs, JSON strings…), along the lines of the coercion sketch above
- Provide a clear mapping table, JDBC type → Java type → Ion type, for each database plugin (an illustrative example follows this list)
- Add integration tests across major databases
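
As a starting point, such a mapping table could look like the one below. These rows are illustrative defaults based on standard JDBC and Ion core types, not a confirmed specification; each database plugin would document its own deviations:

| JDBC type | Java type | Ion type |
| --- | --- | --- |
| INTEGER / BIGINT | Integer / Long | int |
| NUMERIC / DECIMAL | java.math.BigDecimal | decimal |
| DOUBLE / REAL | Double / Float | float |
| VARCHAR / CHAR | String | string |
| BOOLEAN / BIT | Boolean | bool |
| DATE / TIMESTAMP | java.sql.Date / java.sql.Timestamp | timestamp |
| BINARY / BLOB | byte[] | blob |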