
feature: added support as source for ducklake #295

Open
Femi3211 wants to merge 2 commits into develop from feature/ducklake-integration

Conversation


@Femi3211 Femi3211 commented Jan 30, 2026

Summary by CodeRabbit

  • New Features
    • Added DuckLake database source support with configurable connection parameters (database path, data path, metadata DB).
    • Implemented environment-variable substitution for DuckLake configuration settings.
    • Added schema extraction and validation for DuckLake (table, view, column discovery) with safe fallbacks.
    • Integrated dynamic module loading for extensible extractors and an optimized selection path for DuckLake sources.
  • Improvements
    • Enhanced debugging and robustness when connecting to DuckLake.

@Femi3211 Femi3211 requested a review from nbesimi January 30, 2026 14:18

coderabbitai bot commented Jan 30, 2026

📝 Walkthrough

Adds DuckLake support: three new Connection fields, config substitution for those fields, a new DuckLakeGenerator to extract metadata via DuckDB+DuckLake, and factory routing to select the DuckLake generator for connections with dbType "ducklake".

Changes

  • Connection model (common/src/main/java/com/adaptivescale/rosetta/common/models/input/Connection.java): Added three DuckLake-specific fields, duckdbDatabasePath, ducklakeDataPath, and ducklakeMetadataDb, with getters and setters.
  • Config substitution (cli/src/main/java/com/adaptivescale/rosetta/cli/ConfigYmlConverter.java): Applies environment/config variable substitution to the three new DuckLake fields during config processing.
  • DuckLake generator (source/src/main/java/com/adataptivescale/rosetta/source/core/DuckLakeGenerator.java): New DuckLakeGenerator class that validates the DuckLake config, builds the DuckDB JDBC URL, installs and attaches the DuckLake catalog, registers and parses tables, views, and columns (with dynamic extractor loading and fallbacks), and assembles the Database result. Includes helpers for catalog handling and debug SQL.
  • Factory routing (source/src/main/java/com/adataptivescale/rosetta/source/core/SourceGeneratorFactory.java): Detects dbType == "ducklake" (case-insensitive) and returns a DuckLakeGenerator early, bypassing standard extractor initialization.
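Given the fields above, a source connection entry could look like the following sketch. Only the field names (dbType, databaseName, duckdbDatabasePath, ducklakeDataPath, ducklakeMetadataDb) come from this PR; the surrounding YAML layout, the connections/name keys, and the ${VAR} substitution syntax are assumptions for illustration:

```yaml
# Hypothetical config layout; only the DuckLake field names are from this PR.
connections:
  - name: ducklake_source
    dbType: ducklake
    databaseName: my_lake              # used as the attached catalog name
    duckdbDatabasePath: /tmp/duck.db   # optional local DuckDB database file
    ducklakeDataPath: ${DUCKLAKE_DATA_PATH}       # substituted from the environment
    ducklakeMetadataDb: ${DUCKLAKE_METADATA_DB}   # substituted from the environment
```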

Sequence Diagram

sequenceDiagram
    participant Config as Configuration
    participant Factory as SourceGeneratorFactory
    participant DLGen as DuckLakeGenerator
    participant JDBC as DuckDB JDBC
    participant DL as DuckLake Catalog
    participant Extractors as Table/View/Column Extractors
    participant DB as Database Object

    Config->>Factory: sourceGenerator(connection, driverProvider)
    Factory->>Factory: if dbType == "ducklake"
    Factory->>DLGen: instantiate DuckLakeGenerator
    Factory->>DLGen: generate(connection)

    DLGen->>DLGen: validate ducklakeDataPath & metadata path
    DLGen->>JDBC: open DuckDB JDBC connection
    DLGen->>DL: install/load DuckLake
    DLGen->>DL: attach catalog (DATA_PATH, METADATA_PATH)
    DLGen->>DL: use/select catalog/schema

    DLGen->>Extractors: load table/view/column extractors (with fallbacks)
    DLGen->>JDBC: extract tables (filter DuckLake metadata)
    DLGen->>JDBC: extract views
    DLGen->>Extractors: extract columns

    DLGen->>DB: assemble Database(name, tables, views, type)
    DLGen->>Factory: return Database

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through configs, a floppy-eared cheer,
DuckLake paths tucked where the parquet appear,
Catalogs attached, little tables in line,
Extractors wake up — metadata's fine!
A rabbit applauds this new data frontier.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): docstring coverage is 15.38%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title accurately describes the main change, adding DuckLake as a data source, which is reflected across all modified files (Connection model, ConfigYmlConverter, DuckLakeGenerator, and SourceGeneratorFactory).


@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 4

🤖 Fix all issues with AI agents
In
`@source/src/main/java/com/adataptivescale/rosetta/source/core/DuckLakeGenerator.java`:
- Around line 243-248: The SQL string in DuckLakeGenerator directly interpolates
parquetFile.getAbsolutePath() into createTableSql which allows file path
injection via single quotes; sanitize the path before interpolation by escaping
single quotes (e.g., replace "'" with "''") and then use the escaped value when
building createTableSql (references: DuckLakeGenerator, variables parquetFile,
tableName, createStmt, createTableSql) so the generated CREATE TABLE ...
read_parquet('...') literal cannot break SQL syntax.
- Around line 300-344: The executeDebugSQL method currently logs full SQL and
result rows via log.info; change it to avoid leaking sensitive data by (1)
updating the method signature of executeDebugSQL (and any callers) to accept a
boolean flag like enableResultLogging (or use an existing debug flag on
Connection), (2) change the SQL statement log to a non-default level (TRACE or
DEBUG) using log.trace(...) and only emit it when tracing is enabled, and (3)
wrap the result-row logging block (the loop that builds rows and the log.info
calls) in a conditional that checks enableResultLogging before logging row
contents; keep setupDuckLake/connect logic but ensure no sensitive output is
logged when the flag is false (optionally move this utility to a test/debug-only
module if desired).
- Around line 190-208: The ATTACH/USE SQL builds raw SQL from user inputs
(catalogName, rosettaConnection.getDucklakeDataPath()) in DuckLakeGenerator
which allows SQL injection; validate and sanitize these inputs before use:
enforce a strict regex (e.g. only [A-Za-z0-9_]+) for catalogName and reject or
normalize any value that doesn't match, and validate ducklakeDataPath to ensure
it is a safe filesystem path (no semicolons, quotes, or SQL metacharacters) or
escape/quote it correctly; then update the attachSql construction and the
useStmt.execute call (the String.format building attachSql and the "USE " +
catalogName call) to use the validated/escaped values only, failing fast with a
clear exception if validation fails.
- Around line 353-356: The SQL built in importCsvToDuckLake interpolates
tableName and csvFilePath directly causing SQL injection; update
importCsvToDuckLake to validate and safely quote/escape inputs: validate
tableName against a strict identifier regex (e.g. [A-Za-z_][A-Za-z0-9_]* ) and
reject or sanitize otherwise, quote the identifier properly (escape any internal
quotes) rather than raw string concat, and treat csvFilePath as a data parameter
or at minimum escape/quote it (escape single quotes) before inserting into the
SQL string; ensure similar protection for catalogName if it can be controlled
externally and use prepared statements or the DB driver's identifier-quoting
helper where available.
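Taken together, the four prompts above reduce to two reusable primitives: strict identifier validation and single-quote escaping for string literals. A minimal, self-contained sketch of both; DuckLakeSqlSanitizer is a hypothetical helper name, not part of this PR:

```java
import java.util.regex.Pattern;

// Hypothetical helper sketching the two sanitization primitives the review
// asks for: identifier validation and single-quote escaping for literals.
public class DuckLakeSqlSanitizer {

    private static final Pattern VALID_IDENTIFIER =
            Pattern.compile("^[A-Za-z_][A-Za-z0-9_]*$");

    // Reject catalog/table names that are not plain SQL identifiers.
    public static String requireIdentifier(String name) {
        if (name == null || !VALID_IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("Not a valid SQL identifier: " + name);
        }
        return name;
    }

    // Escape a value for embedding inside a single-quoted SQL string literal.
    public static String escapeLiteral(String value) {
        return value.replace("'", "''");
    }

    public static void main(String[] args) {
        String path = "/data/o'brien/file.parquet";
        String table = requireIdentifier("orders");
        String sql = "CREATE TABLE " + table
                + " AS SELECT * FROM read_parquet('" + escapeLiteral(path) + "');";
        // prints: CREATE TABLE orders AS SELECT * FROM read_parquet('/data/o''brien/file.parquet');
        System.out.println(sql);
    }
}
```

The same two primitives cover the ATTACH/USE statements, the parquet path, and the CSV import path flagged above.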
🧹 Nitpick comments (4)
source/src/main/java/com/adataptivescale/rosetta/source/core/DuckLakeGenerator.java (4)

60-65: Unchecked casts could mask type mismatches.

The casts on lines 60 and 65 assume that tableExtractor.extract() and viewExtractor.extract() return Collection<Table> and Collection<View> respectively. While this aligns with the expected interface contracts, consider using generics on the TableExtractor and ViewExtractor interfaces to make this type-safe, or add explicit type checking.


104-123: Consider pattern-based filtering for metadata tables.

The hardcoded list of DuckLake metadata tables may become stale as DuckLake evolves. Consider using a prefix-based filter (e.g., tables starting with ducklake_) which would be more maintainable.

♻️ Alternative: Prefix-based filtering
     private Collection<Table> filterDuckLakeMetadataTables(Collection<Table> allTables) {
-        Set<String> metadataTableNames = Set.of(
-            "ducklake_column", "ducklake_column_tag", ...
-        );
-
         Collection<Table> userTables = new ArrayList<>();
         for (Table table : allTables) {
-            if (!metadataTableNames.contains(table.getName())) {
+            if (!table.getName().startsWith("ducklake_")) {
                 userTables.add(table);
             }
         }
         return userTables;
     }

43-77: Consider try-with-resources for cleaner resource management.

The current try-finally pattern works but could be simplified using try-with-resources for java.sql.Connection.

♻️ Refactor to try-with-resources
         Driver driver = driverProvider.getDriver(tempConnection);
         Properties properties = JDBCUtils.setJDBCAuth(tempConnection);
-        java.sql.Connection connect = driver.connect(duckdbUrl, properties);
-
-        try {
+        try (java.sql.Connection connect = driver.connect(duckdbUrl, properties)) {
             String attachedCatalogAlias = setupDuckLake(connect, connection);
             // ... rest of method ...
             return database;
-        } finally {
-            connect.close();
         }

80-102: Code duplication with generate() method.

The validate() method duplicates the connection setup and DuckLake initialization logic from generate(). Consider extracting the common setup into a private helper method to improve maintainability.

♻️ Extract common setup logic
private java.sql.Connection createDuckLakeConnection(Connection connection) throws Exception {
    if (connection.getDucklakeDataPath() == null || connection.getDucklakeDataPath().trim().isEmpty()) {
        throw new IllegalArgumentException("ducklakeDataPath is required for DuckLake connections");
    }
    
    String duckdbUrl = buildDuckDbUrl(connection);
    Connection tempConnection = new Connection();
    tempConnection.setUrl(duckdbUrl);
    tempConnection.setDbType("duckdb");
    Driver driver = driverProvider.getDriver(tempConnection);
    Properties properties = JDBCUtils.setJDBCAuth(tempConnection);
    return driver.connect(duckdbUrl, properties);
}

Then use in both generate() and validate().

Comment on lines 190 to 208
String attachSql = String.format(
        "ATTACH 'ducklake:%s' AS %s (DATA_PATH '%s');",
        metadataDb, catalogName, rosettaConnection.getDucklakeDataPath()
);

try {
    log.info("Attaching DuckLake catalog: {}", attachSql);
    stmt.execute(attachSql);
} catch (SQLException e) {
    if (e.getMessage() != null && e.getMessage().contains("already exists")) {
        log.info("Catalog '{}' is already attached", catalogName);
    } else {
        throw e;
    }
}

try (Statement useStmt = connection.createStatement()) {
    useStmt.execute("USE " + catalogName + ";");
}

⚠️ Potential issue | 🔴 Critical

SQL injection vulnerability in DuckLake setup.

The catalogName and ducklakeDataPath values are interpolated directly into SQL statements via String.format() without sanitization. These values originate from user configuration and could contain malicious SQL.

For example, a malicious databaseName like foo; DROP TABLE users; -- would be directly embedded in the ATTACH and USE statements.

Consider validating that catalogName contains only alphanumeric characters and underscores, or use identifier quoting appropriate for DuckDB.

🛡️ Proposed fix: Add input validation for catalog name
+    private static final java.util.regex.Pattern VALID_IDENTIFIER = 
+        java.util.regex.Pattern.compile("^[a-zA-Z_][a-zA-Z0-9_]*$");
+
     private String setupDuckLake(java.sql.Connection connection, Connection rosettaConnection) throws SQLException {
         Statement stmt = connection.createStatement();
         try {
             // ... existing code ...
             
             String catalogName = rosettaConnection.getDatabaseName();
             if (catalogName == null || catalogName.trim().isEmpty()) {
                 throw new IllegalArgumentException("databaseName is required for DuckLake connections");
             }
+            if (!VALID_IDENTIFIER.matcher(catalogName).matches()) {
+                throw new IllegalArgumentException("databaseName must be a valid SQL identifier (alphanumeric and underscores only)");
+            }

Comment on lines 300 to 344
// Helper method to execute SQL commands
public static void executeDebugSQL(Connection connection, String sql) throws Exception {
    if (connection.getDucklakeDataPath() == null || connection.getDucklakeDataPath().trim().isEmpty()) {
        throw new IllegalArgumentException("ducklakeDataPath is required for DuckLake connections");
    }

    String duckdbUrl = connection.getDuckdbDatabasePath() != null && !connection.getDuckdbDatabasePath().trim().isEmpty()
            ? "jdbc:duckdb:" + connection.getDuckdbDatabasePath()
            : "jdbc:duckdb:";

    Connection tempConnection = new Connection();
    tempConnection.setUrl(duckdbUrl);
    tempConnection.setDbType("duckdb");
    Driver driver = new DriverManagerDriverProvider().getDriver(tempConnection);
    Properties properties = JDBCUtils.setJDBCAuth(tempConnection);
    java.sql.Connection connect = driver.connect(duckdbUrl, properties);

    try {
        DuckLakeGenerator generator = new DuckLakeGenerator(new DriverManagerDriverProvider());
        generator.setupDuckLake(connect, connection);

        try (Statement stmt = connect.createStatement()) {
            log.info("Executing SQL: {}", sql);
            boolean hasResults = stmt.execute(sql);
            if (hasResults) {
                try (ResultSet rs = stmt.getResultSet()) {
                    log.info("Query returned results:");
                    int colCount = rs.getMetaData().getColumnCount();
                    while (rs.next()) {
                        StringBuilder row = new StringBuilder("  ");
                        for (int i = 1; i <= colCount; i++) {
                            if (i > 1) row.append(" | ");
                            row.append(rs.getString(i));
                        }
                        log.info(row.toString());
                    }
                }
            } else {
                log.info("SQL executed successfully. Rows affected: {}", stmt.getUpdateCount());
            }
        }
    } finally {
        connect.close();
    }
}

⚠️ Potential issue | 🟠 Major

Debug utility logs potentially sensitive data.

The executeDebugSQL method logs:

  • The full SQL query (line 322)
  • Complete query results row by row (line 334)

In production, this could inadvertently expose sensitive data (credentials, PII) to logs. Consider:

  1. Restricting log level to TRACE/DEBUG
  2. Adding a flag to disable result logging
  3. Moving these utilities to a separate debug/test module if not needed in production
🛡️ Proposed fix: Use TRACE level for sensitive output
             try (Statement stmt = connect.createStatement()) {
-                log.info("Executing SQL: {}", sql);
+                log.debug("Executing SQL: {}", sql);
                 boolean hasResults = stmt.execute(sql);
                 if (hasResults) {
                     try (ResultSet rs = stmt.getResultSet()) {
-                        log.info("Query returned results:");
+                        log.trace("Query returned results:");
                         int colCount = rs.getMetaData().getColumnCount();
                         while (rs.next()) {
                             StringBuilder row = new StringBuilder("  ");
                             for (int i = 1; i <= colCount; i++) {
                                 if (i > 1) row.append(" | ");
                                 row.append(rs.getString(i));
                             }
-                            log.info(row.toString());
+                            log.trace(row.toString());
                         }
                     }
                 } else {
-                    log.info("SQL executed successfully. Rows affected: {}", stmt.getUpdateCount());
+                    log.debug("SQL executed successfully. Rows affected: {}", stmt.getUpdateCount());
                 }

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In
`@source/src/main/java/com/adataptivescale/rosetta/source/core/DuckLakeGenerator.java`:
- Around line 330-361: The public static helper executeDebugSQL in
DuckLakeGenerator exposes an arbitrary-SQL execution surface; remove or relocate
it out of production code by either (1) moving the method into a test utility
class (e.g., create a DuckLakeTestUtils and place executeDebugSQL there for
test-only use), or (2) eliminate it entirely if not required, or (3) at minimum
reduce its visibility (make it private/package-private) and annotate with
`@VisibleForTesting` so it cannot be called from production code; update any
callers to reference the new test utility or remove them accordingly and ensure
the DuckLakeGenerator class no longer exposes executeDebugSQL publicly.
- Around line 115-126: openDuckDbConnection creates a new Connection (temp) but
never copies authentication from the original, so JDBCUtils.setJDBCAuth(temp)
yields empty creds; fix by propagating credentials from the original Connection
into temp (or bypass temp and build Properties from original) before calling
JDBCUtils.setJDBCAuth. Update openDuckDbConnection to copy userName and password
(and any other auth fields) from the original parameter to temp prior to
driverProvider.getDriver / JDBCUtils.setJDBCAuth so the resulting Properties
include the source credentials.
- Around line 248-265: listTablesFallback creates Table instances with only the
name set, causing downstream column extraction to fail because table.getSchema()
and table.getType() are null; update listTablesFallback (the method) to populate
the Table.schema and Table.type fields as you build each Table: set schema from
the query parameter or from the result set column (e.g., result column
"table_schema") and set type to the table_type value (or hardcode "BASE
TABLE"/use rs.getString("table_type")), so that ColumnsExtractor.extract() can
call getMetaData().getColumns(databaseName, table.getSchema(), ...) correctly.
🧹 Nitpick comments (3)
source/src/main/java/com/adataptivescale/rosetta/source/core/DuckLakeGenerator.java (3)

288-328: Three near-identical extractor loader methods — consider extracting a generic helper.

loadDuckDbTableExtractor, loadDuckDbViewExtractor, and loadDuckDbColumnExtractor share the same structure: call ModuleLoader, check isEmpty, reflectively instantiate, catch the same exceptions. A single generic helper (parameterized on the extractor type) would eliminate the duplication.
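A sketch of such a generic helper, assuming the three loaders differ only in the class they instantiate. ExtractorLoader and loadExtractor are hypothetical names, and plain Class.forName stands in for the project's ModuleLoader machinery to keep the sketch self-contained:

```java
// Hypothetical generic replacement for the three near-identical loaders.
// The real code would resolve candidate classes via ModuleLoader; here the
// class name is passed in directly so the sketch is runnable on its own.
public class ExtractorLoader {

    public static <T> T loadExtractor(String className, Class<T> type) {
        try {
            Class<?> clazz = Class.forName(className);
            Object instance = clazz.getDeclaredConstructor().newInstance();
            return type.cast(instance);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Could not load extractor " + className, e);
        }
    }

    public static void main(String[] args) {
        // A JDK class stands in for a real table/view/column extractor.
        java.util.List<?> list =
                loadExtractor("java.util.ArrayList", java.util.List.class);
        System.out.println(list.isEmpty()); // prints: true
    }
}
```

Each of the three call sites would then become a one-liner with its own class name and fallback, rather than repeating the load/check/instantiate/catch sequence.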


267-286: Hardcoded metadata table list is fragile — consider prefix-based filtering.

If DuckLake adds new internal tables in future versions, this static set will miss them. A prefix-based filter (name.startsWith("ducklake_")) would be more resilient, assuming user tables don't use that prefix.


182-184: Silently catching all exceptions from INSTALL ducklake could mask real failures.

While the subsequent LOAD ducklake would likely fail if installation truly failed, consider at least logging the caught exception at DEBUG level for troubleshooting.

Comment on lines +115 to +126
private java.sql.Connection openDuckDbConnection(String duckdbUrl, Connection original) throws SQLException {
    Connection temp = new Connection();
    temp.setUrl(duckdbUrl);
    temp.setDbType("duckdb");
    Driver driver = driverProvider.getDriver(temp);

    // Use auth if you support it; for local duckdb it's usually empty
    Properties props = JDBCUtils.setJDBCAuth(temp);

    log.info("Opening DuckDB JDBC session: {}", duckdbUrl);
    return driver.connect(duckdbUrl, props);
}

⚠️ Potential issue | 🟠 Major

Credentials from the original connection are never forwarded to the DuckDB session.

openDuckDbConnection creates a temporary Connection (line 116) but never copies userName/password from original. JDBCUtils.setJDBCAuth(temp) on line 122 will produce empty properties, so any authentication configured on the source connection is silently dropped.

Proposed fix
     private java.sql.Connection openDuckDbConnection(String duckdbUrl, Connection original) throws SQLException {
         Connection temp = new Connection();
         temp.setUrl(duckdbUrl);
         temp.setDbType("duckdb");
+        temp.setUserName(original.getUserName());
+        temp.setPassword(original.getPassword());
         Driver driver = driverProvider.getDriver(temp);
 
-        // Use auth if you support it; for local duckdb it's usually empty
         Properties props = JDBCUtils.setJDBCAuth(temp);

Comment on lines +248 to +265
private Collection<Table> listTablesFallback(java.sql.Connection jdbc, String catalog, String schema) throws SQLException {
    String sql = "SELECT table_name FROM information_schema.tables " +
            "WHERE table_catalog = ? AND table_schema = ? AND table_type='BASE TABLE' " +
            "ORDER BY table_name";
    List<Table> out = new ArrayList<>();
    try (PreparedStatement ps = jdbc.prepareStatement(sql)) {
        ps.setString(1, catalog);
        ps.setString(2, schema);
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                Table t = new Table();
                t.setName(rs.getString("table_name"));
                out.add(t);
            }
        }
    }
    return out;
}

⚠️ Potential issue | 🟠 Major

Fallback tables are missing schema and type, which will break column extraction.

listTablesFallback creates Table objects with only name set. Downstream, ColumnsExtractor.extract() calls getMetaData().getColumns(databaseName, table.getSchema(), ...) — a null schema may cause column lookup to fail or match unintended tables in attached DuckLake catalogs.

Proposed fix
                 while (rs.next()) {
                     Table t = new Table();
                     t.setName(rs.getString("table_name"));
+                    t.setSchema(schema);
+                    t.setType("TABLE");
                     out.add(t);
                 }

Comment on lines +330 to +361
// Helper method to execute SQL commands (fixed URL building)
public static void executeDebugSQL(Connection connection, String sql) throws Exception {
    DuckLakeGenerator generator = new DuckLakeGenerator(new DriverManagerDriverProvider());
    String duckdbUrl = generator.buildDuckDbUrl(connection);

    java.sql.Connection jdbc = generator.openDuckDbConnection(duckdbUrl, connection);
    try {
        generator.setupDuckLake(jdbc, connection);

        try (Statement stmt = jdbc.createStatement()) {
            log.info("Executing SQL: {}", sql);
            boolean hasResults = stmt.execute(sql);
            if (hasResults) {
                try (ResultSet rs = stmt.getResultSet()) {
                    int colCount = rs.getMetaData().getColumnCount();
                    while (rs.next()) {
                        StringBuilder row = new StringBuilder("  ");
                        for (int i = 1; i <= colCount; i++) {
                            if (i > 1) row.append(" | ");
                            row.append(rs.getString(i));
                        }
                        log.info(row.toString());
                    }
                }
            } else {
                log.info("SQL executed. Rows affected: {}", stmt.getUpdateCount());
            }
        }
    } finally {
        try { jdbc.close(); } catch (SQLException ignored) {}
    }
}

⚠️ Potential issue | 🟠 Major

Debug SQL execution utility should not ship in production code.

executeDebugSQL is a public static method that accepts and executes arbitrary SQL against a fully configured DuckLake connection. Beyond the logging concern (flagged previously), exposing this in production code creates an unnecessary attack surface. Consider moving this to a test utility class or removing it entirely.


