-
Notifications
You must be signed in to change notification settings - Fork 10
Description
Hi, I was wondering if we could add support for Ducklake, a new open data lake format from the team behind DuckDB.
I believe it's a good candidate for reladiff because it has support for both multiple schemas and time travel.
It differs from a regular DuckDB file because it's made of two parts: a SQL catalog (sqlite/duckdb/mysql/postgres) and an object storage (S3/GCS/R2/B2) where pieces of tables are stored as Parquet files.
Currently it can be used by passing SQL/object storage connection parameters to a DuckDB client, and attaching the SQL catalog prepended by "ducklake:". This creates a virtual database than can be queried with the regular DuckDB dialect, plus some extra features like time travel.
I think DuckDB can do the heavy lifting of interpreting the catalog, if we provide it with the needed parameters.
The parameters needed can be as few as the location of the sqlite/duckdb catalog file and folder path, to the full configurations of both SQL server and object storage:
Local DuckDB catalog + folder storage:
INSTALL ducklake;
ATTACH 'ducklake:/tmp/datalake/metadata.duckdb' AS datalake
(DATA_PATH '/tmp/datalake/storage/');
USE datalake;Postgres catalog + S3 storage:
INSTALL ducklake;
CREATE SECRET (
TYPE postgres,
HOST 'HHH',
PORT NNN,
DATABASE 'DDD',
USER 'UUU',
PASSWORD 'PPP'
);
CREATE SECRET (
TYPE s3,
ENDPOINT 'HHH:NNN',
URL_STYLE 'UUU',
REGION 'RRR',
KEY_ID 'YYY',
SECRET 'XXX'
);
ATTACH 'ducklake:postgres:' AS datalake
(DATA_PATH 's3://BUCKETNAME');
USE datalake;After that, the datalake can be queried:
-- specific schema
SELECT *
FROM datalake.schema1.tablename;
-- time travel
SELECT *
FROM datalake.schema2.tablename
AT (TIMESTAMP => now() - INTERVAL '1 hour');If it's really needed to specify everything in URL form, unfortunately it would become very long:
ducklake://<CATALOG_SPEC>@<STORE_SPEC>
ducklake://<cat_type>:<host_or_file>:<port>:<db>:<user>:<pass>@<store_type>:<endpoint_or_folder>:<port>:<bucket>:<region>:<style>:<key_id>:<secret>