renzepost/duckdb-streaming
# DuckDB Streaming

This small project demonstrates how to stream data from Pub/Sub into DuckDB using Ducklake.

## Getting Started

From the root directory of the project:

```shell
docker compose up -d
```

To stop the containers:

```shell
docker compose down
```

The data is persisted in the `data` directory. On shutdown, the subscriber flushes the data to Parquet files. To read the data using DuckDB:

```sql
INSTALL ducklake;
ATTACH 'ducklake:data/duckdb_streaming.ducklake' AS db;
SELECT * FROM db.transactions;
```

Note that you can only query the data after the subscriber container has stopped, because the subscriber holds a concurrency lock on the DuckDB database while it is running.

## Pub/Sub

This project uses the Google Cloud Pub/Sub emulator as a message broker. It has not been tested with the real Google Cloud Pub/Sub service.
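Client libraries are pointed at the emulator through the `PUBSUB_EMULATOR_HOST` environment variable rather than any code change; a minimal sketch (the host and port are assumptions, take the real values from this project's `docker-compose.yml`):

```python
import os

# When this variable is set, the Google Cloud Pub/Sub client libraries
# talk to the local emulator instead of the real service, and no
# credentials are required. "localhost:8085" is an assumed address;
# use the port exposed by the emulator container.
os.environ["PUBSUB_EMULATOR_HOST"] = "localhost:8085"
```

Inside the compose network, the containers would use the emulator service's hostname instead of `localhost`.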

### Publisher

The publisher first creates a topic, then continually receives messages from the Blockchain websocket (unconfirmed transactions) and publishes them to the topic.
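A sketch of that loop might look as follows. The project and topic names, the emulator port, the `websocket-client` dependency, and the `{"op": "utx", "x": {...}}` message shape of the Blockchain.com unconfirmed-transactions feed are all assumptions, not taken from this repository:

```python
import json
import os


def encode_transaction(raw: str) -> bytes:
    """Turn one websocket message into a Pub/Sub payload.

    The {"op": "utx", "x": {...}} shape is an assumption based on the
    public Blockchain.com unconfirmed-transactions feed.
    """
    event = json.loads(raw)
    return json.dumps(event["x"]).encode("utf-8")


def main() -> None:
    # Runtime dependencies imported lazily: google-cloud-pubsub and
    # websocket-client (both assumed, not confirmed by the repo).
    import websocket
    from google.cloud import pubsub_v1

    os.environ.setdefault("PUBSUB_EMULATOR_HOST", "localhost:8085")  # assumed port
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("local-project", "transactions")  # assumed names
    publisher.create_topic(request={"name": topic_path})

    # Subscribe to the unconfirmed-transactions feed and forward each
    # message to the Pub/Sub topic.
    ws = websocket.create_connection("wss://ws.blockchain.info/inv")
    ws.send(json.dumps({"op": "unconfirmed_sub"}))
    while True:
        publisher.publish(topic_path, encode_transaction(ws.recv()))


# main() only works next to a running emulator:
# main()
```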

### Subscriber

The subscriber first creates a subscription to the topic, then continually receives messages from the topic and writes them to the Ducklake table. Data inlining is enabled so that DuckDB does not write a separate Parquet file for every record. Once the process is stopped (it receives a SIGTERM signal), the subscriber flushes the inlined data to Parquet files.
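The shutdown path can be sketched with a SIGTERM handler that unblocks the main thread, which then flushes before exiting. The project/subscription names, the single-column table schema, the `DATA_INLINING_ROW_LIMIT` attach option, and the `ducklake_flush_inlined_data` call are assumptions based on the DuckLake documentation, not code from this repository:

```python
import signal
import threading

# Set by the SIGTERM handler; the main thread waits on it.
stop_event = threading.Event()


def handle_sigterm(signum, frame) -> None:
    """Request a clean shutdown so inlined data can be flushed."""
    stop_event.set()


def main() -> None:
    # Runtime dependencies imported lazily (duckdb, google-cloud-pubsub).
    import duckdb
    from google.cloud import pubsub_v1

    signal.signal(signal.SIGTERM, handle_sigterm)

    con = duckdb.connect()
    con.execute("INSTALL ducklake; LOAD ducklake;")
    # DATA_INLINING_ROW_LIMIT enables data inlining; verify the option
    # name against your DuckLake version.
    con.execute(
        "ATTACH 'ducklake:data/duckdb_streaming.ducklake' AS db "
        "(DATA_INLINING_ROW_LIMIT 10);"
    )
    # Single-column schema is an assumption for illustration.
    con.execute("CREATE TABLE IF NOT EXISTS db.transactions (payload VARCHAR);")

    subscriber = pubsub_v1.SubscriberClient()
    topic_path = subscriber.topic_path("local-project", "transactions")
    sub_path = subscriber.subscription_path("local-project", "transactions-sub")
    subscriber.create_subscription(request={"name": sub_path, "topic": topic_path})

    def on_message(message):
        # With inlining enabled, small inserts land in the catalog rather
        # than producing one Parquet file per record.
        con.execute("INSERT INTO db.transactions VALUES (?);", [message.data.decode()])
        message.ack()

    streaming_pull = subscriber.subscribe(sub_path, callback=on_message)
    stop_event.wait()  # block until SIGTERM arrives
    streaming_pull.cancel()
    # Write the inlined rows out as Parquet files before exiting; the
    # function name follows the DuckLake docs, verify for your version.
    con.execute("CALL ducklake_flush_inlined_data('db');")


# main() only works next to a running emulator:
# main()
```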
