flydelta

A Flight SQL proxy for Delta Lake. Query Delta tables via Apache Arrow Flight with efficient streaming and predicate pushdown.

flydelta is read-only on existing data and deliberately ships without authentication logic to keep it simple.

Why?

When multiple client applications query a Delta Lake storage backend (on S3, local disk, etc.), having each client read Parquet files directly from the source storage is wasteful in network traffic, even with predicate pushdown.

flydelta solves this by acting as a query proxy deployed close to the data:

[Architecture diagram]

Installation

pip install flydelta

Usage

Server

Start a flydelta server with Delta tables:

flydelta serve -t users=s3://bucket/users -t orders=/data/orders

Options:

flydelta serve \
  --host 0.0.0.0 \
  --port 8815 \
  --table users=s3://bucket/users \
  --table orders=/data/orders \
  --pool-size 20 \
  --batch-size 100000

Docker

docker build -t flydelta .
docker run -p 8815:8815 flydelta -t users=/data/users -t orders=/data/orders
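
For local tables, mount the data into the container; for S3-backed tables, delta-rs reads the standard AWS environment variables. A sketch along these lines should work (host paths and bucket names are placeholders):

# Serve a Delta table mounted from the host
docker run -p 8815:8815 -v /srv/delta:/data flydelta -t users=/data/users

# Pass S3 credentials through from the host environment
docker run -p 8815:8815 \
  -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_REGION \
  flydelta -t users=s3://bucket/users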

Python Client

from flydelta import Client

with Client("grpc://localhost:8815") as client:
    # Query to Arrow table
    table = client.query("SELECT * FROM users WHERE active = true")

    # Convert to pandas DataFrame
    df = table.to_pandas()

    # List available tables
    tables = client.list_tables()
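
Since client.query returns a pyarrow.Table, any Arrow-native tooling works downstream. As a sketch, handing the result to Polars (assuming polars is installed):

import polars as pl
from flydelta import Client

with Client("grpc://localhost:8815") as client:
    # pl.from_arrow consumes the Arrow table, zero-copy where possible
    df = pl.from_arrow(client.query("SELECT id, active FROM users"))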

Streaming Large Results

For memory-efficient processing of large result sets:

from flydelta import Client

with Client("grpc://localhost:8815") as client:
    for batch in client.stream_query("SELECT * FROM huge_table"):
        # Process each batch (default 100k rows)
        for row in batch.to_pylist():
            process(row)

        # Or process columnar (faster)
        ids = batch.column('id')
        values = batch.column('value')
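
The same iterator can feed an incremental writer, so large results never have to fit in memory at once. A sketch spilling batches to a Parquet file, assuming stream_query yields pyarrow.RecordBatch objects as above:

import pyarrow.parquet as pq
from flydelta import Client

with Client("grpc://localhost:8815") as client:
    batches = client.stream_query("SELECT * FROM huge_table")
    first = next(batches)  # take one batch to learn the schema
    with pq.ParquetWriter("huge_table.parquet", first.schema) as writer:
        writer.write_batch(first)
        for batch in batches:
            writer.write_batch(batch)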

CLI Client

# Query with table output
flydelta query "SELECT * FROM users LIMIT 10"

# Query with JSON output
flydelta query "SELECT * FROM users" -o json

# Query with CSV output
flydelta query "SELECT * FROM users" -o csv

# List tables
flydelta tables
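
Results print to stdout, so they compose with ordinary shell tooling, for example:

# Export a query result to a file
flydelta query "SELECT id, active FROM users" -o csv > users.csv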

Architecture

flydelta uses:

  • delta-rs: Rust-based Delta Lake implementation (no Spark needed)
  • DuckDB: Fast SQL execution with predicate pushdown
  • Apache Arrow Flight: Efficient gRPC-based data transfer

On startup, flydelta:

  1. Loads Delta table metadata
  2. Creates a connection pool with tables pre-registered
  3. Caches schemas for fast query planning

Queries are executed via DuckDB and streamed back as Arrow record batches.
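
The core idea condenses to a few lines. A minimal sketch of the equivalent direct flow (not flydelta's actual code; it uses only the public deltalake and duckdb Python APIs):

import duckdb
from deltalake import DeltaTable

# 1. Load Delta table metadata via delta-rs
dt = DeltaTable("s3://bucket/users")

# 2. Register it with DuckDB as an Arrow dataset (enables predicate pushdown)
con = duckdb.connect()
con.register("users", dt.to_pyarrow_dataset())

# 3. Execute SQL and stream the result back as Arrow record batches
reader = con.execute("SELECT * FROM users WHERE active = true").fetch_record_batch(100_000)
for batch in reader:
    ...  # each item is a pyarrow.RecordBatch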

Development

This package uses Poetry for packaging and dependency management.

# Clone and install
git clone https://github.com/dataresearchcenter/flydelta.git
cd flydelta
poetry install --with dev

# Setup pre-commit hooks
poetry run pre-commit install

# Run tests
make test

# Run linting
make lint

Disclaimer

Despite the name suggesting otherwise, flydelta has no affiliation with Delta Air Lines. We cannot help you book flights, upgrade your SkyMiles status, or locate your lost luggage. Actually, please stop flying at all if possible. 🌱

License

flydelta is licensed under the AGPLv3 or later license. See LICENSE.
