@prmoore77
Contributor

Addresses #1107

@github-actions github-actions bot modified the milestone: ADBC Libraries 22 (Dec 15, 2025)
@prmoore77 prmoore77 changed the title from "Implement support for Bulk Ingest for ADBC Flight SQL driver" to "feat: Implement support for Bulk Ingest for ADBC Flight SQL driver" (Dec 15, 2025)
Member

We currently use docker-compose to manage this and I'd rather keep it consistent...

Contributor Author

Hi, @lidavidm - I've changed the code to use docker-compose, per your feedback.

@prmoore77 prmoore77 requested a review from lidavidm December 15, 2025 21:42
@prmoore77
Contributor Author

FWIW - I tested the locally built driver/wheel from Python against a remote (Azure) GizmoSQL server - and it seems to work pretty well:

import os
import time

import duckdb
from adbc_driver_flightsql import dbapi as gizmosql
from codetiming import Timer
from dotenv import load_dotenv

from config import get_logger

# Timer logging setup
TIMER_TEXT = "{name}: Elapsed time: {:.4f} seconds"


def main():
    load_dotenv()

    logger = get_logger()
    timer_logger = logger.info
    with Timer(name="Overall program",
               text=TIMER_TEXT,
               initial_text=True,
               logger=timer_logger
               ):
        with Timer(name="  Generate TPCH data and load into DuckDB (1GB)",
                   text=TIMER_TEXT,
                   initial_text=True,
                   logger=timer_logger
                   ):
            # Connect to DuckDB (memory only)
            duckdb_conn = duckdb.connect()
            duckdb_conn.install_extension("tpch")
            duckdb_conn.load_extension("tpch")
            duckdb_conn.execute(query="CALL dbgen(sf=1.0)")

        with Timer(name="  Get RecordBatch reader for the DuckDB lineitem table",
                   text=TIMER_TEXT,
                   initial_text=True,
                   logger=timer_logger
                   ):
            lineitem_arrow_reader = duckdb_conn.table("lineitem").fetch_arrow_reader(batch_size=10_000)

        with Timer(name="  Bulk ingest the data into GizmoSQL",
                   text=TIMER_TEXT,
                   initial_text=True,
                   logger=timer_logger
                   ):
            with gizmosql.connect(
                    uri="grpc+tls://try-gizmosql-adbc.gizmodata.com:31337",
                    db_kwargs={"username": os.environ["GIZMOSQL_USERNAME"],
                               "password": os.environ["GIZMOSQL_PASSWORD"]
                               },
                    autocommit=True
            ).cursor() as cursor:
                ingest_start = time.perf_counter()
                rows_loaded = cursor.adbc_ingest(
                    table_name="bulk_ingest_lineitem",
                    data=lineitem_arrow_reader,
                    mode="replace"
                )
                ingest_seconds = time.perf_counter() - ingest_start

                rows_per_sec = (rows_loaded / ingest_seconds) if ingest_seconds > 0 else float("inf")
                logger.info(msg=f"Loaded rows: {rows_loaded:,}")
                logger.info(msg=f"Ingest time: {ingest_seconds:.4f} s")
                logger.info(msg=f"Rows/sec: {rows_per_sec:,.2f}")


if __name__ == "__main__":
    main()

Result:

2025-12-16 13:39:36,290 - INFO     Timer Overall program started
2025-12-16 13:39:36,290 - INFO     Timer   Generate TPCH data and load into DuckDB (1GB) started
2025-12-16 13:39:38,723 - INFO       Generate TPCH data and load into DuckDB (1GB): Elapsed time: 2.4328 seconds
2025-12-16 13:39:38,723 - INFO     Timer   Get RecordBatch reader for the DuckDB lineitem table started
2025-12-16 13:39:38,726 - INFO       Get RecordBatch reader for the DuckDB lineitem table: Elapsed time: 0.0029 seconds
2025-12-16 13:39:38,726 - INFO     Timer   Bulk ingest the data into GizmoSQL started
2025-12-16 13:40:11,055 - INFO     Loaded rows: 6,001,215
2025-12-16 13:40:11,056 - INFO     Ingest time: 33.3162 s
2025-12-16 13:40:11,056 - INFO     Rows/sec: 180,129.18
2025-12-16 13:40:11,058 - INFO       Bulk ingest the data into GizmoSQL: Elapsed time: 33.9063 seconds
2025-12-16 13:40:11,058 - INFO     Overall program: Elapsed time: 36.3427 seconds
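
As a quick sanity check on the figures above, here is a minimal, stdlib-only sketch that recomputes the throughput from the logged row count and ingest time. The helper name `rows_per_second` is hypothetical (it is not part of the script or of any ADBC API); it just mirrors the `rows_loaded / ingest_seconds` expression in the script, including the guard for a non-positive elapsed time.

```python
def rows_per_second(rows_loaded: int, elapsed_seconds: float) -> float:
    """Return ingest throughput; infinite if the elapsed time is not positive."""
    if elapsed_seconds <= 0:
        return float("inf")
    return rows_loaded / elapsed_seconds


# Figures copied from the log output above.
ROWS_LOADED = 6_001_215
INGEST_SECONDS = 33.3162  # rounded to 4 decimals in the log

throughput = rows_per_second(ROWS_LOADED, INGEST_SECONDS)
print(f"Rows/sec: {throughput:,.2f}")
```

Because the logged ingest time is rounded to four decimal places, the recomputed value should land within a fraction of a row per second of the logged `Rows/sec` figure rather than match it digit for digit.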

@prmoore77
Contributor Author

Well, the integration tests are failing in the pipeline. They "worked on my laptop" - but I'll investigate to see what is happening.

@prmoore77
Contributor Author

> Well, the integration tests are failing in the pipeline. They "worked on my laptop" - but I'll investigate to see what is happening.

I believe the integration tests related to Bulk Ingestion are working now...


2 participants