
[Python] Add experimental Arrow Flight support #122

Open
teodordelibasic-db wants to merge 13 commits into main from python-arrow

Conversation

@teodordelibasic-db
Contributor

What changes are proposed in this pull request?

Arrow Flight is disabled on the Zerobus server side by default.

  • Users can now ingest pyarrow.RecordBatch and pyarrow.Table via the Arrow Flight protocol
  • Opt-in: pip install databricks-zerobus-ingest-sdk[arrow]
  • New classes: ZerobusArrowStream (sync + async), ArrowStreamConfigurationOptions
  • New SDK methods: create_arrow_stream(), recreate_arrow_stream()

Possible follow-ups (internal, no public API changes):

  1. Eliminate double IPC serialization — currently: pyarrow → IPC bytes → Rust RecordBatch → IPC again for Flight. Can skip the middle step and pass IPC bytes directly to FlightData.
  2. Zero-copy via PyCapsule — use Arrow C Data Interface (__arrow_c_array__()) to cross the Python-Rust boundary without any serialization.
  3. Store IPC bytes for recovery — avoid re-serializing when returning unacked batches.

How is this tested?

Added new unit tests.

teodordelibasic-db self-assigned this Mar 11, 2026
teodordelibasic-db linked an issue Mar 11, 2026 that may be closed by this pull request
Signed-off-by: teodor-delibasic_data <teodor.delibasic@databricks.com>
Comment on lines +86 to +87
# Re-export configuration from Rust core
ArrowStreamConfigurationOptions = _core.arrow.ArrowStreamConfigurationOptions
Contributor

Maybe instead of just re-exporting the name here, you should define the wrapper inside _zerobus_core.pyi, similar to how the other classes are defined (StreamConfigurationOptions, RecordType, ...), with proper doc comments.
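A stub along the lines the comment suggests could look like this. This is a sketch only: the field names and defaults below are illustrative placeholders, not the actual options defined in the Rust core:

```python
class ArrowStreamConfigurationOptions:
    """Options controlling an Arrow Flight ingestion stream.

    .pyi-style sketch; the attributes shown are hypothetical examples,
    to be replaced by the real options exposed by _zerobus_core.
    """

    max_inflight_batches: int
    recovery: bool

    def __init__(self, max_inflight_batches: int = ..., recovery: bool = ...) -> None: ...
```

Defining the stub in _zerobus_core.pyi (rather than re-exporting) gives IDEs and type checkers the docstrings and signatures directly, matching how StreamConfigurationOptions is handled.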

… types

…ently dropping
pa = _check_pyarrow()
if isinstance(batch, pa.Table):
    if batch.num_rows == 0:
        raise ValueError("Cannot ingest an empty pyarrow.Table")
Contributor

We should have this check when a RecordBatch is sent as well, right?


These tests verify the Python-side Arrow serialization/deserialization helpers,
the ArrowStreamConfigurationOptions pyclass, and the API surface of Arrow stream
classes, without making network connections.
Contributor

nit: typo



Development

Successfully merging this pull request may close these issues.

[Python] Add experimental Arrow Flight support

3 participants