[codex] Add source materialization cache and nanoarrow backend #10

ashione merged 3 commits into main from codex/arrow-csv-materialization on Mar 31, 2026
ashione (Owner) commented on Mar 30, 2026

What changed

  • add a generic source materialization cache layer keyed by source format, options, absolute path, file size, and mtime
  • teach DataflowSession::read_csv(...) to reuse persisted materialized data instead of reparsing unchanged sources
  • add a nanoarrow_ipc materialization backend alongside the internal binary row batch backend
  • add native C++ regression coverage for source materialization roundtrip and cache reuse
  • add a Python regression test that executes SQL after reading a 20k-row CSV, then verifies a second session reuses the existing materialized source file
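A minimal sketch of the cache key described in the first bullet, written in Python for readability even though the real cache lives in the C++ engine. `SourceFingerprint`, `fingerprint`, and `cache_key` are hypothetical names. The field set mirrors the PR description, but hashing with SHA-256 over a JSON encoding and using a nanosecond mtime are assumptions, not the project's actual key encoding.

```python
import hashlib
import json
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceFingerprint:
    # Hypothetical: one field per item listed in the PR description.
    source_format: str   # e.g. "csv"
    options: tuple       # reader options as sorted (key, value) string pairs
    abs_path: str
    file_size: int
    mtime_ns: int        # assumption: nanosecond mtime to reduce false cache hits


def fingerprint(path: str, source_format: str, options: dict) -> SourceFingerprint:
    """Collect the fields that decide whether a materialization is still valid."""
    st = os.stat(path)
    return SourceFingerprint(
        source_format=source_format,
        options=tuple(sorted((str(k), str(v)) for k, v in options.items())),
        abs_path=os.path.abspath(path),
        file_size=st.st_size,
        mtime_ns=st.st_mtime_ns,
    )


def cache_key(fp: SourceFingerprint) -> str:
    """Hash a stable encoding of the fingerprint into a file-name-safe key."""
    payload = json.dumps(
        [fp.source_format, list(fp.options), fp.abs_path, fp.file_size, fp.mtime_ns]
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.csv")
        with open(path, "w") as f:
            f.write("id,name\n1,a\n")
        first = cache_key(fingerprint(path, "csv", {"header": True}))
        again = cache_key(fingerprint(path, "csv", {"header": True}))
        print(first == again)  # unchanged source and options -> same key
```

Any change to the format, the options, the resolved path, the size, or the mtime yields a different key, so a stale materialization is never matched.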

Why

Previously, every read of a formatted source went back to the original file and reparsed it, even when the file had not changed. This branch adds a reusable, cross-session materialization path so that follow-up queries can skip parsing the original source when its fingerprint (format, options, absolute path, file size, and mtime) is unchanged.
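The reuse path can be sketched as a lookup-then-parse function. This is a hypothetical Python illustration, not the `DataflowSession::read_csv(...)` implementation: `read_csv_cached` and `_key` are invented names, and `pickle` stands in for the real persisted formats (the internal binary row batch and nanoarrow_ipc). A second call, even from a new process, finds the file written by the first and skips parsing; any change to size or mtime produces a new key and forces a fresh parse.

```python
import csv
import hashlib
import json
import os
import pickle
import tempfile


def _key(path: str, source_format: str, options: dict) -> str:
    # Hypothetical key: format, options, absolute path, size, mtime.
    st = os.stat(path)
    payload = json.dumps([source_format, sorted(options.items()),
                          os.path.abspath(path), st.st_size, st.st_mtime_ns])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def read_csv_cached(path: str, cache_dir: str, options=None):
    """Return (rows, cache_hit). Parses the CSV only on a cache miss."""
    options = options or {}
    os.makedirs(cache_dir, exist_ok=True)
    cached = os.path.join(cache_dir, _key(path, "csv", options) + ".bin")
    if os.path.exists(cached):
        with open(cached, "rb") as f:
            return pickle.load(f), True  # hit: skip the source parse entirely
    with open(path, newline="") as f:
        rows = list(csv.reader(f, delimiter=options.get("delimiter", ",")))
    tmp = cached + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(rows, f)
    os.replace(tmp, cached)  # publish atomically so readers never see a partial file
    return rows, False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "t.csv")
        with open(src, "w", newline="") as f:
            csv.writer(f).writerows([[i, f"name{i}"] for i in range(5)])
        cache = os.path.join(d, "cache")
        print(read_csv_cached(src, cache)[1])  # False: first read parses
        print(read_csv_cached(src, cache)[1])  # True: second read reuses
```

Writing to a temporary file and publishing it with `os.replace` is one way to keep a concurrently starting session from reading a half-written materialization; the PR description does not say whether the real implementation does the same.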

Impact

  • batch read_csv(...) can now reuse persisted source materialization across sessions
  • the materialization layer is no longer CSV-specific, which leaves room for other formatted sources to use the same mechanism
  • nanoarrow_ipc is available as a persisted data format without requiring the full Arrow C++ dependency tree
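Decoupling the source reader from the persisted format is what lets other formatted sources share the mechanism. The following is a hypothetical sketch of that split: the interface, class names, and registry keys are illustrative, and `pickle`/`json` stand in for the internal binary row batch and nanoarrow_ipc backends, which this sketch does not implement.

```python
import json
import os
import pickle
import tempfile
from typing import Callable, Dict, List, Protocol

Rows = List[list]


class MaterializationBackend(Protocol):
    """Hypothetical interface: persist parsed rows and load them back."""
    def write(self, rows: Rows, path: str) -> None: ...
    def read(self, path: str) -> Rows: ...


class PickleRowBatchBackend:
    """Stand-in for the internal binary row batch backend."""
    def write(self, rows: Rows, path: str) -> None:
        with open(path, "wb") as f:
            pickle.dump(rows, f)

    def read(self, path: str) -> Rows:
        with open(path, "rb") as f:
            return pickle.load(f)


class JsonBackend:
    """Stand-in for a second persisted format (the PR adds nanoarrow_ipc)."""
    def write(self, rows: Rows, path: str) -> None:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(rows, f)

    def read(self, path: str) -> Rows:
        with open(path, encoding="utf-8") as f:
            return json.load(f)


BACKENDS: Dict[str, MaterializationBackend] = {
    "binary_row_batch": PickleRowBatchBackend(),
    "json": JsonBackend(),
}


def materialize(parse: Callable[[], Rows], backend_name: str, path: str) -> Rows:
    """Parse any source once via `parse`, persist it with the chosen backend, reload it."""
    backend = BACKENDS[backend_name]
    backend.write(parse(), path)
    return backend.read(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        out = materialize(lambda: [["a", 1]], "json", os.path.join(d, "x.json"))
        print(out)  # [['a', 1]]
```

With this split, supporting a new source format means supplying another `parse` callable, and supporting a new persisted format means registering another backend; neither change touches the other.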

Validation

  • bazel test //:source_materialization_test
  • bazel build //:velaria_pyext
  • bazel test //python_api:source_materialization_test

ashione marked this pull request as ready for review on March 31, 2026 00:25
ashione merged commit 5fa99f7 into main on Mar 31, 2026 (3 checks passed)
ashione deleted the codex/arrow-csv-materialization branch on March 31, 2026 00:25