Optimize REPL read |> count() to use metadata for Parquet/ORC by aisrael · Pull Request #34 · aisrael/datu

aisrael · 2026-03-15T03:42:53Z

When the REPL pipeline is exactly read(path) |> count(), it is now optimized to a single count(path) stage. For Parquet and ORC this uses file metadata only (no row data is read); for Avro and CSV it still streams batches.
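The format split described above can be sketched as a small dispatch function. This is only an illustration: `CountStrategy` and `count_strategy` are hypothetical names, and choosing the format from the file extension is an assumption, not necessarily how datu detects it.

```rust
use std::path::Path;

// Hypothetical type for illustration; not taken from the datu source.
#[derive(Debug, PartialEq)]
enum CountStrategy {
    /// Row count comes from file metadata; no row data is read.
    MetadataOnly,
    /// Row count is computed by streaming record batches.
    StreamBatches,
}

/// Pick a counting strategy from the file extension (assumed detection rule).
/// Returns `None` for formats the PR description does not mention.
fn count_strategy(path: &str) -> Option<CountStrategy> {
    let ext = Path::new(path).extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "parquet" | "orc" => Some(CountStrategy::MetadataOnly),
        "avro" | "csv" => Some(CountStrategy::StreamBatches),
        _ => None,
    }
}
```

Under these assumptions, `count_strategy("events.parquet")` yields `Some(CountStrategy::MetadataOnly)`, while `count_strategy("events.csv")` yields `Some(CountStrategy::StreamBatches)`.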

Key changes

optimize_read_then_count() in plan_pipeline_with_state: replaces [Read { path }, Count { path: None }] with [Count { path: Some(path) }].
read(path) |> select(...) |> count() is unchanged (still three stages); only the two-stage case is optimized.
Tests: test_plan_pipeline_count_no_auto_print updated for optimized pipeline; test_plan_pipeline_read_select_count_not_optimized added to ensure select-in-between is not optimized.
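A minimal sketch of the rewrite rule listed above, for readers who want to see its shape. The `Stage` enum here is a simplified stand-in (its `Select` variant and field types are assumptions), and the function body is one possible way to write the rule, not necessarily the code in this PR.

```rust
// Simplified, hypothetical stage type; the real datu definition may differ.
#[derive(Debug, Clone, PartialEq)]
enum Stage {
    Read { path: String },
    Select { columns: Vec<String> },
    Count { path: Option<String> },
}

/// Collapse exactly `[Read { path }, Count { path: None }]` into
/// `[Count { path: Some(path) }]`. Any other pipeline is returned unchanged.
fn optimize_read_then_count(stages: Vec<Stage>) -> Vec<Stage> {
    // The slice pattern matches only a pipeline of exactly two stages.
    if let [Stage::Read { path }, Stage::Count { path: None }] = stages.as_slice() {
        return vec![Stage::Count { path: Some(path.clone()) }];
    }
    stages
}
```

Because the pattern requires exactly two elements, a pipeline such as read, select, count falls through to the unchanged branch, which is the behavior test_plan_pipeline_read_select_count_not_optimized checks for.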

Made with Cursor

Optimize REPL read |> count() to use metadata for Parquet/ORC

2e1a053

Made-with: Cursor

aisrael force-pushed the fix/repl-count-use-metadata branch from 71d3b22 to 2e1a053 on March 15, 2026 03:43

cargo fmt

0d690d2

aisrael merged commit 55de46c into main Mar 15, 2026
4 checks passed

aisrael merged 2 commits into main from fix/repl-count-use-metadata
