
refactor: unify head with tail/sample, add CSV limit, unify column selection#35

Merged
aisrael merged 2 commits into main from refactor/head-use-shared-reader-display on Mar 15, 2026


@aisrael (Owner) commented on Mar 15, 2026

Aligns head with tail/sample, adds row limit support for CSV, and unifies column selection so CLI and REPL share the same resolution and projection logic.

Commits

  1. refactor(head): use build_reader and apply_select_and_display like tail/sample

    • head uses build_reader (with limit = Some(args.number)) instead of read_to_batches.
    • Optional column selection and display via parse_select_step and apply_select_and_display.
    • Explicit format check: only Parquet, Avro, CSV, and ORC supported (same as tail/sample).
  2. Enhance CSV and ORC reading capabilities by adding row limit support

    • ReadCsvStep gains an optional limit; build_reader passes limit for CSV so head can limit rows without loading the full file.
    • head sets the offset to 0 when limiting ORC reads, so the limited rows are taken from the start of the file.
    • select_columns_to_batches (REPL) now uses the same column resolution and Arrow projection as the streaming SelectColumnsStep (no DataFusion in this path), and the REPL's exec_select calls it synchronously.
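The head path described above can be sketched in std-only Rust. It does an explicit format check, then builds a reader with `limit = Some(args.number)` and offset 0.

All names here are hypothetical: `Format`, `ReadPlan`, `plan_head`, `plan_tail`, and `apply_plan` are illustrative and are not the project's `build_reader` API. `Json` stands in for any format head rejects. Real readers would apply the offset and limit inside the Parquet/Avro/CSV/ORC decoders rather than over an in-memory iterator.

```rust
/// Hypothetical input formats; `Json` is a placeholder for any unsupported format.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Format {
    Parquet,
    Avro,
    Csv,
    Orc,
    Json,
}

/// Hypothetical description of which rows a reader should emit.
#[derive(Debug, PartialEq)]
pub struct ReadPlan {
    pub offset: usize,
    pub limit: Option<usize>,
}

/// Explicit format check, then "first n rows": offset 0, limit n.
pub fn plan_head(format: Format, n: usize) -> Result<ReadPlan, String> {
    match format {
        Format::Parquet | Format::Avro | Format::Csv | Format::Orc => Ok(ReadPlan {
            offset: 0,
            limit: Some(n),
        }),
        other => Err(format!("head: unsupported format {other:?}")),
    }
}

/// For contrast: a tail-style plan starts near the end of the data.
pub fn plan_tail(total_rows: usize, n: usize) -> ReadPlan {
    ReadPlan {
        offset: total_rows.saturating_sub(n),
        limit: Some(n),
    }
}

/// Apply a plan to any row iterator: skip `offset` rows, then take at most `limit`.
pub fn apply_plan<T>(rows: impl Iterator<Item = T>, plan: &ReadPlan) -> Vec<T> {
    rows.skip(plan.offset)
        .take(plan.limit.unwrap_or(usize::MAX))
        .collect()
}
```

Over ten rows `0..10`, the head plan for 3 rows yields `[0, 1, 2]`, while the tail plan starts at offset 7 and yields `[7, 8, 9]`. A head plan that kept a non-zero offset would silently return the wrong rows.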

Summary

| Area | Change |
|---|---|
| head | Same reader → select → display pipeline as tail/sample |
| CSV | `build_reader` + `ReadCsvStep` support `limit` for streaming-style head |
| Column selection | One resolution path (`resolve_column_specs`) and one projection method (Arrow `project`) for both CLI and REPL |
| REPL select | Sync `select_columns_to_batches`; empty specs return batches unchanged |
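A toy model of the unified selection path works in two steps. First it resolves the column specs to indices once. Then it projects every batch by those indices, returning the input unchanged when no specs are given.

This is a sketch under several assumptions:
- `Batch` stands in for Arrow's `RecordBatch`, whose `project` method likewise takes a slice of column indices.
- Specs are plain column names.
- `resolve_specs` and `select_columns` are hypothetical stand-ins for `resolve_column_specs` and `select_columns_to_batches`, whose real signatures are not shown here.

```rust
/// Toy stand-in for an Arrow RecordBatch: named columns of i64 values.
#[derive(Clone, Debug, PartialEq)]
pub struct Batch {
    pub names: Vec<String>,
    pub columns: Vec<Vec<i64>>,
}

impl Batch {
    /// Keep only the columns at `indices`, in that order
    /// (same shape as Arrow's `RecordBatch::project`; panics on a bad index).
    pub fn project(&self, indices: &[usize]) -> Batch {
        Batch {
            names: indices.iter().map(|&i| self.names[i].clone()).collect(),
            columns: indices.iter().map(|&i| self.columns[i].clone()).collect(),
        }
    }
}

/// Hypothetical resolver: map each requested name to its column index,
/// failing on the first unknown name.
pub fn resolve_specs(names: &[String], specs: &[&str]) -> Result<Vec<usize>, String> {
    specs
        .iter()
        .map(|spec| {
            names
                .iter()
                .position(|n| n.as_str() == *spec)
                .ok_or_else(|| format!("unknown column: {spec}"))
        })
        .collect()
}

/// Single resolve-then-project path that any caller (CLI or REPL) can share.
/// Empty specs (or no batches) return the input unchanged.
pub fn select_columns(batches: Vec<Batch>, specs: &[&str]) -> Result<Vec<Batch>, String> {
    if specs.is_empty() || batches.is_empty() {
        return Ok(batches);
    }
    // Resolve once against the first batch; batches from one reader are assumed
    // to share a schema, so the same indices apply to every batch.
    let indices = resolve_specs(&batches[0].names, specs)?;
    Ok(batches.iter().map(|b| b.project(&indices)).collect())
}
```

The function is synchronous and takes already-materialized batches, so a REPL command can call it directly. This matches the note that `exec_select` now calls `select_columns_to_batches` synchronously.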

aisrael added 2 commits March 15, 2026 10:10
- Updated `ReadCsvStep` to include an optional `limit` parameter for controlling the maximum number of rows read.
- Modified `build_reader` function to pass the `limit` when reading CSV files.
- Adjusted `head` command to set an offset of 0 when limiting rows for ORC files, ensuring correct row selection.
- Refactored `select_columns_to_batches` to share the column-resolution logic and to return the input batches unchanged when no column specs are given.
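The CSV `limit` lets head stop early instead of loading the whole file. Below is a minimal std-only sketch of that idea. It assumes simple comma-separated input with a header line and no quoted fields.

`read_csv_with_limit` is a hypothetical helper, not the project's `ReadCsvStep`, which presumably produces Arrow batches rather than string rows.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical limit-aware CSV reader (not the project's `ReadCsvStep`).
/// Returns the header fields and at most `limit` data rows.
/// Assumes plain comma-separated text: no quoting, no embedded commas.
pub fn read_csv_with_limit<R: Read>(
    input: R,
    limit: Option<usize>,
) -> io::Result<(Vec<String>, Vec<Vec<String>>)> {
    let mut lines = BufReader::new(input).lines();
    // First line is the header; empty input yields no header and no rows.
    let header: Vec<String> = match lines.next() {
        Some(line) => line?.split(',').map(str::to_string).collect(),
        None => return Ok((Vec::new(), Vec::new())),
    };
    let mut rows: Vec<Vec<String>> = Vec::new();
    // Check the limit before requesting the next line, so lines past the limit
    // are never pulled by this loop (BufReader may still have buffered some bytes).
    while !matches!(limit, Some(n) if rows.len() >= n) {
        match lines.next() {
            Some(line) => rows.push(line?.split(',').map(str::to_string).collect()),
            None => break,
        }
    }
    Ok((header, rows))
}
```

With `limit: None` the loop runs to end of input, which corresponds to reading the full file when no row limit is set.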
aisrael changed the title from "refactor(head): use build_reader and apply_select_and_display like tail/sample" to "refactor: unify head with tail/sample, add CSV limit, unify column selection" on Mar 15, 2026
aisrael merged commit 85fbd36 into main on Mar 15, 2026
4 checks passed