
Release 0.3.2: DataFrame convert API, Avro Int16 support, eyre, and REPL refactor #36

Merged
aisrael merged 18 commits into main from 0.3.2 on Mar 15, 2026

@aisrael aisrael commented Mar 15, 2026

Summary

Branch 0.3.2 brings dependency updates, Avro/DataFrame improvements, and REPL cleanup.

Key changes

  • Convert command: DataFrame API support for streamlined file processing; resolve_input_file_type renamed to resolve_file_type.
  • Avro: Compatibility for Int16 fields in record batches (pipeline/avro.rs).
  • Error handling: Switched from anyhow to eyre.
  • REPL: Refactored evaluation to use exec_* naming and improved test structure; feature files updated for REPL equivalents.
  • Docs: README updates for JSON support and REPL equivalents.
  • CI: uses actions-rust-lang/setup-rust-toolchain@v1 with Rust 1.94.0; nightly is no longer used.
  • Misc: .clocignore, dependency/version bumps in Cargo files, small fixes and rewordings.

Commits (origin/main..HEAD)

2810db2 Use eyre instead of anyhow
b4abc0c Added .clocignore
b499ea4 Add Avro compatibility for Int16 fields in record batches
6829793 Update dependencies and versions in Cargo files
d70e1e9 Just updating the comment
c9d17c2 Refactor REPL evaluation methods to use exec_* naming convention and improve test structure
0a3364c Updated with REPL equivalents
59510c9 Update README with JSON support
b7bb128 Make this more explicit
96bf0d8 cargo fmt
5a31905 resolve_input_file_type -> resolve_file_type
5a1226f Implement DataFrame API support in convert command for streamlined file processing
fe05ee1 Don't use nightly
547cfe6 typo
69352cb Use 1.94.0
7ab5def actions-rust-lang/setup-rust-toolchain@v1
d30bc85 Rewording

28 files changed, 754 insertions, 610 deletions.

Made with Cursor

aisrael added 18 commits March 15, 2026 10:37
…le processing

- Added support for reading and writing JSON files using DataFusion's DataFrame API.
- Introduced a new function `is_datafusion_native` to determine if the input format is compatible with DataFusion.
- Refactored the `convert` function to utilize either the DataFrame API or a fallback method based on the input and output file types.
- Enhanced error handling for unsupported file types in the DataFrameReader.
- Updated tests to reflect the new JSON support and error messages.
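
The dispatch described in the bullets above can be sketched in plain Rust, without pulling in DataFusion. Only `is_datafusion_native`, `resolve_file_type`, and `convert` are names taken from this PR. `FileType`, `ConvertPath`, `choose_convert_path`, the extension mapping, and the assumption that Parquet, CSV, and JSON count as native while Avro takes the fallback path are hypothetical stand-ins, and the real signatures may differ.

```rust
use std::path::Path;

/// Hypothetical file-type enum; the crate's real type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileType {
    Parquet,
    Csv,
    Json,
    Avro,
}

/// Which strategy `convert` would pick (hypothetical name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConvertPath {
    DataFrame,
    Fallback,
}

/// Assumed behaviour of `resolve_file_type`: infer the type from the
/// file extension, case-insensitively, and reject anything unknown.
pub fn resolve_file_type(path: &str) -> Result<FileType, String> {
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("parquet") => Ok(FileType::Parquet),
        Some("csv") => Ok(FileType::Csv),
        Some("json") => Ok(FileType::Json),
        Some("avro") => Ok(FileType::Avro),
        _ => Err(format!("unsupported file type: {}", path)),
    }
}

/// Assumption: Parquet, CSV, and JSON are handled by the DataFrame API.
pub fn is_datafusion_native(file_type: FileType) -> bool {
    matches!(file_type, FileType::Parquet | FileType::Csv | FileType::Json)
}

/// Use the DataFrame path only when both ends are native; otherwise fall back.
pub fn choose_convert_path(input: FileType, output: FileType) -> ConvertPath {
    if is_datafusion_native(input) && is_datafusion_native(output) {
        ConvertPath::DataFrame
    } else {
        ConvertPath::Fallback
    }
}
```

Under these assumptions, `choose_convert_path(FileType::Json, FileType::Parquet)` selects the DataFrame path, while any conversion that involves Avro takes the fallback.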

…improve test structure

- Removed async evaluation methods from ReplPipelineBuilder and replaced them with exec_* methods.
- Updated corresponding tests to reflect the new method names and improved error handling.
- Consolidated path extraction tests for read operations.

Update dependencies and versions in Cargo files

- Updated various dependencies in Cargo.toml, including `anyhow`, `clap`, `datafusion`, and `tokio`, to their latest compatible versions.
- Adjusted version specifications for `anstyle`, `console`, and other packages in Cargo.lock to reflect new releases.
- Improved dependency management by ensuring consistent versioning across related packages.

Add Avro compatibility for Int16 fields in record batches

- Introduced functions to check for Int16 fields in schemas and convert them to Int32 for Avro compatibility.
- Implemented a wrapper for record batch readers that casts Int16 columns to Int32.
- Updated the `write_record_batches` function to handle schema adjustments before writing to Avro files.
- Added a test to verify that Int16 values are correctly upcast to Int32 in the written Avro output.
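
Avro's `int` type is 32-bit and the format has no 16-bit integer, which is why Int16 columns need widening before they are written. The following is a minimal sketch of that idea using simplified stand-in types rather than the real Arrow schema and record batch types. Every name here (`DataType`, `Field`, `Column`, `schema_has_int16`, `widen_schema`, `widen_column`, `WidenInt16`) is hypothetical and is not the code in pipeline/avro.rs.

```rust
/// Simplified stand-in for an Arrow data type (not the real `arrow` API).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int16,
    Int32,
    Utf8,
}

/// Simplified stand-in for a schema field.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
}

/// Simplified stand-in for a nullable column of values.
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Int16(Vec<Option<i16>>),
    Int32(Vec<Option<i32>>),
    Utf8(Vec<Option<String>>),
}

/// True if any field in the schema is Int16.
pub fn schema_has_int16(schema: &[Field]) -> bool {
    schema.iter().any(|f| f.data_type == DataType::Int16)
}

/// Return a copy of the schema with every Int16 field declared as Int32.
pub fn widen_schema(schema: &[Field]) -> Vec<Field> {
    schema
        .iter()
        .map(|f| Field {
            name: f.name.clone(),
            data_type: if f.data_type == DataType::Int16 {
                DataType::Int32
            } else {
                f.data_type.clone()
            },
        })
        .collect()
}

/// Upcast an Int16 column to Int32; nulls stay null, other columns pass through.
pub fn widen_column(column: Column) -> Column {
    match column {
        Column::Int16(values) => {
            Column::Int32(values.into_iter().map(|v| v.map(i32::from)).collect())
        }
        other => other,
    }
}

/// Wrapper around a batch iterator that widens Int16 columns as batches are read.
pub struct WidenInt16<I> {
    pub inner: I,
}

impl<I: Iterator<Item = Vec<Column>>> Iterator for WidenInt16<I> {
    type Item = Vec<Column>;

    fn next(&mut self) -> Option<Self::Item> {
        self.inner
            .next()
            .map(|batch| batch.into_iter().map(widen_column).collect())
    }
}
```

Because `i16` to `i32` is a lossless widening, the conversion cannot overflow. In this sketch a writer would call `widen_schema` once up front and wrap its reader in `WidenInt16` only when `schema_has_int16` returns true.
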
@aisrael aisrael merged commit 8786e8a into main Mar 15, 2026
4 checks passed
