[arrow-pyarrow]: restore __arrow_c_array__ tuple error#1
Closed
alamb wants to merge 17 commits into Tpt:tpt/pyarrow-nits from
# Which issue does this PR close?

- part of apache#9108

# Rationale for this change

Prepare for next release

# What changes are included in this PR?

1. Update version to `58.1.0`
2. Add changelog. See rendered preview here: https://github.com/alamb/arrow-rs/blob/alamb/prepare_58.1.0/CHANGELOG.md

# Are these changes tested?

By CI

# Are there any user-facing changes?

Yes
…apache#9590)

## Summary

- Reserve `output.views` capacity in `ByteViewArrayDecoderDictionary::read` before the decode loop
- Reserve `output.offsets` capacity in `ByteArrayDecoderDictionary::read` before the decode loop

This avoids per-chunk reallocation during `extend` calls inside the dictionary decode loop.

Closes apache#9587

## Test plan

- [ ] Existing tests pass (no functional change, only pre-allocation)
- [ ] Benchmark dictionary-encoded StringView/BinaryView/String reads

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
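The pre-allocation pattern can be sketched in plain Rust. This is a simplified, hypothetical model (the `decode_dictionary` helper and its `Vec<i32>` offsets plus `Vec<u8>` values layout are assumptions, not the parquet crate's decoder): capacity for every offset is reserved once, so the per-chunk `extend` calls inside the loop do not need to grow the offsets vector.

```rust
/// Expands dictionary indices into an offsets vector plus a values buffer,
/// processing the indices in fixed-size chunks like a batched decoder.
/// Hypothetical helper for illustration; not the parquet crate's API.
/// `chunk_size` must be non-zero (`slice::chunks` panics on 0).
fn decode_dictionary(dict: &[&str], indices: &[usize], chunk_size: usize) -> (Vec<i32>, Vec<u8>) {
    let mut offsets: Vec<i32> = vec![0];
    let mut values: Vec<u8> = Vec::new();

    // Reserve room for one offset per index up front, before the loop,
    // instead of letting each chunk's `extend` reallocate as it grows.
    offsets.reserve(indices.len());

    for chunk in indices.chunks(chunk_size) {
        offsets.extend(chunk.iter().map(|&idx| {
            // Copy the selected dictionary entry and record its end offset.
            values.extend_from_slice(dict[idx].as_bytes());
            values.len() as i32
        }));
    }
    (offsets, values)
}
```

The `values` buffer is not reserved in this sketch because its final size depends on which dictionary entries the indices select.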
# Rationale for this change

In some cases, it is desirable to print strings with surrounding quotation marks. A typical example that we run into in https://github.com/rerun-io/rerun is a `StructArray` that contains empty strings:

Current formatting:

```text
{name: }
```

Added option in this PR:

```text
{name: ""}
```

# What changes are included in this PR?

This PR relies on `std::fmt::Debug` to do the actual formatting of strings, which means that all escaping is handled out of the box.

# Are these changes tested?

This PR contains tests for different types of inputs, including escape sequences. Additionally, it also tests the `StructArray` example outlined above.

# Are there any user-facing changes?

By default this option is false, making the feature opt-in.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
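A minimal sketch of the idea, assuming a hypothetical `quote_strings` flag (the PR's actual option name and formatter types are not reproduced here): with the flag set, the value is formatted with `{:?}`, so `Debug` supplies both the quotation marks and the escaping.

```rust
/// Formats one string value. With `quote_strings` set (a hypothetical flag),
/// `Debug` formatting adds surrounding quotes and escapes special characters.
fn format_str_value(value: &str, quote_strings: bool) -> String {
    if quote_strings {
        format!("{:?}", value)
    } else {
        value.to_string()
    }
}

/// Renders a single-field struct row in the `{name: value}` shape.
fn format_struct_row(field: &str, value: &str, quote_strings: bool) -> String {
    format!("{{{}: {}}}", field, format_str_value(value, quote_strings))
}
```

With the flag off, `format_struct_row("name", "", false)` yields `{name: }`; with it on, the empty string becomes visible as `{name: ""}`.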
## Which issue does this PR close?

Closes apache#9580

## Rationale

The current VLQ decoder calls `get_aligned` for each byte, which involves repeated offset calculations and bounds checks in the hot loop.

## What changes are included in this PR?

Align to the byte boundary once, then iterate directly over the buffer slice, avoiding per-byte overhead from `get_aligned`.

## Are there any user-facing changes?

No.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
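The align-once-then-iterate approach can be sketched as a standalone ULEB128-style decoder. This is a simplified stand-in (the `read_vlq` function and its bit-offset interface are assumptions, not the parquet crate's bit reader API): it rounds the bit offset up to a byte boundary once, then walks the remaining bytes of the slice directly.

```rust
/// Decodes an unsigned LEB128-style VLQ integer from `buf`, starting at
/// `bit_offset` rounded up to the next byte boundary. Returns the value and
/// the bit offset just past it, or `None` if the input ends before the
/// terminating byte or the encoding is longer than ten bytes.
/// Simplified: excess high bits in a tenth byte are not rejected.
fn read_vlq(buf: &[u8], bit_offset: usize) -> Option<(u64, usize)> {
    // Align to a byte boundary once, up front.
    let start = bit_offset.div_ceil(8);
    let mut value: u64 = 0;
    let mut shift = 0u32;
    // Iterate directly over the remaining bytes; no per-byte reader calls.
    for (i, &byte) in buf.get(start..)?.iter().enumerate() {
        if shift >= 64 {
            return None;
        }
        value |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            // High bit clear: this is the last byte of the integer.
            return Some((value, (start + i + 1) * 8));
        }
        shift += 7;
    }
    None // ran out of bytes before the terminating byte
}
```

For example, the two bytes `0xAC 0x02` decode to 300, and starting at bit offset 3 skips ahead to the next whole byte before decoding.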
# Rationale for this change

The `object_store` crate release 0.13.2 breaks the build of parquet because it feature-gates the `buffered` module. I have filed apache/arrow-rs-object-store#677 about the breakage; meanwhile, this fix assumes that 0.13.2 will not be yanked and that the feature gate will remain.

# What changes are included in this PR?

Bump the version to 0.13.2 and request the `tokio` feature.

# Are these changes tested?

The build should succeed in CI workflows.

# Are there any user-facing changes?

No

Co-authored-by: Mikhail Zabaluev <mikhail.zabaluev@gmail.com>
Updates the requirements on [sha2](https://github.com/RustCrypto/hashes) to permit the latest version.

Commits:

- [`ffe0939`](https://github.com/RustCrypto/hashes/commit/ffe093984c004769747e998f77da8ff7c0e7a765) Release sha2 0.11.0 ([#806](https://redirect.github.com/RustCrypto/hashes/issues/806))
- [`8991b65`](https://github.com/RustCrypto/hashes/commit/8991b65fe400c31c4cc189510f86ae642c470cd9) Use the standard order of the `[package]` section fields ([#807](https://redirect.github.com/RustCrypto/hashes/issues/807))
- [`3d2bc57`](https://github.com/RustCrypto/hashes/commit/3d2bc57db40fd6aeb25d6c6da98d67e2784c2985) sha2: refactor backends ([#802](https://redirect.github.com/RustCrypto/hashes/issues/802))
- [`faa55fb`](https://github.com/RustCrypto/hashes/commit/faa55fb83697c8f3113636d88070e5f5edc8c335) sha3: bump `keccak` to v0.2 ([#803](https://redirect.github.com/RustCrypto/hashes/issues/803))
- [`d3e6489`](https://github.com/RustCrypto/hashes/commit/d3e6489e56f8486d4a93ceb7a8abf4924af1de7b) sha3 v0.11.0-rc.9 ([#801](https://redirect.github.com/RustCrypto/hashes/issues/801))
- [`bbf6f51`](https://github.com/RustCrypto/hashes/commit/bbf6f51ff97f81ab15e6e5f6cf878bfbcb1f47c8) sha2: tweak backend docs ([#800](https://redirect.github.com/RustCrypto/hashes/issues/800))
- [`155dbbf`](https://github.com/RustCrypto/hashes/commit/155dbbf2959dbec0ec75948a82590ddaede2d3bc) sha3: add default value for the `DS` generic parameter on `TurboShake128/256`...
- [`ed514f2`](https://github.com/RustCrypto/hashes/commit/ed514f2b34526683b3b7c41670f1887982c3df64) Use published version of `keccak` v0.2 ([#799](https://redirect.github.com/RustCrypto/hashes/issues/799))
- [`702bcd8`](https://github.com/RustCrypto/hashes/commit/702bcd83735a49c928c0fc24506924f5c0aa22af) Migrate to closure-based `keccak` ([#796](https://redirect.github.com/RustCrypto/hashes/issues/796))
- [`827c043`](https://github.com/RustCrypto/hashes/commit/827c043f82d57666a0b146d156e91c39535c1305) sha3 v0.11.0-rc.8 ([#794](https://redirect.github.com/RustCrypto/hashes/issues/794))
- Additional commits viewable in [compare view](https://github.com/RustCrypto/hashes/compare/groestl-v0.10.0...sha2-v0.11.0)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
# Which issue does this PR close?

- Closes apache#9340.

# Rationale for this change

# What changes are included in this PR?

Support the `ListView` codec in arrow-json, using the `ListLikeArray` trait to simplify the implementation.

# Are these changes tested?

Tests added

# Are there any user-facing changes?

New encoder/decoder
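Why a shared list abstraction simplifies the codec can be illustrated with a small standalone model. The `ListLike` trait below is hypothetical (the PR's `ListLikeArray` trait may have a different shape) and nulls are ignored: a `List` row spans `offsets[i]..offsets[i + 1]`, while a `ListView` row spans `offsets[i]..offsets[i] + sizes[i]`, so once both expose a per-row value range, one JSON encoder can serve both layouts.

```rust
use std::ops::Range;

/// Hypothetical abstraction: anything that can report the range of child
/// values belonging to row `i`.
trait ListLike {
    fn row_count(&self) -> usize;
    fn value_range(&self, i: usize) -> Range<usize>;
}

/// `List` layout: row i spans offsets[i]..offsets[i + 1].
struct ListLayout {
    offsets: Vec<usize>,
}

/// `ListView` layout: row i spans offsets[i]..offsets[i] + sizes[i];
/// rows may overlap or appear out of order in the child values.
struct ListViewLayout {
    offsets: Vec<usize>,
    sizes: Vec<usize>,
}

impl ListLike for ListLayout {
    fn row_count(&self) -> usize {
        self.offsets.len().saturating_sub(1)
    }
    fn value_range(&self, i: usize) -> Range<usize> {
        self.offsets[i]..self.offsets[i + 1]
    }
}

impl ListLike for ListViewLayout {
    fn row_count(&self) -> usize {
        self.offsets.len()
    }
    fn value_range(&self, i: usize) -> Range<usize> {
        self.offsets[i]..self.offsets[i] + self.sizes[i]
    }
}

/// A single encoder written against the trait handles both layouts,
/// emitting one JSON array per row, separated by newlines.
fn encode_rows<L: ListLike>(list: &L, values: &[i64]) -> String {
    (0..list.row_count())
        .map(|i| {
            let items: Vec<String> = values[list.value_range(i)]
                .iter()
                .map(|v| v.to_string())
                .collect();
            format!("[{}]", items.join(","))
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```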
… verification (apache#9604)

# Which issue does this PR close?

- Closes apache#9603

# Rationale for this change

The release and dev KEYS files could get out of sync. We should use the release/ version because:

- Users use the release/ version, not the dev/ version, when they verify our artifacts' signatures
- https://dist.apache.org/ may reject our requests when CI makes many of them

# What changes are included in this PR?

Use `https://www.apache.org/dyn/closer.lua?action=download&filename=arrow/KEYS` to download the KEYS file, and the expected `https://dist.apache.org/repos/dist/dev/arrow` location for the RC artifacts.

# Are these changes tested?

Yes, I've verified 58.1.0 RC1 both before and after the change.

# Are there any user-facing changes?

No
…uct)` (apache#9597)

# Which issue does this PR close?

- Closes apache#9596.

# Rationale for this change

See the issue.

# What changes are included in this PR?

Reuse `shred_basic_variant` as a fast path for unshredded `Struct` handling in `variant_get(..., Struct)`.

# Are these changes tested?

Yes, added two unit tests to establish safe mode behavior.

# Are there any user-facing changes?
## Summary

- Fix `MutableArrayData::extend_nulls`, which previously panicked unconditionally for both sparse and dense Union arrays
- For sparse unions: append the first type_id and extend nulls in all children
- For dense unions: append the first type_id, compute offsets into the first child, and extend nulls in that child only

## Background

This bug was discovered via DataFusion. `CaseExpr` uses `MutableArrayData` via `scatter()` to build result arrays. When a `CASE` expression returns a Union type (e.g., from `json_get`, which returns a JSON union) and there are rows where no `WHEN` branch matches (implicit `ELSE NULL`), `scatter` calls `extend_nulls`, which panics with "cannot call extend_nulls on UnionArray as cannot infer type".

Any query like:

```sql
SELECT CASE WHEN condition THEN returns_union(col, 'key') END FROM table
```

would panic if `condition` is false for any row.

## Root Cause

The `extend_nulls` implementation for Union arrays unconditionally panicked because it claimed it "cannot infer type". However, the Union's field definitions (child types and type IDs) are available in the `MutableArrayData`'s data type, so there is enough information to produce valid null entries by picking the first declared type_id.

## Test plan

- [x] Added test for sparse union `extend_nulls`
- [x] Added test for dense union `extend_nulls`
- [x] Existing `test_union_dense` continues to pass
- [x] All `array_transform` tests pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
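The sparse/dense distinction in the fix can be modeled without arrow itself. The sketch below is a hypothetical, simplified model (the `UnionNullModel` type is an assumption, not `MutableArrayData`); it relies on the fact that Arrow union arrays have no top-level validity bitmap, so a null slot is one whose selected child value is null. Each appended slot uses the first declared type id; a dense union records offsets into the first child and grows only that child, while a sparse union grows every child so that all children stay as long as the union.

```rust
/// Simplified model of a union under construction. Child contents are not
/// stored; `child_lens[c]` counts slots in child `c`, and every slot appended
/// by `extend_nulls` is understood to be a null in that child.
/// Hypothetical type for illustration; this is not arrow's `MutableArrayData`.
/// Assumes the union declares at least one field.
#[derive(Debug)]
struct UnionNullModel {
    /// Type ids declared by the union's fields, in field order.
    declared_type_ids: Vec<i8>,
    type_ids: Vec<i8>,
    /// `Some` for dense unions, `None` for sparse unions.
    offsets: Option<Vec<i32>>,
    child_lens: Vec<usize>,
}

impl UnionNullModel {
    fn new(declared_type_ids: Vec<i8>, dense: bool) -> Self {
        let n = declared_type_ids.len();
        Self {
            declared_type_ids,
            type_ids: Vec::new(),
            offsets: if dense { Some(Vec::new()) } else { None },
            child_lens: vec![0; n],
        }
    }

    fn extend_nulls(&mut self, len: usize) {
        // Pick the first declared type id for every null slot.
        let first = self.declared_type_ids[0];
        self.type_ids.extend(std::iter::repeat(first).take(len));
        match &mut self.offsets {
            Some(offsets) => {
                // Dense: point each new slot at a fresh null in the first child.
                let start = self.child_lens[0];
                offsets.extend((start..start + len).map(|o| o as i32));
                self.child_lens[0] += len;
            }
            None => {
                // Sparse: all children must stay as long as the union itself.
                for child_len in &mut self.child_lens {
                    *child_len += len;
                }
            }
        }
    }
}
```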
# Which issue does this PR close?

- Relates to apache#9497.

# Rationale for this change

# What changes are included in this PR?

As part of the effort to move the JSON reader away from `ArrayData` toward typed `ArrayRef` APIs, this PR changes the `ArrayDecoder::decode` interface to return `ArrayRef` directly and updates all decoder implementations (list, struct, map, run-end encoded) to construct typed arrays without intermediate `ArrayData` round-trips.

New benchmarks for map and run-end encoded decoding are added to verify there is no performance regression.

# Are these changes tested?

Yes

# Are there any user-facing changes?

No
# Which issue does this PR close?

- closes apache#9593

# Rationale for this change

In a previous PR (apache#9593), I changed instances of `truncate(0)` to `clear()`. However, this broke the test `test_truncate_with_pool` at `arrow-buffer/src/buffer/mutable.rs:1357`, due to an inconsistency between the implementations of `truncate` and `clear`. This PR fixes that test.

# What changes are included in this PR?

This PR copies a section of code related to the `pool` feature that is present in `truncate` but absent in `clear`, fixing the failing unit test.

# Are these changes tested?

Yes.

# Are there any user-facing changes?

No.
)

# Rationale for this change

`CdcOptions` only contains primitive fields (`usize`, `usize`, `i32`), so deriving `PartialEq` and `Eq` is straightforward. This is needed by downstream crates such as DataFusion that embed `CdcOptions` in their own configuration structs and need to compare them.

# What changes are included in this PR?

Implemented `PartialEq` and `Eq` for `CdcOptions`.

# Are these changes tested?

Added an equality test.

# Are there any user-facing changes?

No.
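Because every field is a primitive, a derive is all that is needed, and it in turn lets downstream configuration structs derive equality themselves. The sketch below uses hypothetical stand-ins: `CdcOptionsSketch`, `WriterConfigSketch`, and all field names are illustrative, and only the `usize`, `usize`, `i32` field types come from the description of `CdcOptions`.

```rust
/// Hypothetical stand-in for `CdcOptions`: same field types as described
/// (usize, usize, i32), but the field names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CdcOptionsSketch {
    min_size: usize,
    max_size: usize,
    level: i32,
}

/// A downstream config struct embedding the options can now derive
/// `PartialEq`/`Eq` itself, because every field implements them.
#[derive(Debug, Clone, PartialEq, Eq)]
struct WriterConfigSketch {
    name: String,
    cdc: Option<CdcOptionsSketch>,
}
```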
# Which issue does this PR close?

- Closes apache#8400.

# Rationale for this change

See the issue.

# What changes are included in this PR?

- Added an `AppendNullMode` enum supporting all semantics.
- Replaced the boolean logic with the new enum.
- Fixed test outputs for List Array cases.

# Are these changes tested?

- Added unit tests

# Are there any user-facing changes?
# Rationale for this change

Makes the code simpler and more readable by relying on new PyO3 and Rust features. No behavior should have changed, except for an error message if `__arrow_c_array__` does not return a tuple.

# What changes are included in this PR?

- Use `.call_method0(M)?` instead of `.getattr(M)?.call0()`
- Use `.extract()`, which allows more advanced features like directly extracting tuple elements
- Remove temporary variables just before returning
- Use `&raw const` and `&raw mut` pointers instead of casting and `addr_of!`
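The last bullet refers to the raw borrow syntax stabilized in Rust 1.82. A minimal sketch comparing the two styles, using a hypothetical `#[repr(C)]` struct rather than the actual FFI types touched by the PR: both functions yield the same pointers, but `&raw const` / `&raw mut` express as built-in syntax what previously needed a cast from a reference or the `addr_of!` / `addr_of_mut!` macros.

```rust
use std::ptr;

/// Hypothetical C-layout struct standing in for an FFI array struct;
/// used only to demonstrate pointer-creation syntax.
#[repr(C)]
struct FfiArraySketch {
    length: i64,
    null_count: i64,
}

/// Older style: cast a reference to a raw pointer, or use the macro.
fn pointers_old_style(array: &mut FfiArraySketch) -> (*const FfiArraySketch, *mut i64) {
    let whole = &*array as *const FfiArraySketch;
    let field = ptr::addr_of_mut!(array.null_count);
    (whole, field)
}

/// Newer style: raw borrow operators create the raw pointers directly.
fn pointers_raw_syntax(array: &mut FfiArraySketch) -> (*const FfiArraySketch, *mut i64) {
    let whole = &raw const *array;
    let field = &raw mut array.null_count;
    (whole, field)
}
```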
Summary:
Restore an explicit user-facing `__arrow_c_array__` type error while keeping the PyO3 cleanup from apache#9594.
Details:
Testing: