
[arrow-pyarrow]: restore __arrow_c_array__ tuple error#1

Closed
alamb wants to merge 17 commits into Tpt:tpt/pyarrow-nits from
alamb:fix/pr9594-arrow-c-array-error

Conversation


@alamb alamb commented Mar 31, 2026

Summary:
Restore an explicit user-facing `__arrow_c_array__` type error while keeping the PyO3 cleanup from apache#9594.

Details:

  • add a small helper that validates `__arrow_c_array__` returned a tuple before extracting capsules
  • reuse it in both `ArrayData` and `RecordBatch`

Testing:

  • cargo test -p arrow-pyarrow
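The shape of the check can be sketched in plain Rust. This is a stand-in model only: the enum below substitutes for the Python value, while the real helper operates on PyO3's `Bound<'_, PyAny>` and `PyTuple`.

```rust
// Hypothetical stand-in for a Python value returned by __arrow_c_array__;
// the real helper works with PyO3 types, not this enum.
enum PyValue {
    Tuple(Vec<String>),
    Other(&'static str), // name of the non-tuple Python type
}

// Validate that __arrow_c_array__ returned a 2-tuple before extracting
// the (schema, array) capsules, producing an explicit error otherwise.
fn extract_capsules(v: &PyValue) -> Result<(&str, &str), String> {
    match v {
        PyValue::Tuple(items) if items.len() == 2 => {
            Ok((items[0].as_str(), items[1].as_str()))
        }
        PyValue::Tuple(items) => Err(format!(
            "expected __arrow_c_array__ to return a 2-tuple, got a tuple of length {}",
            items.len()
        )),
        PyValue::Other(ty) => Err(format!(
            "expected __arrow_c_array__ to return a tuple, got {ty}"
        )),
    }
}

fn main() {
    let bad = PyValue::Other("list");
    assert!(extract_capsules(&bad).unwrap_err().contains("got list"));
}
```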

alamb and others added 17 commits March 20, 2026 15:00
# Which issue does this PR close?

- part of apache#9108

# Rationale for this change

Prepare for next release

# What changes are included in this PR?

1. Update version to `58.1.0`
2. Add changelog. See rendered preview here:
https://github.com/alamb/arrow-rs/blob/alamb/prepare_58.1.0/CHANGELOG.md

# Are these changes tested?

By CI
# Are there any user-facing changes?

Yes
…apache#9590)

## Summary

- Reserve `output.views` capacity in
`ByteViewArrayDecoderDictionary::read` before the decode loop
- Reserve `output.offsets` capacity in
`ByteArrayDecoderDictionary::read` before the decode loop

This avoids per-chunk reallocation during `extend` calls inside the
dictionary decode loop.

Closes apache#9587
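The effect can be illustrated with a generic sketch (not the actual parquet decoder code): reserving the total capacity once before a chunked `extend` loop replaces many incremental reallocations with a single up-front allocation.

```rust
// Illustrative only: reserve the total output capacity once before the
// decode loop, so the per-chunk `extend` calls never reallocate.
fn concat_chunks(chunks: &[Vec<u32>]) -> Vec<u32> {
    let total: usize = chunks.iter().map(Vec::len).sum();
    let mut output = Vec::with_capacity(total); // one up-front allocation
    for chunk in chunks {
        output.extend(chunk.iter().copied()); // no reallocation here
    }
    output
}

fn main() {
    let out = concat_chunks(&[vec![1, 2], vec![3], vec![4, 5]]);
    assert_eq!(out, vec![1, 2, 3, 4, 5]);
    assert_eq!(out.capacity(), 5); // capacity was reserved exactly once
}
```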

## Test plan

- [ ] Existing tests pass (no functional change, only pre-allocation)
- [ ] Benchmark dictionary-encoded StringView/BinaryView/String reads

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Rationale for this change

In some cases, it is desirable to print strings with surrounding
quotation marks. A typical example that we run into in
https://github.com/rerun-io/rerun is a `StructArray` that contains empty
strings:

Current formatting:

```text
{name: }
```

Added option in this PR:

```text
{name: ""}
```

# What changes are included in this PR?

This PR relies on `std::fmt::Debug` to do the actual formatting of
strings, which means that all escaping is handled out of the box.
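The out-of-the-box escaping can be seen directly in plain Rust, independent of arrow-rs:

```rust
fn main() {
    // Debug formatting of &str adds the surrounding quotes...
    assert_eq!(format!("{:?}", ""), r#""""#);
    // ...and escapes quotes, backslashes, and control characters.
    assert_eq!(format!("{:?}", "a\"b\n"), r#""a\"b\n""#);
    // The StructArray example from above renders as {name: ""}.
    assert_eq!(format!("{{name: {:?}}}", ""), r#"{name: ""}"#);
}
```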

# Are these changes tested?

This PR contains tests for different types of inputs, including escape
sequences. Additionally, it also tests the `StructArray` example
outlined above.

# Are there any user-facing changes?

By default this option is false, making the feature opt-in.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
## Which issue does this PR close?

Closes apache#9580

## Rationale

The current VLQ decoder calls `get_aligned` for each byte, which
involves repeated offset calculations and bounds checks in the hot loop.

## What changes are included in this PR?

Align to the byte boundary once, then iterate directly over the buffer
slice, avoiding per-byte overhead from `get_aligned`.
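A minimal sketch of the slice-based approach (hypothetical signature; the real code lives in parquet's bit reader and differs in its error handling):

```rust
// Decode one ULEB128/VLQ value by iterating the byte slice directly,
// instead of calling a per-byte accessor with its own bounds checks.
// Returns (decoded value, bytes consumed), or None on bad input.
fn decode_vlq(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    let mut shift = 0;
    for (i, &byte) in buf.iter().enumerate() {
        value |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
        shift += 7;
        if shift >= 64 {
            return None; // overflow: too many continuation bytes
        }
    }
    None // ran out of input mid-value
}

fn main() {
    assert_eq!(decode_vlq(&[0x96, 0x01]), Some((150, 2)));
    assert_eq!(decode_vlq(&[0x7F]), Some((127, 1)));
    assert_eq!(decode_vlq(&[0x80]), None); // truncated input
}
```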

## Are there any user-facing changes?

No.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Rationale for this change

The `object_store` crate release 0.13.2 breaks the build of parquet
because it feature-gates the `buffered` module. I have filed
apache/arrow-rs-object-store#677 about the
breakage; meanwhile this fix is made in expectation that 0.13.2 will not
be yanked and the feature gate will remain.

# What changes are included in this PR?

Bump the version to 0.13.2 and request the "tokio" feature.

# Are these changes tested?

The build should succeed in CI workflows.

# Are there any user-facing changes?

No

Co-authored-by: Mikhail Zabaluev <mikhail.zabaluev@gmail.com>
Updates the requirements on [sha2](https://github.com/RustCrypto/hashes)
to permit the latest version.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/RustCrypto/hashes/commit/ffe093984c004769747e998f77da8ff7c0e7a765"><code>ffe0939</code></a>
Release sha2 0.11.0 (<a
href="https://redirect.github.com/RustCrypto/hashes/issues/806">#806</a>)</li>
<li><a
href="https://github.com/RustCrypto/hashes/commit/8991b65fe400c31c4cc189510f86ae642c470cd9"><code>8991b65</code></a>
Use the standard order of the <code>[package]</code> section fields (<a
href="https://redirect.github.com/RustCrypto/hashes/issues/807">#807</a>)</li>
<li><a
href="https://github.com/RustCrypto/hashes/commit/3d2bc57db40fd6aeb25d6c6da98d67e2784c2985"><code>3d2bc57</code></a>
sha2: refactor backends (<a
href="https://redirect.github.com/RustCrypto/hashes/issues/802">#802</a>)</li>
<li><a
href="https://github.com/RustCrypto/hashes/commit/faa55fb83697c8f3113636d88070e5f5edc8c335"><code>faa55fb</code></a>
sha3: bump <code>keccak</code> to v0.2 (<a
href="https://redirect.github.com/RustCrypto/hashes/issues/803">#803</a>)</li>
<li><a
href="https://github.com/RustCrypto/hashes/commit/d3e6489e56f8486d4a93ceb7a8abf4924af1de7b"><code>d3e6489</code></a>
sha3 v0.11.0-rc.9 (<a
href="https://redirect.github.com/RustCrypto/hashes/issues/801">#801</a>)</li>
<li><a
href="https://github.com/RustCrypto/hashes/commit/bbf6f51ff97f81ab15e6e5f6cf878bfbcb1f47c8"><code>bbf6f51</code></a>
sha2: tweak backend docs (<a
href="https://redirect.github.com/RustCrypto/hashes/issues/800">#800</a>)</li>
<li><a
href="https://github.com/RustCrypto/hashes/commit/155dbbf2959dbec0ec75948a82590ddaede2d3bc"><code>155dbbf</code></a>
sha3: add default value for the <code>DS</code> generic parameter on
<code>TurboShake128/256</code>...</li>
<li><a
href="https://github.com/RustCrypto/hashes/commit/ed514f2b34526683b3b7c41670f1887982c3df64"><code>ed514f2</code></a>
Use published version of <code>keccak</code> v0.2 (<a
href="https://redirect.github.com/RustCrypto/hashes/issues/799">#799</a>)</li>
<li><a
href="https://github.com/RustCrypto/hashes/commit/702bcd83735a49c928c0fc24506924f5c0aa22af"><code>702bcd8</code></a>
Migrate to closure-based <code>keccak</code> (<a
href="https://redirect.github.com/RustCrypto/hashes/issues/796">#796</a>)</li>
<li><a
href="https://github.com/RustCrypto/hashes/commit/827c043f82d57666a0b146d156e91c39535c1305"><code>827c043</code></a>
sha3 v0.11.0-rc.8 (<a
href="https://redirect.github.com/RustCrypto/hashes/issues/794">#794</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/RustCrypto/hashes/compare/groestl-v0.10.0...sha2-v0.11.0">compare
view</a></li>
</ul>
</details>
<br />


Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
# Which issue does this PR close?


- Closes apache#9340.

# Rationale for this change


# What changes are included in this PR?


Support `ListView` codec in arrow-json. Using `ListLikeArray` trait to
simplify implementation.

# Are these changes tested?


Tests added

# Are there any user-facing changes?


New encoder/decoder
… verification (apache#9604)

# Which issue does this PR close?

- Closes apache#9603 

# Rationale for this change

The release and dev KEYS files could get out of sync.
We should use the release/ version:
- Users use the release/ version, not the dev/ version, when they verify our
artifacts' signatures
- https://dist.apache.org/ may reject our requests when CI downloads the
KEYS file many times

# What changes are included in this PR?

Use
`https://www.apache.org/dyn/closer.lua?action=download&filename=arrow/KEYS`
to download the KEYS file, and keep using the expected
`https://dist.apache.org/repos/dist/dev/arrow` location for the RC artifacts.

# Are these changes tested?

Yes, I've verified 58.1.0 1 both before and after the change.

# Are there any user-facing changes?

No
…uct)` (apache#9597)

# Which issue does this PR close?


- Closes apache#9596.

# Rationale for this change

See the linked issue.

# What changes are included in this PR?

Reuse `shred_basic_variant` as a fast path for unshredded `Struct`
handling in `variant_get(..., Struct)`

# Are these changes tested?

Yes, added two unit tests to establish safe mode behavior.

# Are there any user-facing changes?

## Summary

- Fix `MutableArrayData::extend_nulls` which previously panicked
unconditionally for both sparse and dense Union arrays
- For sparse unions: append the first type_id and extend nulls in all
children
- For dense unions: append the first type_id, compute offsets into the
first child, and extend nulls in that child only

## Background

This bug was discovered via DataFusion. `CaseExpr` uses
`MutableArrayData` via `scatter()` to build result arrays. When a `CASE`
expression returns a Union type (e.g., from `json_get` which returns a
JSON union) and there are rows where no `WHEN` branch matches (implicit
`ELSE NULL`), `scatter` calls `extend_nulls` which panics with "cannot
call extend_nulls on UnionArray as cannot infer type".

Any query like:
```sql
SELECT CASE WHEN condition THEN returns_union(col, 'key') END FROM table
```
would panic if `condition` is false for any row.

## Root Cause

The `extend_nulls` implementation for Union arrays unconditionally
panicked because it claimed it "cannot infer type". However, the Union's
field definitions (child types and type IDs) are available in the
`MutableArrayData`'s data type — there's enough information to produce
valid null entries by picking the first declared type_id.
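The dense-union case can be modeled in a few lines. This is a simplified stand-in, not the actual `MutableArrayData` code: a dense union is driven by a `type_ids` buffer and an `offsets` buffer pointing into per-type children.

```rust
// Simplified model of a dense union's layout buffers.
struct DenseUnion {
    type_ids: Vec<i8>,
    offsets: Vec<i32>,
    child_lens: Vec<i32>, // current length of each child, indexed by type_id
}

impl DenseUnion {
    // Append `n` nulls: reuse the first declared type_id and give each
    // new slot its own null entry in that type's child.
    fn extend_nulls(&mut self, n: usize) {
        let type_id: i8 = 0; // first declared type_id
        for _ in 0..n {
            self.type_ids.push(type_id);
            self.offsets.push(self.child_lens[type_id as usize]);
            self.child_lens[type_id as usize] += 1;
        }
    }
}

fn main() {
    let mut u = DenseUnion {
        type_ids: vec![0, 1],
        offsets: vec![0, 0],
        child_lens: vec![1, 1],
    };
    u.extend_nulls(2);
    assert_eq!(u.type_ids, vec![0, 1, 0, 0]);
    assert_eq!(u.offsets, vec![0, 0, 1, 2]); // new slots index into child 0
}
```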

## Test plan

- [x] Added test for sparse union `extend_nulls`
- [x] Added test for dense union `extend_nulls`
- [x] Existing `test_union_dense` continues to pass
- [x] All `array_transform` tests pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
# Which issue does this PR close?


- Relates to apache#9497.

# Rationale for this change


# What changes are included in this PR?


As part of the effort to move the JSON reader away from `ArrayData`
toward typed `ArrayRef` APIs, this PR changes the
`ArrayDecoder::decode` interface to return `ArrayRef` directly and
updates all decoder implementations (list, struct, map, run-end encoded)
to construct typed arrays without intermediate `ArrayData` round-trips.
New benchmarks for map and run-end encoded decoding are added to verify
there is no performance regression.
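The interface change can be sketched with minimal stand-in types. Everything below is simplified for illustration: the real trait lives in arrow-json, returns a `Result`, and uses arrow's `ArrayRef = Arc<dyn Array>`.

```rust
use std::sync::Arc;

// Minimal stand-ins for arrow types (assumptions, not the real API).
trait Array {
    fn len(&self) -> usize;
}
type ArrayRef = Arc<dyn Array>;

struct Int64Array(Vec<i64>);
impl Array for Int64Array {
    fn len(&self) -> usize {
        self.0.len()
    }
}

// After the change, decoders return a typed ArrayRef directly instead
// of an intermediate ArrayData that callers must convert.
trait ArrayDecoder {
    fn decode(&mut self, rows: &[&str]) -> ArrayRef;
}

struct Int64Decoder;
impl ArrayDecoder for Int64Decoder {
    fn decode(&mut self, rows: &[&str]) -> ArrayRef {
        let values: Vec<i64> = rows.iter().map(|r| r.parse().unwrap_or(0)).collect();
        Arc::new(Int64Array(values))
    }
}

fn main() {
    let arr = Int64Decoder.decode(&["1", "2", "oops"]);
    assert_eq!(arr.len(), 3); // parse failures default to 0 in this sketch
}
```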

# Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->
Yes

# Are there any user-facing changes?

No
# Which issue does this PR close?

- closes apache#9593

# Rationale for this change

In a previous PR (apache#9593), I changed instances of `truncate(0)` to
`clear()`. However, this broke the test `test_truncate_with_pool` at
`arrow-buffer/src/buffer/mutable.rs:1357`, due to an inconsistency
between the implementations of `truncate` and `clear`. This PR fixes that
test.

# What changes are included in this PR?

This PR copies a section of code related to the `pool` feature present
in `truncate` but absent in `clear`, fixing the failing unit test.

# Are these changes tested?

Yes.

# Are there any user-facing changes?

No.

# Rationale for this change

CdcOptions only contains primitive fields (usize, usize, i32) so
deriving PartialEq and Eq is straightforward. This is needed by
downstream crates such as DataFusion that embed CdcOptions in their own
configuration structs and need to compare them.

# What changes are included in this PR?

Implemented PartialEq and Eq for CdcOptions.
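With only primitive fields, the derive is a one-liner. The field names below are assumptions for illustration, not the actual `CdcOptions` definition:

```rust
// Hypothetical field names; the point is that a struct of primitives
// can simply derive structural equality.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CdcOptions {
    min_chunk_size: usize,
    max_chunk_size: usize,
    norm_level: i32,
}

fn main() {
    let a = CdcOptions { min_chunk_size: 256, max_chunk_size: 4096, norm_level: 1 };
    let b = a; // Copy
    assert_eq!(a, b);
    // Downstream crates (e.g. DataFusion configs) can now compare options.
    assert_ne!(a, CdcOptions { norm_level: 2, ..a });
}
```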

# Are these changes tested?

Added an equality test.

# Are there any user-facing changes?

No.
# Which issue does this PR close?


- Closes apache#8400.

# Rationale for this change

See the linked issue.

# What changes are included in this PR?

- Added `AppendNullMode` enum supporting all semantics.
- Replaced the bool logic with the new enum
- Fixed test outputs for `ListArray` cases


# Are these changes tested?
- Added unit tests

# Are there any user-facing changes?

# Rationale for this change

Makes the code simpler and more readable by relying on new PyO3 and Rust
features. No behavior should change apart from the error message raised
when `__arrow_c_array__` does not return a tuple.

# What changes are included in this PR?

- use `.call_method0(M)?` instead of `.getattr(M)?.call0()`
- use `.extract()`, which allows more advanced features like directly
extracting tuple elements
- remove temporary variables just before returning
- use `&raw const` and `&raw mut` pointers instead of casting and `addr_of!`
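Of these, the PyO3 call changes need a Python runtime to demonstrate, but the `&raw` change (stable since Rust 1.82) can be shown in plain Rust:

```rust
// `&raw const` / `&raw mut` create raw pointers directly, replacing the
// older `&x as *const _` casts and the `std::ptr::addr_of!` macro.
fn read_via_raw(value: &u32) -> u32 {
    let p: *const u32 = &raw const *value;
    unsafe { *p } // safe here: p is derived from a valid reference
}

fn main() {
    let x = 42u32;
    let old_style = &x as *const u32;
    let new_style = &raw const x;
    assert_eq!(old_style, new_style); // same address, same semantics
    assert_eq!(read_via_raw(&x), 42);
}
```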
@alamb alamb force-pushed the fix/pr9594-arrow-c-array-error branch from 33f8058 to bb0edac Compare March 31, 2026 20:41
@Tpt Tpt deleted the branch Tpt:tpt/pyarrow-nits April 3, 2026 15:12
@Tpt Tpt closed this Apr 3, 2026