Commit bfee844
## Which issue does this PR close?
- Closes #8809.
## Rationale for this change
Several DataFusion PRs
([#21464](apache/datafusion#21464),
[#21468](apache/datafusion#21468),
[#21471](apache/datafusion#21471),
[#21475](apache/datafusion#21475),
[#21477](apache/datafusion#21477),
[#21482](apache/datafusion#21482),
[#21532](apache/datafusion#21532)) optimize NULL
handling in scalar functions by replacing row-by-row null buffer
construction with bulk `NullBuffer::union`. When 3+ null buffers need
combining, they chain binary `union` calls, each allocating a new
`BooleanBuffer`.
`NullBuffer::union_many` reduces this to 1 allocation (clone + in-place
ANDs). For example, from
[#21482](apache/datafusion#21482):
Before:
```rust
[array.nulls(), from_array.nulls(), to_array.nulls(), stride.and_then(|s| s.nulls())]
.into_iter()
.fold(None, |acc, nulls| NullBuffer::union(acc.as_ref(), nulls))
```
After:
```rust
NullBuffer::union_many([
array.nulls(),
from_array.nulls(),
to_array.nulls(),
stride.and_then(|s| s.nulls()),
])
```
Per @alamb's
[suggestion](#9692 (comment)),
this PR also implements the general-purpose mutable bitwise operations
on `BooleanArray` from #8809, following the `PrimitiveArray::unary` /
`unary_mut` pattern. This builds on the
`BitAndAssign`/`BitOrAssign`/`BitXorAssign` operators added to
`BooleanBuffer` in #9567.
## What changes are included in this PR?
**`NullBuffer::union_many(impl IntoIterator<Item =
Option<&NullBuffer>>)`**: combines multiple null buffers in a single
allocation (clone + in-place `&=`). Used by DataFusion for bulk null
handling.
**`BooleanArray` bitwise operations** (6 new public methods):
Unary (`op: FnMut(u64) -> u64`):
- `bitwise_unary(&self, op)` — always allocates a new array
- `bitwise_unary_mut(self, op) -> Result<Self, Self>` — in-place if
uniquely owned, `Err(self)` if shared
- `bitwise_unary_mut_or_clone(self, op)` — in-place if uniquely owned,
allocates if shared
Binary (`op: FnMut(u64, u64) -> u64`):
- `bitwise_bin_op(&self, rhs, op)` — always allocates, unions null
buffers
- `bitwise_bin_op_mut(self, rhs, op) -> Result<Self, Self>` — in-place
if uniquely owned, `Err(self)` if shared, unions null buffers
- `bitwise_bin_op_mut_or_clone(self, rhs, op)` — in-place if uniquely
owned, allocates if shared, unions null buffers
Note: #8809 proposed the binary variants take a raw buffer and
`right_offset_in_bits`. This PR takes `&BooleanArray` instead, which
encapsulates both and matches existing patterns like
`BooleanArray::from_binary`.
## Are these changes tested?
Yes. 23 tests for the `BooleanArray` bitwise methods and 6 tests for
`union_many`, covering:
- Basic correctness (AND, OR, NOT)
- Null handling (both nullable, one nullable, no nulls, null union)
- Buffer ownership (uniquely owned → in-place, shared → `Err` /
fallback)
- Edge cases (empty arrays, sliced arrays with non-zero offset,
misaligned left/right offsets)
## Are there any user-facing changes?
Six new public methods on `BooleanArray` and one new public method on
`NullBuffer`.
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
1 parent 711fac8 commit bfee844
File tree
2 files changed
+631
-0
lines changed- arrow-array/src/array
- arrow-buffer/src/buffer
2 files changed
+631
-0
lines changed
0 commit comments