#8806 Consolidate bitwise operations and make nullif respect ArrayData bitmap layout contract#8869
#8806 Consolidate bitwise operations and make nullif respect ArrayData bitmap layout contract#8869joe-ucp wants to merge 1 commit intoapache:mainfrom
nullif respect ArrayData bitmap layout contract#8869Conversation
alamb
left a comment
There was a problem hiding this comment.
Thank you @joe-ucp
This PR is likely too large for me to be able to realistically review it in any near term timeline
Would you be willing to work on some more focused PRs that each do some of the work described in #8806?
For example, I have already made a PR for the first item
Perhaps you could make a PR for
Add BooleanArray::binary and BooleanArray::unary functions that use the new Buffer functions internally
(you could base it on #8854 or use bitwise_unary_op_helper directly)
|
Splitting out the nullif improvements into its own PR would also help review substantially |
|
@alamb Thanks so much for the detailed feedback and for pointing me at #8806 and #8854, I really appreciate it. You’re absolutely right that this PR ended up too large and tries to do too many things at once. I’m still very new to contributing on GitHub and to this project, so this was me exploring the space a bit too aggressively 😅 I’m happy to rework this into more focused PRs: First, I’ll open a small PR that just adds BooleanArray::binary / BooleanArray::unary wired up to the new buffer bitwise APIs (based on #8854 / the existing helpers). Separately, I’ll split out the nullif / layout-contract improvement into their own PR to make that review more manageable. I’ll start on the BooleanArray PR tomorrow and follow up with the nullif one after that. Once those are up, I’m also happy to keep iterating on the rest of the work described in #8806 in smaller reviewable chunks. Thanks again for taking the time to review and for the guidance! 🫡 |
Here’s a version that fits their template and avoids mentioning Python:
Which issue does this PR close?
Rationale for this change
The bitwise operations used across
arrow-buffer,arrow-arith, andarrow-selectwere spread across several overlapping helpers, which made it difficult to:nullif.While working on bitwise kernels, we also found that
nullifinarrow-selectwas fragile around sliced arrays, offsets, and null-count handling. This PR centralizes the bitwise logic and makesnullifexplicitly respect theArrayDatabitmap layout invariants (length, offset, validity mapping).What changes are included in this PR?
High-level:
Documented the bitmap layout contract
Added a documentation file (
docs/arraydata_bitmap_layout_contract.md) that spells out:lenandoffsetrelate to validity bits,offset = 0),nullifvalidity rule:result_valid(i) = V(i) & !C(i).Strengthened core bitwise testing in
arrow-buffer/arrow-arithAdded/extended tests that:
Buffer/MutableBufferbitwise APIs against existing helpers across random data,(offset + len)boundary cases),MutableBuffer) operations produce the same bits as allocating versions.Added
randas a dev-dependency ofarrow-arithto support reproducible randomized tests.Refactored
nullifto follow the layout contractIntroduced a helper:
which implements the core
V(i) & !C(i)validity logic over buffers and bit offsets.Updated
nullifto:Derive correct bit offsets from
ArrayDatafor the left validity and condition mask,Always build new result
ArrayDatawithoffset = 0,Slice value/offset buffers so element 0 in the result corresponds to logical index 0 for:
Nullif-focused tests
Added a focused unit test for
compute_nullif_validitythat exercises:(offset + len) % 8 == 0boundaries,Verified and fixed behavior so that the existing
nulliftests all pass, including:null_countregression tests,nullif_fuzztest.Net effect:
nullifis now a thin consumer of the documented bitmap layout contract and the consolidated bitwise APIs.Are these changes tested?
Yes.
The following were run locally:
All of the above pass, including the previously failing
nullifregression and fuzz tests.Are there any user-facing changes?
There are no breaking changes to public APIs.
User-visible effects:
nullifbehavior is now strictly aligned with the documented bitmap layout contract for:This should only be observable as bug fixes in corner cases (e.g., nulls at specific offsets, struct/string slices), not API changes.
No new configuration or feature flags are introduced; this is primarily a correctness and maintainability improvement around existing behavior.