Merged
Conversation
## Which issue does this PR close? - Closes #21518. ## Rationale for this change Similar to other recent changes, `substr` currently checks for NULLs and builds the result NULL bitmap on a per-row basis. It is faster to instead compute the result NULL bitmap in bulk via bitwise AND. Benchmarks (ARM64): ``` - substr, no count, short strings/substr_large_string [size=1024]: 21.4µs → 20.9µs (-2.3%) - substr, no count, short strings/substr_large_string [size=4096]: 83.1µs → 83.0µs (-0.1%) - substr, no count, short strings/substr_string [size=1024]: 20.5µs → 19.8µs (-3.4%) - substr, no count, short strings/substr_string [size=4096]: 78.8µs → 77.0µs (-2.3%) - substr, no count, short strings/substr_string_view [size=1024]: 18.9µs → 16.1µs (-14.8%) - substr, no count, short strings/substr_string_view [size=4096]: 74.0µs → 61.6µs (-16.8%) - substr, scalar args, long strings/substr_large_string [size=1024]: 35.2µs → 34.0µs (-3.4%) - substr, scalar args, long strings/substr_large_string [size=4096]: 140.6µs → 134.5µs (-4.3%) - substr, scalar args, long strings/substr_string [size=1024]: 35.5µs → 33.8µs (-4.8%) - substr, scalar args, long strings/substr_string [size=4096]: 138.9µs → 134.2µs (-3.4%) - substr, scalar args, long strings/substr_string_view [size=1024]: 34.0µs → 31.0µs (-8.8%) - substr, scalar args, long strings/substr_string_view [size=4096]: 132.0µs → 121.8µs (-7.7%) - substr, scalar args, short strings/substr_string [size=1024]: 31.0µs → 29.2µs (-5.8%) - substr, scalar args, short strings/substr_string [size=4096]: 120.8µs → 111.5µs (-7.7%) - substr, scalar args, short strings/substr_string_view [size=1024]: 26.8µs → 23.1µs (-13.8%) - substr, scalar args, short strings/substr_string_view [size=4096]: 101.6µs → 86.4µs (-14.9%) - substr, scalar start, no count, long strings/substr_string [size=1024]: 34.5µs → 33.2µs (-3.8%) - substr, scalar start, no count, long strings/substr_string [size=4096]: 134.4µs → 133.6µs (-0.6%) - substr, scalar start, no count, long strings/substr_string_view [size=1024]: 32.9µs → 29.4µs (-10.6%) - substr, scalar start, no count, long strings/substr_string_view [size=4096]: 126.1µs → 115.2µs (-8.6%) - substr, scalar start, no count, short strings/substr_string [size=1024]: 20.9µs → 20.1µs (-3.8%) - substr, scalar start, no count, short strings/substr_string [size=4096]: 80.1µs → 77.5µs (-3.2%) - substr, scalar start, no count, short strings/substr_string_view [size=1024]: 19.9µs → 16.7µs (-16.1%) - substr, scalar start, no count, short strings/substr_string_view [size=4096]: 74.4µs → 62.4µs (-16.1%) - substr, short count, long strings/substr_large_string [size=1024]: 30.3µs → 28.4µs (-6.3%) - substr, short count, long strings/substr_large_string [size=4096]: 117.1µs → 112.0µs (-4.4%) - substr, short count, long strings/substr_string [size=1024]: 30.2µs → 28.3µs (-6.3%) - substr, short count, long strings/substr_string [size=4096]: 118.0µs → 111.0µs (-5.9%) - substr, short count, long strings/substr_string_view [size=1024]: 26.1µs → 22.8µs (-12.6%) - substr, short count, long strings/substr_string_view [size=4096]: 101.5µs → 87.7µs (-13.6%) - substr, with count, long strings/substr_large_string [size=1024]: 34.6µs → 32.8µs (-5.2%) - substr, with count, long strings/substr_large_string [size=4096]: 136.7µs → 133.0µs (-2.7%) - substr, with count, long strings/substr_string [size=1024]: 34.2µs → 32.7µs (-4.4%) - substr, with count, long strings/substr_string [size=4096]: 136.6µs → 132.3µs (-3.1%) - substr, with count, long strings/substr_string_view [size=1024]: 33.3µs → 30.3µs (-9.0%) - substr, with count, long strings/substr_string_view [size=4096]: 129.1µs → 119.6µs (-7.4%) ``` ## What changes are included in this PR? * Implement optimization * Rename `make_and_append_view` to `append_view`, and have callers deal with NULL handling; making it part of `append_view` encourages per-row NULL computations, which should be avoided when possible. * Mark `append_view` as never-inline; this avoids a performance regression on some of the `substr` microbenchmarks, where LLVM is a little eager to inline a large-ish function into a hot loop. ## Are these changes tested? Yes. ## Are there any user-facing changes? No.
…erState> (#21517) ## Which issue does this PR close? Partially addresses #20910, might be the last one for now. ## Rationale for this change In full outer joins with filters, `BufferedBatch` tracks which buffered rows had all filter evaluations fail using a `HashMap<u64, bool>`. This map is read and written per-row in a hot loop during `freeze_streamed_matched`. The HashMap pays ~40-60 bytes per entry (8-byte u64 key + 1-byte bool value + hash table overhead), hashes every key twice per iteration (once for `get`, once for `insert`), and scatters entries across heap allocations with poor cache locality. ## What changes are included in this PR? Replaces `HashMap<u64, bool>` with `Vec<FilterState>` indexed by absolute row position within the batch. `FilterState` is a `#[repr(u8)]` enum with three variants (`Unvisited`, `AllFailed`, `SomePassed`), so the Vec is 1 byte per row — allocated once, direct-indexed, no hashing. At the default batch size of 8192 rows the Vec is 8 KB (fits in L1 cache). Even at large batch sizes (32K+), 32 KB is still within L1 on most machines, while the HashMap at 32K entries would consume ~1-2 MB of scattered heap memory. Three states are needed because a simple `Vec<bool>` cannot distinguish "never matched" (handled separately by `null_joined`) from "matched but all filters failed" (must be emitted as null-joined). The enum variant names are self-documenting, unlike `Option<bool>` where `None`/`Some(true)`/`Some(false)` would be opaque. ## Are these changes tested? Existing tests. ## Are there any user-facing changes? No.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )