Skip to content

[pull] main from apache:main#93

Merged
pull[bot] merged 2 commits intoburaksenn:mainfrom
apache:main
Apr 10, 2026
Merged

[pull] main from apache:main#93
pull[bot] merged 2 commits intoburaksenn:mainfrom
apache:main

Conversation

@pull
Copy link
Copy Markdown

@pull pull bot commented Apr 10, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

neilconway and others added 2 commits April 9, 2026 21:38
## Which issue does this PR close?

- Closes #21518.

## Rationale for this change

Similar to other recent changes, `substr` currently checks for NULLs and
builds the result NULL bitmap on a per-row basis. It is faster to
instead compute the result NULL bitmap in bulk via bitwise AND.

Benchmarks (ARM64):
```
  - substr, no count, short strings/substr_large_string [size=1024]: 21.4µs → 20.9µs (-2.3%)
  - substr, no count, short strings/substr_large_string [size=4096]: 83.1µs → 83.0µs (-0.1%)
  - substr, no count, short strings/substr_string [size=1024]: 20.5µs → 19.8µs (-3.4%)
  - substr, no count, short strings/substr_string [size=4096]: 78.8µs → 77.0µs (-2.3%)
  - substr, no count, short strings/substr_string_view [size=1024]: 18.9µs → 16.1µs (-14.8%)
  - substr, no count, short strings/substr_string_view [size=4096]: 74.0µs → 61.6µs (-16.8%)
  - substr, scalar args, long strings/substr_large_string [size=1024]: 35.2µs → 34.0µs (-3.4%)
  - substr, scalar args, long strings/substr_large_string [size=4096]: 140.6µs → 134.5µs (-4.3%)
  - substr, scalar args, long strings/substr_string [size=1024]: 35.5µs → 33.8µs (-4.8%)
  - substr, scalar args, long strings/substr_string [size=4096]: 138.9µs → 134.2µs (-3.4%)
  - substr, scalar args, long strings/substr_string_view [size=1024]: 34.0µs → 31.0µs (-8.8%)
  - substr, scalar args, long strings/substr_string_view [size=4096]: 132.0µs → 121.8µs (-7.7%)
  - substr, scalar args, short strings/substr_string [size=1024]: 31.0µs → 29.2µs (-5.8%)
  - substr, scalar args, short strings/substr_string [size=4096]: 120.8µs → 111.5µs (-7.7%)
  - substr, scalar args, short strings/substr_string_view [size=1024]: 26.8µs → 23.1µs (-13.8%)
  - substr, scalar args, short strings/substr_string_view [size=4096]: 101.6µs → 86.4µs (-14.9%)
  - substr, scalar start, no count, long strings/substr_string [size=1024]: 34.5µs → 33.2µs (-3.8%)
  - substr, scalar start, no count, long strings/substr_string [size=4096]: 134.4µs → 133.6µs (-0.6%)
  - substr, scalar start, no count, long strings/substr_string_view [size=1024]: 32.9µs → 29.4µs (-10.6%)
  - substr, scalar start, no count, long strings/substr_string_view [size=4096]: 126.1µs → 115.2µs (-8.6%)
  - substr, scalar start, no count, short strings/substr_string [size=1024]: 20.9µs → 20.1µs (-3.8%)
  - substr, scalar start, no count, short strings/substr_string [size=4096]: 80.1µs → 77.5µs (-3.2%)
  - substr, scalar start, no count, short strings/substr_string_view [size=1024]: 19.9µs → 16.7µs (-16.1%)
  - substr, scalar start, no count, short strings/substr_string_view [size=4096]: 74.4µs → 62.4µs (-16.1%)
  - substr, short count, long strings/substr_large_string [size=1024]: 30.3µs → 28.4µs (-6.3%)
  - substr, short count, long strings/substr_large_string [size=4096]: 117.1µs → 112.0µs (-4.4%)
  - substr, short count, long strings/substr_string [size=1024]: 30.2µs → 28.3µs (-6.3%)
  - substr, short count, long strings/substr_string [size=4096]: 118.0µs → 111.0µs (-5.9%)
  - substr, short count, long strings/substr_string_view [size=1024]: 26.1µs → 22.8µs (-12.6%)
  - substr, short count, long strings/substr_string_view [size=4096]: 101.5µs → 87.7µs (-13.6%)
  - substr, with count, long strings/substr_large_string [size=1024]: 34.6µs → 32.8µs (-5.2%)
  - substr, with count, long strings/substr_large_string [size=4096]: 136.7µs → 133.0µs (-2.7%)
  - substr, with count, long strings/substr_string [size=1024]: 34.2µs → 32.7µs (-4.4%)
  - substr, with count, long strings/substr_string [size=4096]: 136.6µs → 132.3µs (-3.1%)
  - substr, with count, long strings/substr_string_view [size=1024]: 33.3µs → 30.3µs (-9.0%)
  - substr, with count, long strings/substr_string_view [size=4096]: 129.1µs → 119.6µs (-7.4%)
```

## What changes are included in this PR?

* Implement optimization
* Rename `make_and_append_view` to `append_view`, and have callers deal
with NULL handling; making it part of `append_view` encourages per-row
NULL computations, which should be avoided when possible.
* Mark `append_view` as never-inline; this avoids a performance
regression on some of the `substr` microbenchmarks, where LLVM is a
little eager to inline a large-ish function into a hot loop.

## Are these changes tested?

Yes.

## Are there any user-facing changes?

No.
…erState> (#21517)

## Which issue does this PR close?

Partially addresses #20910, might be the last one for now.

## Rationale for this change
In full outer joins with filters, `BufferedBatch` tracks which buffered
rows had all filter evaluations fail using a `HashMap<u64, bool>`. This
map is read and written per-row in a hot loop during
`freeze_streamed_matched`. The HashMap pays ~40-60 bytes per entry
(8-byte u64 key + 1-byte bool value + hash table overhead), hashes every
key twice per iteration (once for `get`, once for `insert`), and
scatters entries across heap allocations with poor cache locality.

## What changes are included in this PR?

Replaces `HashMap<u64, bool>` with `Vec<FilterState>` indexed by
absolute row position within the batch. `FilterState` is a `#[repr(u8)]`
enum with three variants (`Unvisited`, `AllFailed`, `SomePassed`), so
the Vec is 1 byte per row — allocated once, direct-indexed, no hashing.
At the default batch size of 8192 rows the Vec is 8 KB (fits in L1
cache). Even at large batch sizes (32K+), 32 KB is still within L1 on
most machines, while the HashMap at 32K entries would consume ~1-2 MB of
scattered heap memory.

Three states are needed because a simple `Vec<bool>` cannot distinguish
"never matched" (handled separately by `null_joined`) from "matched but
all filters failed" (must be emitted as null-joined). The enum variant
names are self-documenting, unlike `Option<bool>` where
`None`/`Some(true)`/`Some(false)` would be opaque.

## Are these changes tested?
Existing tests.

## Are there any user-facing changes?
No.
@pull pull bot locked and limited conversation to collaborators Apr 10, 2026
@pull pull bot added the ⤵️ pull label Apr 10, 2026
@pull pull bot merged commit e8990c4 into buraksenn:main Apr 10, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants