Avoid unnecessary buffer zero-fill in Snappy decompression #9583
Dandandan wants to merge 2 commits into apache:main from pr/snappy-zero-fill
Conversation
Write directly into spare capacity instead of resize+zero-fill, eliminating unnecessary memset for the decompression output buffer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
run benchmark arrow_reader_clickbench

🤖 Arrow criterion benchmark running (GKE)

🤖 Arrow criterion benchmark completed (GKE)
```rust
let n = self
    .decoder
    .decompress(input_buf, &mut spare_bytes[..len])
    .map_err(|e| -> ParquetError { e.into() })?;
```
If this returns on error before setting len, will the buffer be left in an inconsistent state?
I think the use of the mut slice ensures that the call to decompress won't overwrite the newly allocated bytes.
However, this also effectively passes uninitialized bytes to decompress -- how do we know that decompress doesn't read them? Maybe we should add a SAFETY warning to the decompress API saying it can't rely on initialized bytes 🤔
Effectively we rely on this:
https://docs.rs/snap/latest/snap/raw/struct.Decoder.html#errors
- output has length less than decompress_len(input).
To avoid unsafe we would need this feature:
BurntSushi/rust-snappy#62
Maybe we could improve the documentation around Decoder::decompress to mention it can receive non zero bytes and should not make any assumptions about their contents. I think that would be adequate
This does seem to be an appropriate use of spare_capacity_mut/set_len.
I spent some more time exploring this (and arguing with Codex about it)
The main issue is that the Rust Snappy implementation's contract takes an output buffer and doesn't say it can handle uninitialized bytes.
That being said I can't think of how passing uninitialized bytes as an output location could cause problems (even if snappy changes how it internally works)
Yes, technically I think that because the snappy function is not marked unsafe, this breaks its contract (i.e. it is allowed to read the buffer). In practice it doesn't need to read anything.
A MaybeUninit API would solve that.
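For illustration, here is a hedged sketch of what such a `MaybeUninit`-based decompress API could look like. The signature and the function name `decompress_uninit` are hypothetical (this is not snap's actual API, which is tracked in the linked PR); the body uses a trivial byte copy as a stand-in for real decompression. The point is that the callee receives possibly-uninitialized memory and returns the initialized prefix, so the caller never has to pretend uninitialized bytes are initialized `u8`s.

```rust
use std::mem::MaybeUninit;

// Hypothetical MaybeUninit-based decompress signature (illustrative only,
// not snap's real API). The trivial copy below stands in for decompression.
fn decompress_uninit<'a>(
    input: &[u8],
    output: &'a mut [MaybeUninit<u8>],
) -> Result<&'a mut [u8], String> {
    if output.len() < input.len() {
        return Err("output buffer too small".to_string());
    }
    // Write every byte we claim to initialize; never read `output` first.
    for (dst, &src) in output.iter_mut().zip(input) {
        dst.write(src);
    }
    // SAFETY: the first input.len() bytes were just written above.
    Ok(unsafe {
        std::slice::from_raw_parts_mut(output.as_mut_ptr().cast::<u8>(), input.len())
    })
}

fn main() {
    let mut storage = [MaybeUninit::<u8>::uninit(); 16];
    let out = decompress_uninit(b"snappy", &mut storage).unwrap();
    assert_eq!(out, b"snappy");
    println!("{}", std::str::from_utf8(out).unwrap());
}
```

With this shape, the caller can hand over `Vec::spare_capacity_mut()` directly and `set_len` based on the returned slice, with no `unsafe` cast from `MaybeUninit<u8>` to `u8` on the caller's side.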
There is a PR now:
BurntSushi/rust-snappy#65
run benchmark arrow_reader

🤖 Arrow criterion benchmark running (GKE)

run benchmark arrow_reader

🤖 Arrow criterion benchmark running (GKE)
Comparing pr/snappy-zero-fill (4149d2b) to 51bf8a4 (merge-base)
alamb
left a comment
I am torn on this one. Let's see if we can get a measurable perf win and then I can hem and haw about it more
Which issue does this PR close?
Closes #9579
Rationale
Currently, Snappy decompression uses resize(len, 0), which zero-fills the buffer before writing. Since Snappy will overwrite the entire region on success, this memset is wasted work. This change gives a 1-2% win on end-to-end decoding of Snappy-encoded Parquet data.
What changes are included in this PR?
Write directly into spare capacity using reserve() + spare_capacity_mut() + set_len(), eliminating the unnecessary zero-fill.
Are there any user-facing changes?
No.
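The spare-capacity pattern described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `decompress_into` is a hypothetical stand-in for snap's `Decoder::decompress` (here it just copies bytes), and `decompress_to_vec` shows the reserve/spare_capacity_mut/set_len sequence that replaces `resize(len, 0)`.

```rust
// Stand-in for snap's Decoder::decompress: writes the whole output region
// on success and returns the number of bytes written. Hypothetical helper.
fn decompress_into(input: &[u8], output: &mut [u8]) -> Result<usize, String> {
    if output.len() < input.len() {
        return Err("output buffer too small".to_string());
    }
    output[..input.len()].copy_from_slice(input);
    Ok(input.len())
}

// Sketch of the PR's approach: write into the Vec's spare capacity instead
// of zero-filling it first with resize(len, 0).
fn decompress_to_vec(input: &[u8], buf: &mut Vec<u8>) -> Result<(), String> {
    let len = input.len(); // stand-in for decompress_len(input)
    buf.reserve(len); // guarantees at least `len` bytes of spare capacity
    let start = buf.len();
    // The uninitialized tail, exposed as &mut [MaybeUninit<u8>].
    let spare = buf.spare_capacity_mut();
    // Caveat from the review discussion: viewing MaybeUninit<u8> as u8 is
    // only sound if the callee never *reads* the buffer before writing it.
    let spare_bytes = unsafe {
        std::slice::from_raw_parts_mut(spare.as_mut_ptr().cast::<u8>(), len)
    };
    let n = decompress_into(input, &mut spare_bytes[..len])?;
    // Extend the logical length only after the bytes have been written.
    unsafe { buf.set_len(start + n) };
    Ok(())
}

fn main() {
    let mut buf = Vec::new();
    buf.extend_from_slice(b"hdr:");
    decompress_to_vec(b"payload", &mut buf).unwrap();
    assert_eq!(buf, b"hdr:payload");
}
```

Note that if `decompress_into` errors out, the early return happens before `set_len`, so the Vec's logical length is unchanged and no uninitialized bytes become observable.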
🤖 Generated with Claude Code