Avoid unnecessary buffer zero-fill in Snappy decompression by Dandandan · Pull Request #9583 · apache/arrow-rs

Dandandan · 2026-03-19T19:58:24Z

Which issue does this PR close?

Rationale

Currently, Snappy decompression uses resize(len, 0), which zero-fills the buffer before writing. Since Snappy overwrites the entire region on success, this memset is wasted work.

This gives a 1-2% win on end-to-end decoding of Snappy-encoded Parquet data.

What changes are included in this PR?

Write directly into spare capacity using reserve() + spare_capacity_mut() + set_len(), eliminating the unnecessary zero-fill.
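
As a rough illustration of the two patterns described above, here is a minimal sketch using only the standard library. The names fill_stub, append_with_resize and append_with_spare_capacity are hypothetical and not from the PR. To keep the sketch sound on its own, the stand-in writer accepts &mut [MaybeUninit<u8>], whereas the PR hands a &mut [u8] view of the spare capacity to the snap decoder.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for a decompressor: fills `out` completely with
/// `byte` and returns the number of bytes written. It writes but never reads.
fn fill_stub(out: &mut [MaybeUninit<u8>], byte: u8) -> usize {
    for slot in out.iter_mut() {
        slot.write(byte);
    }
    out.len()
}

/// Old pattern: zero-fill with `resize`, then overwrite the same region.
fn append_with_resize(buf: &mut Vec<u8>, len: usize, byte: u8) {
    let start = buf.len();
    buf.resize(start + len, 0); // memset that is immediately overwritten
    for b in &mut buf[start..] {
        *b = byte;
    }
}

/// New pattern: reserve, write into spare capacity, publish with `set_len`.
fn append_with_spare_capacity(buf: &mut Vec<u8>, len: usize, byte: u8) {
    let start = buf.len();
    buf.reserve(len); // guarantees capacity >= start + len
    let spare = buf.spare_capacity_mut(); // &mut [MaybeUninit<u8>]
    let written = fill_stub(&mut spare[..len], byte);
    // SAFETY: `fill_stub` initialized the first `written` bytes of the spare
    // capacity, and `start + written <= capacity` because of `reserve(len)`.
    unsafe { buf.set_len(start + written) };
}
```

Both helpers leave the vector with the same contents; the second one simply skips the intermediate zero-fill.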

Are there any user-facing changes?

No.

🤖 Generated with Claude Code

Write directly into spare capacity instead of resize+zero-fill, eliminating unnecessary memset for the decompression output buffer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Dandandan · 2026-03-19T20:35:23Z

run benchmark arrow_reader_clickbench

adriangbot · 2026-03-19T20:38:16Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4093137898-468-pw7st 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing pr/snappy-zero-fill (eaa3ae4) to 88422cb (merge-base) diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader_clickbench
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-03-19T21:04:47Z

🤖 Arrow criterion benchmark completed (GKE) | trigger

Details

group                                             main                                   pr_snappy-zero-fill
-----                                             ----                                   -------------------
arrow_reader_clickbench/async/Q1                  1.01   1093.5±5.62µs        ? ?/sec    1.00   1087.7±3.63µs        ? ?/sec
arrow_reader_clickbench/async/Q10                 1.01      6.7±0.05ms        ? ?/sec    1.00      6.7±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q11                 1.02      7.8±0.07ms        ? ?/sec    1.00      7.7±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q12                 1.00     14.4±0.07ms        ? ?/sec    1.00     14.4±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q13                 1.01     17.1±0.09ms        ? ?/sec    1.00     16.9±0.10ms        ? ?/sec
arrow_reader_clickbench/async/Q14                 1.00     15.9±0.07ms        ? ?/sec    1.00     15.9±0.09ms        ? ?/sec
arrow_reader_clickbench/async/Q19                 1.01      3.1±0.03ms        ? ?/sec    1.00      3.1±0.02ms        ? ?/sec
arrow_reader_clickbench/async/Q20                 1.00     78.7±0.37ms        ? ?/sec    1.13     88.9±9.93ms        ? ?/sec
arrow_reader_clickbench/async/Q21                 1.22     97.0±0.55ms        ? ?/sec    1.00     79.4±0.20ms        ? ?/sec
arrow_reader_clickbench/async/Q22                 1.11    131.6±5.00ms        ? ?/sec    1.00    118.2±6.31ms        ? ?/sec
arrow_reader_clickbench/async/Q23                 1.02    245.9±0.84ms        ? ?/sec    1.00    240.6±1.16ms        ? ?/sec
arrow_reader_clickbench/async/Q24                 1.04     20.0±0.14ms        ? ?/sec    1.00     19.2±0.14ms        ? ?/sec
arrow_reader_clickbench/async/Q27                 1.04     58.7±0.55ms        ? ?/sec    1.00     56.3±0.21ms        ? ?/sec
arrow_reader_clickbench/async/Q28                 1.03     57.9±0.36ms        ? ?/sec    1.00     56.3±0.16ms        ? ?/sec
arrow_reader_clickbench/async/Q30                 1.01     18.6±0.07ms        ? ?/sec    1.00     18.4±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q36                 1.02     15.3±0.28ms        ? ?/sec    1.00     14.9±0.12ms        ? ?/sec
arrow_reader_clickbench/async/Q37                 1.00      5.4±0.03ms        ? ?/sec    1.00      5.4±0.02ms        ? ?/sec
arrow_reader_clickbench/async/Q38                 1.03     13.6±0.26ms        ? ?/sec    1.00     13.1±0.16ms        ? ?/sec
arrow_reader_clickbench/async/Q39                 1.03     24.4±0.31ms        ? ?/sec    1.00     23.8±0.18ms        ? ?/sec
arrow_reader_clickbench/async/Q40                 1.01      5.8±0.06ms        ? ?/sec    1.00      5.7±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q41                 1.01      5.0±0.03ms        ? ?/sec    1.00      4.9±0.02ms        ? ?/sec
arrow_reader_clickbench/async/Q42                 1.00      3.5±0.02ms        ? ?/sec    1.00      3.5±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q1     1.00   1062.7±2.48µs        ? ?/sec    1.00   1067.4±2.72µs        ? ?/sec
arrow_reader_clickbench/async_object_store/Q10    1.02      6.6±0.06ms        ? ?/sec    1.00      6.5±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q11    1.01      7.6±0.06ms        ? ?/sec    1.00      7.5±0.06ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q12    1.01     14.3±0.08ms        ? ?/sec    1.00     14.2±0.08ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q13    1.02     17.1±0.24ms        ? ?/sec    1.00     16.8±0.15ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q14    1.00     15.9±0.11ms        ? ?/sec    1.00     15.8±0.12ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q19    1.01      3.0±0.03ms        ? ?/sec    1.00      2.9±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q20    1.03     72.2±0.64ms        ? ?/sec    1.00     70.0±0.28ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q21    1.03     80.8±0.54ms        ? ?/sec    1.00     78.5±0.24ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q22    1.04     99.1±0.77ms        ? ?/sec    1.00     95.4±0.26ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q23    1.00    213.3±0.80ms        ? ?/sec    1.12    238.7±1.23ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q24    1.01     19.4±0.14ms        ? ?/sec    1.00     19.2±0.09ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q27    1.03     57.2±0.63ms        ? ?/sec    1.00     55.4±0.27ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q28    1.03     56.9±0.45ms        ? ?/sec    1.00     55.5±0.24ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q30    1.00     18.3±0.08ms        ? ?/sec    1.00     18.3±0.06ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q36    1.01     14.5±0.23ms        ? ?/sec    1.00     14.4±0.20ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q37    1.00      5.3±0.03ms        ? ?/sec    1.00      5.4±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q38    1.01     12.8±0.20ms        ? ?/sec    1.00     12.6±0.20ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q39    1.02     23.3±0.28ms        ? ?/sec    1.00     22.7±0.19ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q40    1.01      5.5±0.04ms        ? ?/sec    1.00      5.4±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q41    1.01      4.8±0.02ms        ? ?/sec    1.00      4.8±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q42    1.01      3.5±0.02ms        ? ?/sec    1.00      3.4±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q1                   1.00    868.7±1.80µs        ? ?/sec    1.01    873.1±1.90µs        ? ?/sec
arrow_reader_clickbench/sync/Q10                  1.00      5.2±0.04ms        ? ?/sec    1.00      5.1±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q11                  1.00      6.1±0.04ms        ? ?/sec    1.00      6.1±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q12                  1.02     22.1±0.67ms        ? ?/sec    1.00     21.6±0.15ms        ? ?/sec
arrow_reader_clickbench/sync/Q13                  1.00     28.7±0.88ms        ? ?/sec    1.05     30.2±0.25ms        ? ?/sec
arrow_reader_clickbench/sync/Q14                  1.00     23.1±0.12ms        ? ?/sec    1.19     27.4±0.48ms        ? ?/sec
arrow_reader_clickbench/sync/Q19                  1.04      2.8±0.03ms        ? ?/sec    1.00      2.7±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q20                  1.03    125.7±0.35ms        ? ?/sec    1.00    122.0±0.34ms        ? ?/sec
arrow_reader_clickbench/sync/Q21                  1.03     99.3±0.19ms        ? ?/sec    1.00     96.4±0.19ms        ? ?/sec
arrow_reader_clickbench/sync/Q22                  1.01    145.7±0.50ms        ? ?/sec    1.00    144.6±0.80ms        ? ?/sec
arrow_reader_clickbench/sync/Q23                  1.01   282.2±14.62ms        ? ?/sec    1.00   280.6±16.88ms        ? ?/sec
arrow_reader_clickbench/sync/Q24                  1.02     27.4±0.13ms        ? ?/sec    1.00     26.9±0.08ms        ? ?/sec
arrow_reader_clickbench/sync/Q27                  1.05    109.9±0.24ms        ? ?/sec    1.00    104.6±0.13ms        ? ?/sec
arrow_reader_clickbench/sync/Q28                  1.04    105.7±0.18ms        ? ?/sec    1.00    101.9±0.10ms        ? ?/sec
arrow_reader_clickbench/sync/Q30                  1.02     18.9±0.08ms        ? ?/sec    1.00     18.5±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q36                  1.02     22.3±0.13ms        ? ?/sec    1.00     21.9±0.11ms        ? ?/sec
arrow_reader_clickbench/sync/Q37                  1.00      6.9±0.01ms        ? ?/sec    1.00      6.9±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q38                  1.02     11.5±0.08ms        ? ?/sec    1.00     11.2±0.10ms        ? ?/sec
arrow_reader_clickbench/sync/Q39                  1.03     21.1±0.12ms        ? ?/sec    1.00     20.5±0.10ms        ? ?/sec
arrow_reader_clickbench/sync/Q40                  1.00      5.2±0.02ms        ? ?/sec    1.00      5.2±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q41                  1.00      5.6±0.02ms        ? ?/sec    1.00      5.7±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q42                  1.01      4.4±0.02ms        ? ?/sec    1.00      4.3±0.02ms        ? ?/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	784.1s
Peak memory	3.1 GiB
Avg memory	2.9 GiB
CPU user	707.4s
CPU sys	76.4s
Disk read	0 B
Disk write	758.4 MiB

branch

Metric	Value
Wall time	781.9s
Peak memory	3.2 GiB
Avg memory	3.1 GiB
CPU user	707.9s
CPU sys	74.1s
Disk read	0 B
Disk write	171.3 MiB

alamb

Very exciting @Dandandan

alamb · 2026-03-20T14:52:43Z

+            let n = self
+                .decoder
+                .decompress(input_buf, &mut spare_bytes[..len])
+                .map_err(|e| -> ParquetError { e.into() })?;


If this returns on error before setting len, will the buffer be left in an inconsistent state?

I think the use of the mut slice ensures that the call to decompress won't overwrite the newly allocated bytes.

However, this also basically passes uninitialized bytes to decompress -- how do we know that decompress doesn't read them? Maybe we should add a SAFETY note to the decompress API saying it can't rely on the output bytes being initialized 🤔
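
On the error-path question, a small sketch may help. It assumes a hypothetical decoder stub (copy_stub, not the snap API) that can fail before writing anything. Because set_len only runs after a successful decode, an early return via ? should leave the vector's length, and therefore its observable contents, unchanged.

```rust
use std::mem::MaybeUninit;

/// Hypothetical decoder stub: copies `input` into `out`, or fails when the
/// output slice is too small. It never reads from `out`.
fn copy_stub(input: &[u8], out: &mut [MaybeUninit<u8>]) -> Result<usize, String> {
    if out.len() < input.len() {
        return Err("output too small".to_string());
    }
    for (slot, &b) in out.iter_mut().zip(input) {
        slot.write(b);
    }
    Ok(input.len())
}

/// Appends decoded bytes to `buf`, reserving `len` bytes of spare capacity.
fn append_decoded(buf: &mut Vec<u8>, input: &[u8], len: usize) -> Result<usize, String> {
    let start = buf.len();
    buf.reserve(len);
    let spare = buf.spare_capacity_mut();
    // On error, `?` returns here: `set_len` never runs, so `buf.len()` is
    // still `start` and no uninitialized byte becomes observable.
    let n = copy_stub(input, &mut spare[..len])?;
    // SAFETY: `copy_stub` initialized the first `n` bytes of the spare
    // capacity, and `n <= len <= capacity - start`.
    unsafe { buf.set_len(start + n) };
    Ok(n)
}
```

For example, calling append_decoded with a 3-byte input and len = 2 returns an error and leaves buf exactly as it was; the extra capacity reserved is the only side effect.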

Effectively we rely on this:

https://docs.rs/snap/latest/snap/raw/struct.Decoder.html#errors

That is, decompress returns an error when "output has length less than decompress_len(input)".

To not use unsafe we would need to have this feature:
BurntSushi/rust-snappy#62

Maybe we could improve the documentation around Decoder::decompress to mention that it can receive non-zero bytes and should not make any assumptions about their contents. I think that would be adequate

This does seem to be an appropriate use of spare_capacity_mut/set_len.

I spent some more time exploring this (and arguing with Codex about it)

The main issue is that the Rust snappy implementation's contract takes an output buffer and doesn't say it can handle uninitialized bytes

That being said I can't think of how passing uninitialized bytes as an output location could cause problems (even if snappy changes how it internally works)

Yes, technically I think this breaks the contract: because the snappy function is not marked unsafe, it is allowed to read the buffer. In practice it doesn't need to read anything.

A MaybeUninit API would solve that.

There is a PR now:
BurntSushi/rust-snappy#65
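
To make the MaybeUninit idea concrete, here is a hedged sketch of what such an API shape could look like. The function decode_into_uninit is hypothetical and is not the snap crate's API or the signature proposed in the linked PR; the "decoding" is a plain copy so the example stays self-contained. The point is that the parameter type itself documents that the callee may only write to the output.

```rust
use std::mem::MaybeUninit;

/// Hypothetical API shape (not the `snap` crate's API): the decoder takes
/// uninitialized output and returns the prefix it initialized.
fn decode_into_uninit<'a>(
    input: &[u8],
    output: &'a mut [MaybeUninit<u8>],
) -> Result<&'a mut [u8], &'static str> {
    if output.len() < input.len() {
        return Err("output too small");
    }
    // Toy "decoding": copy the input bytes into the output slots.
    for (slot, &b) in output.iter_mut().zip(input) {
        slot.write(b);
    }
    let written = &mut output[..input.len()];
    // SAFETY: every element of `written` was initialized by the loop above,
    // and `MaybeUninit<u8>` has the same layout as `u8`.
    Ok(unsafe { &mut *(written as *mut [MaybeUninit<u8>] as *mut [u8]) })
}

/// Caller side: the only remaining `unsafe` is `set_len`.
fn append(buf: &mut Vec<u8>, input: &[u8]) -> Result<(), &'static str> {
    let start = buf.len();
    buf.reserve(input.len());
    let n = decode_into_uninit(input, buf.spare_capacity_mut())?.len();
    // SAFETY: the decoder initialized exactly `n` bytes of spare capacity.
    unsafe { buf.set_len(start + n) };
    Ok(())
}
```

With this shape the caller no longer needs to create a &mut [u8] over uninitialized memory, which is the part of the current approach that the thread above is uneasy about.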

alamb · 2026-03-20T14:53:20Z

run benchmark arrow_reader

adriangbot · 2026-03-20T14:56:36Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4098640020-479-gzgft 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing pr/snappy-zero-fill (eaa3ae4) to 88422cb (merge-base) diff
BENCH_NAME=arrow_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader
BENCH_FILTER=
Results will be posted here when complete

alamb · 2026-03-31T20:31:38Z

run benchmark arrow_reader

adriangbot · 2026-03-31T20:34:45Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4165328642-639-s2g89 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing pr/snappy-zero-fill (4149d2b) to 51bf8a4 (merge-base) diff
BENCH_NAME=arrow_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader
BENCH_FILTER=
Results will be posted here when complete

File an issue against this benchmark runner

alamb

I am torn on this one. Let's see if we can get a measurable perf win and then I can hem and haw about it more

alamb · 2026-03-31T21:14:12Z

+            let n = self
+                .decoder
+                .decompress(input_buf, &mut spare_bytes[..len])
+                .map_err(|e| -> ParquetError { e.into() })?;


I spent some more time exploring this (and arguing with Codex about it)

The main issue is that the Rust snappy implementation's contract takes an output buffer and doesn't say it can handle uninitialized bytes

That being said I can't think of how passing uninitialized bytes as an output location could cause problems (even if snappy changes how it internally works)

Dandandan · 2026-04-04T09:16:11Z

run benchmark arrow_reader

adriangbot · 2026-04-04T09:18:16Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4186819387-783-wl9px 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing pr/snappy-zero-fill (4149d2b) to 51bf8a4 (merge-base) diff
BENCH_NAME=arrow_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader
BENCH_FILTER=
Results will be posted here when complete

File an issue against this benchmark runner

Avoid unnecessary buffer zero-fill in Snappy decompression

eaa3ae4

github-actions bot added the parquet Changes to the parquet crate label Mar 19, 2026

Dandandan mentioned this pull request Mar 19, 2026

Fuse RLE decoding and view gathering for StringView dictionary #9586

Closed

alamb reviewed Mar 20, 2026

Merge branch 'main' into pr/snappy-zero-fill

4149d2b

alamb reviewed Mar 31, 2026

alamb mentioned this pull request Apr 7, 2026

feat: Add output: &mut [MaybeUninit<u8>] support BurntSushi/rust-snappy#65

Open
