
Conversation

@a10y (Contributor) commented Aug 25, 2025

Using the FSST library currently requires you to make two allocations:

  1. Allocate a Vec to pass to compress_into
  2. Copy that buffer into some separately allocated target buffer

This is kind of annoying. What I really want to do is pre-allocate a buffer large enough to hold all of the compressed data, then do something like:

```rust
let mut buffer = Vec::with_capacity(...);

let mut ptr = 0;
for string in strings {
    let written = compressor.compress_into(string, &mut buffer.spare_capacity_mut()[ptr..]);
    ptr += written;
}
```

This lets me compress a bunch of values directly into a single packed byte buffer, without an intermediate copy.

The Fix

We shouldn't take a &mut Vec<u8> directly; instead, we should take &mut [MaybeUninit<u8>], which can be backed by a Vec, Bytes, Buffer, or whatever other memory allocation we happen to get our hands on.
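
For illustration, here is how a couple of common owners hand out a &mut [MaybeUninit<u8>] using only the standard library; this is a minimal sketch, and Bytes/Buffer would plug in analogously through their own uninitialized-capacity APIs:

```rust
use std::mem::MaybeUninit;

fn main() {
    // From a Vec: the unused capacity is exactly a &mut [MaybeUninit<u8>].
    let mut vec_buf: Vec<u8> = Vec::with_capacity(1024);
    let from_vec: &mut [MaybeUninit<u8>] = vec_buf.spare_capacity_mut();
    assert_eq!(from_vec.len(), 1024);

    // From a plain boxed slice, with no Vec involved at all.
    let mut boxed: Box<[MaybeUninit<u8>]> = Box::new_uninit_slice(1024);
    let from_box: &mut [MaybeUninit<u8>] = &mut boxed;
    assert_eq!(from_box.len(), 1024);
}
```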

This PR adds a new safe compression pathway that exposes compress_into_uninit and implements the hot loop using only safe code to boot.

I'm leaving the old unsafe compress_into here so existing projects can keep using that interface, but I've updated the docs to indicate that compress_into_uninit is the new preferred pathway.
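
To make the intended usage concrete, here's a hedged, self-contained sketch of driving the new pathway end-to-end. The Compressor stub, its signature, and the stub body are stand-ins for illustration; only the method name compress_into_uninit comes from this PR:

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for fsst::Compressor; the real signature may differ.
struct Compressor;

impl Compressor {
    /// Compress `plaintext` into `out`, returning the number of bytes written.
    /// Stub body: copies bytes through instead of emitting FSST codes.
    fn compress_into_uninit(&self, plaintext: &[u8], out: &mut [MaybeUninit<u8>]) -> usize {
        let n = plaintext.len().min(out.len());
        for (dst, src) in out.iter_mut().zip(&plaintext[..n]) {
            dst.write(*src);
        }
        n
    }
}

fn main() {
    let compressor = Compressor;
    let strings: &[&[u8]] = &[b"hello", b"world"];

    // One up-front allocation sized for the worst case.
    let mut buffer: Vec<u8> = Vec::with_capacity(1 << 20);
    let mut written = 0;
    for s in strings {
        written += compressor.compress_into_uninit(s, &mut buffer.spare_capacity_mut()[written..]);
    }
    // The one unsafe step left lives with the caller: publishing the bytes
    // that were just initialized into the Vec's spare capacity.
    unsafe { buffer.set_len(written) };
    assert_eq!(buffer, b"helloworld");
}
```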

Performance

Performance on the micro and compress benches, measured on an M4 Max, is more or less identical.

I have a long-term goal of eliminating most of the unsafe code in this repo (see #87); this PR brings us one step closer.

Using the FSST library currently requires you to make two allocations:

1. Allocate a buffer to compress_into
2. Allocate a larger packed buffer

We shouldn't take a &mut Vec<u8> directly, since Vec isn't
splittable. We should instead be taking &mut [MaybeUninit<u8>], which
can be backed by Vec, Bytes, Buffer<u8> or whatever other memory
allocation we happen to get our hands on.

This PR adds a new safe compression pathway that exposes
`compress_into_uninit` and implements the hot loop using only safe code
to boot.

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y (Contributor, Author) commented Aug 25, 2025

Local benchmark run on this branch (M4 Max)

aduffy@Andrews-MacBook-Pro /V/C/fsst (aduffy/safe-compress)> cargo bench --bench compress
   Compiling fsst-rs v0.5.3 (/Volumes/Code/fsst)
    Finished `bench` profile [optimized] target(s) in 0.60s
     Running benches/compress.rs (target/release/deps/compress-1ee906f2016f72c9)
Gnuplot not found, using plotters backend
dbtext/wikipedia/train-and-compress
                        time:   [8.1022 ms 8.1163 ms 8.1311 ms]
                        change: [-1.5404% -1.2911% -1.0334%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
dbtext/wikipedia/compress-only
                        time:   [7.4230 ms 7.4498 ms 7.4850 ms]
                        thrpt:  [364.76 MiB/s 366.48 MiB/s 367.80 MiB/s]
                 change:
                        time:   [-4.2437% -3.8784% -3.4581%] (p = 0.00 < 0.05)
                        thrpt:  [+3.5819% +4.0348% +4.4318%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe
dbtext/wikipedia/decompress
                        time:   [900.59 µs 904.12 µs 907.38 µs]
                        thrpt:  [2.9384 GiB/s 2.9490 GiB/s 2.9605 GiB/s]
                 change:
                        time:   [+1.8525% +2.8612% +3.8656%] (p = 0.00 < 0.05)
                        thrpt:  [-3.7218% -2.7816% -1.8188%]
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

compressed dbtext/wikipedia 2862830 => 1640581B (compression factor 1.75:1)
dbtext/l_comment/train-and-compress
                        time:   [6.1031 ms 6.1369 ms 6.1756 ms]
                        change: [-2.3402% -1.6404% -0.7872%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) high mild
  7 (7.00%) high severe
dbtext/l_comment/compress-only
                        time:   [5.4490 ms 5.4563 ms 5.4637 ms]
                        thrpt:  [479.30 MiB/s 479.94 MiB/s 480.59 MiB/s]
                 change:
                        time:   [-3.8527% -3.6184% -3.3808%] (p = 0.00 < 0.05)
                        thrpt:  [+3.4991% +3.7542% +4.0071%]
                        Performance has improved.
dbtext/l_comment/decompress
                        time:   [395.33 µs 398.69 µs 402.28 µs]
                        thrpt:  [6.3572 GiB/s 6.4144 GiB/s 6.4689 GiB/s]
                 change:
                        time:   [+2.7463% +4.6504% +6.6160%] (p = 0.00 < 0.05)
                        thrpt:  [-6.2055% -4.4437% -2.6729%]
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe

compressed dbtext/l_comment 2745949 => 1018169B (compression factor 2.70:1)
dbtext/urls/train-and-compress
                        time:   [11.059 ms 11.080 ms 11.102 ms]
                        change: [-1.7492% -1.0009% -0.3482%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe
dbtext/urls/compress-only
                        time:   [10.320 ms 10.350 ms 10.383 ms]
                        thrpt:  [581.23 MiB/s 583.04 MiB/s 584.77 MiB/s]
                 change:
                        time:   [-0.2435% +0.1008% +0.4676%] (p = 0.60 > 0.05)
                        thrpt:  [-0.4655% -0.1007% +0.2441%]
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Benchmarking dbtext/urls/decompress: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s, enable flat sampling, or reduce sample count to 60.
dbtext/urls/decompress  time:   [1.0093 ms 1.0159 ms 1.0238 ms]
                        thrpt:  [5.7565 GiB/s 5.8008 GiB/s 5.8389 GiB/s]
                 change:
                        time:   [-1.5426% -0.4588% +0.6629%] (p = 0.41 > 0.05)
                        thrpt:  [-0.6585% +0.4609% +1.5667%]
                        No change in performance detected.

compressed dbtext/urls 6327875 => 2856682B (compression factor 2.22:1)

@a10y (Contributor, Author) commented Aug 25, 2025

Local benchmark run on develop (M4 Max)

aduffy@Andrews-MacBook-Pro /V/C/fsst_original (develop)> cargo bench --bench compress
    Finished `bench` profile [optimized] target(s) in 0.03s
     Running benches/compress.rs (target/release/deps/compress-1ee906f2016f72c9)
Gnuplot not found, using plotters backend
dbtext/wikipedia/train-and-compress
                        time:   [8.7453 ms 8.7678 ms 8.7905 ms]
                        change: [-2.1238% -1.8275% -1.5178%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
dbtext/wikipedia/compress-only
                        time:   [8.0861 ms 8.1511 ms 8.2198 ms]
                        thrpt:  [332.15 MiB/s 334.95 MiB/s 337.64 MiB/s]
                 change:
                        time:   [-0.0001% +0.8594% +1.7039%] (p = 0.06 > 0.05)
                        thrpt:  [-1.6754% -0.8521% +0.0001%]
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
dbtext/wikipedia/decompress
                        time:   [895.29 µs 901.23 µs 906.37 µs]
                        thrpt:  [2.9416 GiB/s 2.9584 GiB/s 2.9781 GiB/s]
                 change:
                        time:   [-0.5790% +0.3510% +1.2569%] (p = 0.46 > 0.05)
                        thrpt:  [-1.2413% -0.3497% +0.5824%]
                        No change in performance detected.

compressed dbtext/wikipedia 2862830 => 1640581B (compression factor 1.75:1)
dbtext/l_comment/train-and-compress
                        time:   [6.3088 ms 6.3298 ms 6.3530 ms]
                        change: [-0.8264% -0.4224% +0.0215%] (p = 0.05 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe
dbtext/l_comment/compress-only
                        time:   [5.6181 ms 5.6270 ms 5.6365 ms]
                        thrpt:  [464.61 MiB/s 465.39 MiB/s 466.12 MiB/s]
                 change:
                        time:   [-1.5536% -1.2958% -1.0225%] (p = 0.00 < 0.05)
                        thrpt:  [+1.0330% +1.3128% +1.5781%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
dbtext/l_comment/decompress
                        time:   [380.76 µs 383.70 µs 386.40 µs]
                        thrpt:  [6.6184 GiB/s 6.6650 GiB/s 6.7165 GiB/s]
                 change:
                        time:   [-6.0954% -4.8400% -3.6035%] (p = 0.00 < 0.05)
                        thrpt:  [+3.7382% +5.0862% +6.4910%]
                        Performance has improved.

compressed dbtext/l_comment 2745949 => 1018169B (compression factor 2.70:1)
dbtext/urls/train-and-compress
                        time:   [10.948 ms 10.977 ms 11.013 ms]
                        change: [-4.3787% -3.3474% -2.3954%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe
dbtext/urls/compress-only
                        time:   [10.259 ms 10.279 ms 10.303 ms]
                        thrpt:  [585.75 MiB/s 587.10 MiB/s 588.26 MiB/s]
                 change:
                        time:   [-4.0140% -3.3222% -2.6764%] (p = 0.00 < 0.05)
                        thrpt:  [+2.7500% +3.4363% +4.1819%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe
Benchmarking dbtext/urls/decompress: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s, enable flat sampling, or reduce sample count to 60.
dbtext/urls/decompress  time:   [1.0173 ms 1.0233 ms 1.0302 ms]
                        thrpt:  [5.7205 GiB/s 5.7592 GiB/s 5.7932 GiB/s]
                 change:
                        time:   [-5.4298% -3.0801% -0.9196%] (p = 0.01 < 0.05)
                        thrpt:  [+0.9281% +3.1780% +5.7416%]
                        Change within noise threshold.

compressed dbtext/urls 6327875 => 2856682B (compression factor 2.22:1)

a10y requested a review from Copilot August 25, 2025 18:30
@codspeed-hq (bot) commented Aug 25, 2025

CodSpeed Performance Report

Merging #123 will degrade performances by 16.92%

Comparing aduffy/safe-compress (5f10dd4) with develop (d0a3601)

Summary

❌ 10 regressions
✅ 6 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| compress-only | 13.1 ms | 15.5 ms | -15.2% |
| train-and-compress | 15.7 ms | 18.1 ms | -13.03% |
| compress-only | 34.2 ms | 40.4 ms | -15.42% |
| train-and-compress | 36.7 ms | 42.9 ms | -14.5% |
| compress-only | 19.2 ms | 22.8 ms | -15.66% |
| train-and-compress | 22.4 ms | 26 ms | -13.74% |
| compress | 11.4 ms | 13.8 ms | -16.92% |
| compress | 6.8 ms | 8.1 ms | -15.33% |
| compress | 4.1 ms | 4.7 ms | -13.8% |
| compress | 2.3 ms | 2.6 ms | -12.54% |

Copilot AI left a comment

Pull Request Overview

This PR adds a new safe compression pathway that avoids intermediate allocations by working directly with uninitialized memory. The change allows users to pre-allocate a buffer and compress data directly into it without requiring two separate allocations and a copy operation.

  • Adds compress_into_uninit method that takes &mut [MaybeUninit<u8>] instead of &mut Vec<u8>
  • Implements a safe version of the compression hot loop using compress_word_safe (see the hedged sketch after this list)
  • Updates benchmarks to use the new safe API while maintaining performance
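
This review doesn't show the body of compress_word_safe, so as a purely illustrative sketch: a word-granularity write into uninitialized output can stay fully safe by going through MaybeUninit::write per byte instead of raw pointer stores. The function name write_word_safe and everything in it are assumptions, not the PR's actual code:

```rust
use std::mem::MaybeUninit;

/// Illustrative only: write the low `len` bytes of `word` into `out`
/// without raw pointers, returning how many bytes were written.
fn write_word_safe(out: &mut [MaybeUninit<u8>], word: u64, len: usize) -> usize {
    let bytes = word.to_le_bytes();
    let n = len.min(out.len()).min(8);
    for (dst, src) in out.iter_mut().zip(&bytes[..n]) {
        dst.write(*src);
    }
    n
}

fn main() {
    let mut out = [MaybeUninit::<u8>::uninit(); 8];
    // 0x6F_6C_6C_65_68 is "hello" in little-endian byte order.
    let n = write_word_safe(&mut out, 0x6F6C6C6568, 5);
    assert_eq!(n, 5);
}
```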

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| src/lib.rs | Adds new safe compression methods and updates the existing compress method to use the new pathway |
| benches/micro.rs | Updates micro benchmarks to use the new safe compression API |
| benches/compress.rs | Updates compression benchmarks to use the new safe API |


@a10y (Contributor, Author) commented Aug 25, 2025

Wow codspeed did not like that

a10y added 4 commits August 25, 2025 15:09
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
This reverts commit 01ddd72.

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@spiraldb spiraldb deleted a comment from Copilot AI Aug 25, 2025
@spiraldb spiraldb deleted a comment from Copilot AI Aug 25, 2025