Fast integer compression for Rust, available both as a pure-Rust implementation and as a wrapper around the C++ FastPFor library. Supports 32-bit (and, for some codecs, 64-bit) integers. Based on the 2012 paper "Decoding billions of integers per second through vectorization".
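The Rust decoder is about 29% faster than the C++ version in the decoding benchmarks below, where individual results range from 5.88% slower to 46.62% faster. The Rust implementation contains no unsafe code, and when built without the cpp feature this crate has #![forbid(unsafe_code)].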
The simplest way to get started is FastPFor256, a composite codec that handles any input
length by compressing aligned 256-element blocks with FastPForBlock256 and encoding any
leftover values with VariableByte.
use fastpfor::{AnyLenCodec, FastPFor256};
let mut codec = FastPFor256::default();
let input: Vec<u32> = (0..1000).collect();
let mut encoded = Vec::new();
codec.encode(&input, &mut encoded).unwrap();
let mut decoded = Vec::new();
codec.decode(&encoded, &mut decoded, None).unwrap();
assert_eq!(decoded, input);

For block-aligned inputs you can use the lower-level BlockCodec API:
use fastpfor::{BlockCodec, FastPForBlock256, slice_to_blocks};
let mut codec = FastPForBlock256::default();
let input: Vec<u32> = (0..512).collect(); // exactly 2 blocks of 256
let (blocks, remainder) = slice_to_blocks::<FastPForBlock256>(&input);
assert_eq!(blocks.len(), 2);
assert!(remainder.is_empty());
let mut encoded = Vec::new();
codec.encode_blocks(blocks, &mut encoded).unwrap();
let mut decoded = Vec::new();
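// Expected number of decoded values: 256 per block.
let expected_len = u32::try_from(blocks.len() * 256).expect("decoded length fits in u32");
codec.decode_blocks(&encoded, Some(expected_len), &mut decoded).unwrap();
assert_eq!(decoded, input);

To use the C++ wrapper, enable the cpp feature in Cargo.toml: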
fastpfor = { version = "0.1", features = ["cpp"] }

All C++ codecs implement the same AnyLenCodec trait (encode / decode), so
the usage pattern is identical to the Rust examples above: just swap the codec type,
e.g. cpp::CppFastPFor128::new().
Thread safety: C++ codec instances have internal state and are not thread-safe. Create one instance per thread or synchronize access externally.
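As a rough illustration of the one-instance-per-thread pattern, the sketch below uses a hypothetical `ScratchCodec` type (a stand-in with internal state, not a type from this crate) and plain `std::thread`. Each thread constructs its own codec, so no instance is ever shared:

```rust
use std::thread;

// Hypothetical stand-in for a stateful codec; NOT a type from this crate.
struct ScratchCodec {
    scratch: Vec<u32>, // internal buffer reused between calls
}

impl ScratchCodec {
    fn new() -> Self {
        Self { scratch: Vec::new() }
    }

    // Toy "encoding": store each value as the difference from the previous one,
    // staged in the internal scratch buffer.
    fn encode(&mut self, input: &[u32]) -> Vec<u32> {
        self.scratch.clear();
        let mut prev = 0u32;
        for &v in input {
            self.scratch.push(v.wrapping_sub(prev));
            prev = v;
        }
        self.scratch.clone()
    }
}

// Encode each chunk on its own thread, with one codec instance per thread.
fn encode_in_parallel(chunks: Vec<Vec<u32>>) -> Vec<Vec<u32>> {
    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| {
            thread::spawn(move || {
                let mut codec = ScratchCodec::new(); // created inside the thread
                codec.encode(&chunk)
            })
        })
        .collect();
    // Join in spawn order so results line up with the input chunks.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

A real codec would be constructed inside the closure in the same way. Sharing a single instance across threads would instead need external synchronization, for example a `std::sync::Mutex` around it.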
| Feature | Default | Description |
|---|---|---|
| rust | yes | Pure-Rust implementation: no unsafe, no build dependencies |
| cpp | no | C++ wrapper via CXX; requires a C++14 compiler with SIMD support |
| cpp_portable | no | Enables cpp, compiles C++ with SSE4.2 baseline (runs on any x86-64 from ~2008+) |
| cpp_native | no | Enables cpp, compiles C++ with -march=native for maximum throughput on the build machine |
The FASTPFOR_SIMD_MODE environment variable (portable or native) can override the SIMD mode at build time.
Recommendation: Use cpp_portable (not cpp_native) for distributable binaries.
Rust block codecs require block-aligned input. CompositeCodec chains a block codec with a tail codec (e.g. VariableByte) to handle arbitrary-length input. FastPFor256 and FastPFor128 are type aliases for such composites.
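To make the block/tail split concrete, here is a minimal sketch using only the standard library. The `split_aligned` helper and the `BLOCK` constant are hypothetical names chosen for illustration; they are not part of this crate's API, and the crate's composite codecs may organize the split differently.

```rust
// Hypothetical block size for illustration (matches the 256-element codecs).
const BLOCK: usize = 256;

// Split a slice into a block-aligned prefix and a shorter tail.
// The prefix length is the largest multiple of BLOCK that fits in the input.
fn split_aligned(input: &[u32]) -> (&[u32], &[u32]) {
    let aligned_len = input.len() / BLOCK * BLOCK;
    input.split_at(aligned_len)
}
```

For a 1000-element input, the prefix holds 768 values (three full blocks) that a block codec could compress, and the remaining 232 values would go to the tail codec.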
| Codec | Description |
|---|---|
| FastPFor256 | CompositeCodec of FastPForBlock256 + VariableByte |
| FastPFor128 | CompositeCodec of FastPForBlock128 + VariableByte |
| VariableByte | Variable-byte encoding, MSB is opposite to protobuf's varint |
| JustCopy | No compression; useful as a baseline |
| FastPForBlock256 | FastPFor with 256-element blocks; block-aligned input only |
| FastPForBlock128 | FastPFor with 128-element blocks; block-aligned input only |
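The note about the MSB can be illustrated with a small sketch. It assumes that "opposite" means the high bit is set on the last byte of each value, whereas protobuf's varint sets it on every byte except the last. The function names are hypothetical and this is not the crate's implementation or its exact byte layout.

```rust
// Sketch only: the high bit marks the LAST byte of each value
// (an assumed reading of "opposite to protobuf's varint").
fn vbyte_encode(values: &[u32], out: &mut Vec<u8>) {
    for &value in values {
        let mut v = value;
        while v >= 0x80 {
            out.push((v & 0x7F) as u8); // continuation byte: high bit clear
            v >>= 7;
        }
        out.push(v as u8 | 0x80); // final byte: high bit set
    }
}

// Decodes well-formed input only (at most 5 bytes per 32-bit value).
fn vbyte_decode(bytes: &[u8]) -> Vec<u32> {
    let mut values = Vec::new();
    let mut current = 0u32;
    let mut shift = 0u32;
    for &b in bytes {
        current |= u32::from(b & 0x7F) << shift;
        if b & 0x80 != 0 {
            values.push(current); // high bit set: this value is complete
            current = 0;
            shift = 0;
        } else {
            shift += 7;
        }
    }
    values
}
```

Under this assumption the value 300 is written as the two bytes 0x2C and 0x82: the low seven bits first, then the remaining bits with the terminator flag set.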
All C++ codecs are composite (any-length) and implement AnyLenCodec only.
u64-capable codecs (CppFastPFor128, CppFastPFor256, CppVarInt) also implement BlockCodec64 with encode64 / decode64.
| Codec | Notes |
|---|---|
| CppFastPFor128 | FastPFor + VByte composite, 128-element blocks. Also supports u64. |
| CppFastPFor256 | FastPFor + VByte composite, 256-element blocks. Also supports u64. |
| CppSimdFastPFor128 | SIMD-optimized 128-element variant |
| CppSimdFastPFor256 | SIMD-optimized 256-element variant |
| CppBP32 | Binary packing, 32-bit blocks |
| CppFastBinaryPacking8 | Binary packing, 8-bit groups |
| CppFastBinaryPacking16 | Binary packing, 16-bit groups |
| CppFastBinaryPacking32 | Binary packing, 32-bit groups |
| CppSimdBinaryPacking | SIMD-optimized binary packing |
| CppPFor | Patched frame-of-reference |
| CppSimplePFor | Simplified PFor variant |
| CppNewPFor | PFor with improved exception handling |
| CppOptPFor | Optimized PFor |
| CppPFor2008 | Reference implementation from original paper |
| CppSimdPFor | SIMD PFor |
| CppSimdSimplePFor | SIMD SimplePFor |
| CppSimdNewPFor | SIMD NewPFor |
| CppSimdOptPFor | SIMD OptPFor |
| CppSimple16 | 16 packing modes in 32-bit words |
| CppSimple9 | 9 packing modes |
| CppSimple9Rle | Simple9 with run-length encoding |
| CppSimple8b | 8 packing modes in 64-bit words |
| CppSimple8bRle | Simple8b with run-length encoding |
| CppSimdGroupSimple | SIMD group-simple encoding |
| CppSimdGroupSimpleRingBuf | SIMD group-simple with ring buffer |
| CppVByte | Standard variable-byte encoding |
| CppMaskedVByte | SIMD masked variable-byte |
| CppStreamVByte | SIMD stream variable-byte |
| CppVarInt | Standard varint. Also supports u64. |
| CppVarIntGb | Group varint |
| CppCopy | No compression (baseline) |
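Several codecs in the table build on binary packing: each value in a block is stored using only as many bits as the largest value needs. The sketch below shows that general idea with hypothetical helpers (`bit_width`, `pack`, `unpack`). It is a generic illustration under those assumptions and does not reproduce the bit layout of any codec listed above.

```rust
// Number of bits needed to represent the largest value in the block.
fn bit_width(block: &[u32]) -> u32 {
    let max = block.iter().copied().max().unwrap_or(0);
    32 - max.leading_zeros()
}

// Pack each value into `width` bits, filling 32-bit words from the low bit up.
// Assumes every value fits in `width` bits and `width <= 32`.
fn pack(block: &[u32], width: u32) -> Vec<u32> {
    let mut out = Vec::new();
    let mut acc: u64 = 0; // bit accumulator
    let mut filled: u32 = 0; // number of valid bits in `acc` (always < 32 here)
    for &v in block {
        acc |= u64::from(v) << filled;
        filled += width;
        if filled >= 32 {
            out.push(acc as u32); // flush the low 32 bits
            acc >>= 32;
            filled -= 32;
        }
    }
    if filled > 0 {
        out.push(acc as u32); // flush the partially filled last word
    }
    out
}

// Inverse of `pack`: read `count` values of `width` bits each.
fn unpack(words: &[u32], width: u32, count: usize) -> Vec<u32> {
    let mask: u64 = (1u64 << width) - 1;
    let mut out = Vec::with_capacity(count);
    let mut acc: u64 = 0;
    let mut filled: u32 = 0;
    let mut words = words.iter();
    for _ in 0..count {
        if filled < width {
            let word = *words.next().expect("not enough packed words");
            acc |= u64::from(word) << filled;
            filled += 32;
        }
        out.push((acc & mask) as u32);
        acc >>= width;
        filled -= width;
    }
    out
}
```

Eight values below 8 need 3 bits each, so they fit in a single 32-bit word instead of eight. Patched schemes such as PFor refine this by choosing a smaller width and storing the few values that do not fit as exceptions.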
Measured on Linux x86-64 by running just bench::cpp-vs-rust-decode native. The values below are time measurements; smaller values indicate faster decoding.
| name | cpp (ns) | rust (ns) | % faster |
|---|---|---|---|
| clustered/1024 | 643.24 | 392.93 | 38.91% |
| clustered/4096 | 1986 | 1414.8 | 28.76% |
| sequential/1024 | 653.69 | 396.02 | 39.42% |
| sequential/4096 | 2106 | 1476.2 | 29.91% |
| sparse/1024 | 428.8 | 352.38 | 17.82% |
| sparse/4096 | 1114 | 1179.5 | -5.88% |
| uniform_large_value_distribution/1024 | 286.74 | 153.06 | 46.62% |
| uniform_large_value_distribution/4096 | 748.19 | 558.05 | 25.41% |
| uniform_small_value_distribution/1024 | 606.4 | 405.44 | 33.14% |
| uniform_small_value_distribution/4096 | 2017.3 | 1403.7 | 30.42% |
Rust encoding has not yet been fully optimized or verified.
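The "% faster" column in the benchmark table appears to be the time saved relative to the C++ measurement. The helper below is a hypothetical reconstruction of that arithmetic, not code from the benchmark harness; it reproduces the table's rows to two decimal places.

```rust
// Relative speedup of Rust over C++ for one benchmark row, in percent.
// Positive means Rust took less time; negative means Rust was slower.
fn percent_faster(cpp_ns: f64, rust_ns: f64) -> f64 {
    (cpp_ns - rust_ns) / cpp_ns * 100.0
}
```

For clustered/1024 this gives (643.24 - 392.93) / 643.24, about 38.91%. For sparse/4096 the result is negative, about -5.88%, because the Rust decoder was slower on that input.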
- Rust feature (rust, the default): no additional dependencies.
- C++ feature (cpp): requires a C++14-capable compiler with SIMD intrinsics. See FastPFor C++ requirements.
The default GitHub Actions runner has all needed dependencies.
For local development:
# This list may be incomplete
sudo apt-get install build-essential

libsimde-dev is optional. On ARM/aarch64, the C++ build fetches SIMDe via CMake
and the CXX bridge reuses that include path automatically.
On Apple Silicon, SIMDe installation is usually not required — the C++ build fetches it via CMake.
If you prefer a Homebrew fallback:
brew install simde
export CXXFLAGS="-I/opt/homebrew/include"
export CFLAGS="-I/opt/homebrew/include"

This project uses just as a task runner:
cargo install just # install once
just # list available commands
just test # run all tests

Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or https://opensource.org/licenses/MIT)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual-licensed as above, without any additional terms or conditions.