chore: consolidate Rust and C++ API by nyurik · Pull Request #73 · fast-pack/FastPFOR-rs

nyurik · 2026-03-19T11:25:45Z

Split the API into a cleanly defined "block" API and a "variable-length" API. The block API requires the input to be blocks of a constant size, fixed at compile time, and requires the codec to match that block size.
Made the code shorter by a thousand lines.
Identical C++ and Rust usage.
Introduce a new Composite codec that is generic over a BlockCodec and an AnyLenCodec.

pub trait BlockCodec {
    /// The fixed-size block type.  Must be plain-old-data (`Pod`).
    /// In practice this will be `[u32; 128]` or `[u32; 256]`.
    type Block: Pod;

    /// Compress a slice of complete, fixed-size blocks.
    ///
    /// No remainder is possible — the caller must split the input first using
    /// [`slice_to_blocks`] and handle any remainder separately.
    fn encode_blocks(
        &mut self,
        blocks: &[Self::Block],
        out: &mut Vec<u32>,
    ) -> Result<(), FastPForError>;

    /// Decompress exactly `n_blocks` blocks from `input`.
    fn decode_blocks(
        &mut self,
        input: &[u32],
        n_blocks: usize,
        out: &mut Vec<u32>,
    ) -> Result<(), FastPForError>;
}

/// Compresses and decompresses an arbitrary-length `&[u32]` slice.
pub trait AnyLenCodec {
    /// Compress an arbitrary-length slice of `u32` values.
    fn encode(&mut self, input: &[u32], out: &mut Vec<u32>) -> Result<(), FastPForError>;

    /// Decompress a previously compressed slice of `u32` values.
    fn decode(&mut self, input: &[u32], out: &mut Vec<u32>) -> Result<(), FastPForError>;
}

/// Split a flat `&[u32]` into `(&[Blocks::Block], &[u32])` without copying.
///
/// # Example
///
/// ```rust,ignore
/// let data: Vec<u32> = (0..600).collect(); // 2 × 256 + 88 remainder
/// let (blocks, remainder) = slice_to_blocks::<FastPFor256>(&data);
/// assert_eq!(blocks.len(), 2);    // 2 blocks of [u32; 256]
/// assert_eq!(remainder.len(), 88);
/// ```
#[must_use]
pub fn slice_to_blocks<Blocks: BlockCodec>(input: &[u32]) -> (&[Blocks::Block], &[u32]) { ... }
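
As an illustration of the zero-copy split described above, here is a minimal standard-library sketch. It is not the crate's implementation: `split_into_blocks` is a hypothetical name, it is generic over a block length `N` rather than over a `BlockCodec` type, and it uses a raw-pointer cast where the real code may well rely on a `Pod`-based cast instead.

```rust
/// Hypothetical stand-in for `slice_to_blocks`: splits `input` into a slice of
/// `[u32; N]` blocks plus the sub-block remainder, without copying.
fn split_into_blocks<const N: usize>(input: &[u32]) -> (&[[u32; N]], &[u32]) {
    assert!(N > 0, "block length must be non-zero");
    let n_blocks = input.len() / N;
    // The prefix holds exactly `n_blocks * N` values; the rest is the remainder.
    let (prefix, remainder) = input.split_at(n_blocks * N);
    // SAFETY: `prefix` is `n_blocks * N` contiguous, initialized `u32` values.
    // `[u32; N]` has the same alignment as `u32` and a size of `N * 4` bytes,
    // so viewing the prefix as `n_blocks` arrays stays in bounds and aligned.
    let blocks =
        unsafe { std::slice::from_raw_parts(prefix.as_ptr().cast::<[u32; N]>(), n_blocks) };
    (blocks, remainder)
}
```

Calling `split_into_blocks::<256>(&data)` on 600 values yields 2 blocks and an 88-element remainder, matching the numbers in the doc comment above.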

/// Combines a block-oriented codec with an arbitrary-length tail codec.
///
/// `CompositeCodec<Blocks, Tail>` implements [`AnyLenCodec`]: it accepts any
/// input length, encodes the aligned prefix with `Blocks`, and the
/// sub-block remainder with `Tail`.
///
/// # Wire format
///
/// ```text
/// [ n_blocks: u32 ] [ Blocks encoded data... ] [ Tail encoded data... ]
/// ```
///
/// # Example
///
/// ```rust,ignore
/// use fastpfor::{AnyLenCodec, FastPFor256WithVByte};
///
/// let data: Vec<u32> = (0..600).collect(); // 2 × 256 + 88 remainder
/// let mut codec = FastPFor256WithVByte::default();
///
/// let mut encoded = Vec::new();
/// codec.encode(&data, &mut encoded).unwrap();
///
/// let mut decoded = Vec::new();
/// codec.decode(&encoded, &mut decoded).unwrap();
/// assert_eq!(decoded, data);
/// ```
pub struct CompositeCodec<Blocks: BlockCodec, Tail: AnyLenCodec> {...}
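
To make the wire format above concrete, the following toy sketch writes an `n_blocks` header, then the block data, then the tail. Everything in it is an assumption for illustration: the traits are simplified mirrors of the ones in this PR (flat `&[u32]` input, `String` errors, and a `decode_blocks` that returns the number of words consumed), and `CopyBlocks4`, `CopyTail`, and `Composite` are hypothetical names that store values verbatim instead of compressing them.

```rust
/// Simplified, hypothetical mirror of `BlockCodec` (no `Pod` bound).
trait BlockCodec {
    const BLOCK_LEN: usize;
    fn encode_blocks(&mut self, blocks: &[u32], out: &mut Vec<u32>) -> Result<(), String>;
    /// Assumption: returns how many input words were consumed.
    fn decode_blocks(&mut self, input: &[u32], n_blocks: usize, out: &mut Vec<u32>)
        -> Result<usize, String>;
}

/// Simplified, hypothetical mirror of `AnyLenCodec`.
trait AnyLenCodec {
    fn encode(&mut self, input: &[u32], out: &mut Vec<u32>) -> Result<(), String>;
    fn decode(&mut self, input: &[u32], out: &mut Vec<u32>) -> Result<(), String>;
}

/// Toy block codec with 4-value blocks; copies data verbatim.
#[derive(Default)]
struct CopyBlocks4;

impl BlockCodec for CopyBlocks4 {
    const BLOCK_LEN: usize = 4;
    fn encode_blocks(&mut self, blocks: &[u32], out: &mut Vec<u32>) -> Result<(), String> {
        if blocks.len() % Self::BLOCK_LEN != 0 {
            return Err("input is not block-aligned".into());
        }
        out.extend_from_slice(blocks);
        Ok(())
    }
    fn decode_blocks(&mut self, input: &[u32], n_blocks: usize, out: &mut Vec<u32>)
        -> Result<usize, String> {
        let n = n_blocks.checked_mul(Self::BLOCK_LEN).ok_or("block count overflow")?;
        let words = input.get(..n).ok_or("truncated block data")?;
        out.extend_from_slice(words);
        Ok(n)
    }
}

/// Toy tail codec; copies data verbatim.
#[derive(Default)]
struct CopyTail;

impl AnyLenCodec for CopyTail {
    fn encode(&mut self, input: &[u32], out: &mut Vec<u32>) -> Result<(), String> {
        out.extend_from_slice(input);
        Ok(())
    }
    fn decode(&mut self, input: &[u32], out: &mut Vec<u32>) -> Result<(), String> {
        out.extend_from_slice(input);
        Ok(())
    }
}

/// Layout: [ n_blocks ] [ block data ] [ tail data ].
#[derive(Default)]
struct Composite<B, T> {
    blocks: B,
    tail: T,
}

impl<B: BlockCodec, T: AnyLenCodec> AnyLenCodec for Composite<B, T> {
    fn encode(&mut self, input: &[u32], out: &mut Vec<u32>) -> Result<(), String> {
        let n_blocks = input.len() / B::BLOCK_LEN;
        let (aligned, rest) = input.split_at(n_blocks * B::BLOCK_LEN);
        out.push(u32::try_from(n_blocks).map_err(|e| e.to_string())?);
        self.blocks.encode_blocks(aligned, out)?;
        self.tail.encode(rest, out)
    }
    fn decode(&mut self, input: &[u32], out: &mut Vec<u32>) -> Result<(), String> {
        let (&n_blocks, body) = input.split_first().ok_or("missing n_blocks header")?;
        let used = self.blocks.decode_blocks(body, n_blocks as usize, out)?;
        let rest = body.get(used..).ok_or("block codec consumed too much input")?;
        self.tail.decode(rest, out)
    }
}
```

With these toy codecs, encoding ten values produces a header of `2`, eight block words, and two tail words; decoding that buffer returns the original ten values.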

codecov · 2026-03-19T11:27:06Z

Codecov Report

❌ Patch coverage is 76.95783% with 153 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/test_utils.rs	34.71%	123 Missing and 3 partials ⚠️
src/rust/integer_compression/fastpfor.rs	91.19%	13 Missing and 4 partials ⚠️
src/rust/integer_compression/variable_byte.rs	96.29%	2 Missing and 3 partials ⚠️
src/rust/composite.rs	96.84%	0 Missing and 3 partials ⚠️
src/cpp/wrappers.rs	88.88%	0 Missing and 1 partial ⚠️
src/rust/integer_compression/just_copy.rs	94.11%	0 Missing and 1 partial ⚠️

CommanderStorm

LGTM

Performance is also the same

.github/workflows/ci.yml

First step of moving towards unified C++ and Rust model of codecs done in #73

This reverts commit 6b1ef41.

Copilot

Pull request overview

Copilot reviewed 47 out of 47 changed files in this pull request and generated 6 comments.

Copilot · 2026-03-23T19:33:10Z

justfile

    for dir in "./" "fuzz"; do
-        pushd "$dir"
+        cd "$dir"
        if (rustup toolchain list | grep nightly && rustup component list --toolchain nightly | grep rustfmt) &> /dev/null; then
            echo "Reformatting Rust code using nightly Rust fmt to sort imports in $dir"
            cargo +nightly fmt --all -- --config imports_granularity=Module,group_imports=StdExternalCrate
        else
            echo "Reformatting Rust with the stable cargo fmt in $dir.  Install nightly with \`rustup install nightly\` for better results"
            cargo fmt --all
        fi
-        popd
+        if [ -f .git ]; then
+            cd ..
+        fi
    done


The directory handling in this loop looks incorrect: cd "$dir" is not paired with a reliable cd back to the original directory, and the .git check uses -f even though .git is typically a directory. Consider using pushd/popd, saving root=$(pwd) and cd "$root" each iteration, or checking [ -d .git ] if you want to detect the repo root.

Copilot · 2026-03-23T19:33:10Z

fuzz/justfile


-# Run rust_decompress_oracle (uses C++ as oracle)
-rust-decompress *args: (run 'rust_decompress_oracle' args)
+# Run decode_oracle (parallel Rust + C++ roundtrips, cross-checks decodeed values)


Spelling typo in comment: “decodeed values” → “decoded values”.

Suggested change

# Run decode_oracle (parallel Rust + C++ roundtrips, cross-checks decodeed values)

# Run decode_oracle (parallel Rust + C++ roundtrips, cross-checks decoded values)

Copilot · 2026-03-23T19:33:11Z

fuzz/fuzz_targets/decode_arbitrary.rs

+//!
+//! Why this target is needed
+//! -------------------------
+//! The existing `compress_oracle` target only feeds *well-formed* data to the Rust


The docs here refer to an existing compress_oracle target, but this PR renames/introduces encode_oracle as the Rust self-roundtrip target. Updating the target name in this comment would keep the fuzz target documentation accurate.

Suggested change

//! The existing `compress_oracle` target only feeds *well-formed* data to the Rust

//! The existing `encode_oracle` target only feeds *well-formed* data to the Rust

Copilot · 2026-03-23T19:33:11Z

src/test_utils.rs

+//! - **Library unit tests:** `crate::test_utils` via `#[cfg(test)] mod bench_utils` in `lib.rs`
+//!   and `extern crate self as fastpfor` so this file can `use fastpfor::...`.
+//! - **Integration tests:** `#[cfg(test)] #[path = "../src/test_utils.rs"] mod bench_utils`.
+//! - **Criterion benchmarks:** `#[path = "../src/test_utils.rs"] mod bench_utils` (`cfg(test)` is not
+//!   enabled for bench targets, so the module is included unconditionally there).


The module name in these docs is inconsistent with how this file is actually included. The consumers in this PR use mod test_utils (and lib.rs declares mod test_utils), but the bullets here say mod bench_utils, which is misleading for contributors.

Copilot · 2026-03-23T19:33:11Z

src/rust/integer_compression/fastpfor.rs

+/// Default page size in number of integers (64 KiB / 4 bytes = 16 Ki integers).
+const DEFAULT_PAGE_SIZE: u32 = 65536;


This comment appears incorrect: DEFAULT_PAGE_SIZE is expressed in number of integers (u32s), so 65,536 corresponds to 256 KiB of u32 data, not “64 KiB / 4 bytes = 16 Ki integers”. Consider rewording to avoid mixing bytes and element counts.
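
The arithmetic behind this review comment can be checked with a short sketch. The constant is copied from the diff above; the helper names `page_size_bytes` and `page_size_kib` are hypothetical and exist only for this illustration.

```rust
/// Page size as a count of `u32` values, as declared in the diff above.
const DEFAULT_PAGE_SIZE: u32 = 65536;

/// Bytes occupied by one page: element count times 4 bytes per `u32`.
const fn page_size_bytes() -> usize {
    DEFAULT_PAGE_SIZE as usize * std::mem::size_of::<u32>()
}

/// The same size expressed in KiB (1 KiB = 1024 bytes).
const fn page_size_kib() -> usize {
    page_size_bytes() / 1024
}
```

Under that reading, one page is 262,144 bytes, i.e. 256 KiB of `u32` data, and 65,536 elements is 64 Ki integers rather than 16 Ki.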

Copilot · 2026-03-23T19:33:11Z

tests/basic_tests.rs

+/// Sub-block-sized inputs produce no output via `BlockCodec`.
 #[test]
-fn verify_with_exceptions() {
-    const N: usize = 32;
-    const TIMES: usize = 1000;
-    let mut rng = rand::rng();
-
-    let mut data = vec![0u32; N];
-    let mut compressed = vec![0u32; N];
-    let mut uncompressed = vec![0u32; N];
-
-    for bit in 0..31 {
-        for _ in 0..TIMES {
-            for value in &mut data {
-                *value = rng.random();
-            }
-
-            fast_pack(&data, 0, &mut compressed, 0, bit);
-            fast_unpack(&compressed, 0, &mut uncompressed, 0, bit);
-
-            mask_array(&mut data, (1 << bit) - 1);
-
-            assert_eq!(
-                data, uncompressed,
-                "Data does not match uncompressed output"
-            );
-        }
+fn spurious_out_test() {
+    fn check<C: BlockCodec + Default>(len: usize) {
+        let x = vec![0u32; 1024];
+        let (blocks, _) = slice_to_blocks::<C>(&x[..len]);
+        let out = block_compress::<C>(cast_slice(blocks)).unwrap();
+        assert!(out.is_empty() || blocks.is_empty());


The comment says sub-block-sized inputs produce no output via BlockCodec, but the current block format always emits at least the length header (e.g. [0] for zero blocks). Consider updating the comment to match the actual behavior being tested (zero aligned blocks rather than “no output”).

nyurik requested review from CommanderStorm and Copilot and removed request for Copilot March 19, 2026 11:25

Copilot AI review requested due to automatic review settings March 19, 2026 11:34

This comment was marked as outdated.

CommanderStorm approved these changes Mar 20, 2026

.github/workflows/ci.yml (outdated, resolved)

nyurik mentioned this pull request Mar 22, 2026

chore(cpp): reorganize codec structure and improve error handling #75

Merged

nyurik added a commit that referenced this pull request Mar 22, 2026

chore(cpp): reorganize codec structure and improve error handling (#75)

6ba6e0b

First step of moving towards unified C++ and Rust model of codecs done in #73

nyurik added 3 commits March 22, 2026 16:58

reset to main with changes

f315820

results

48062b9

Merge branch 'main' into block-size

1ba2155

nyurik force-pushed the block-size branch from 893a3aa to 1ba2155 Compare March 22, 2026 21:46

nyurik added 4 commits March 22, 2026 17:49

fix docs

fb439e6

cleanup

80dd094

wip

ae89f23

decode failures

cc5ca02

nyurik requested a review from Copilot March 23, 2026 00:19

Copilot started reviewing on behalf of nyurik March 23, 2026 00:19

This comment was marked as outdated.

nyurik added 10 commits March 22, 2026 21:13

broken

dafe742

simplify tests

087f1fc

fixes

2cabce1

cleanup

18e8fc8

roundtrip

b15a783

cleanup

bbea95e

cleanup

29b4a03

cleanup

ee9568b

cleanup

f8733d2

cleanup

a3b3f16

nyurik force-pushed the block-size branch from 2e5372d to a3b3f16 Compare March 23, 2026 05:24

nyurik added 7 commits March 23, 2026 01:31

revert cpp testing

6b1ef41

Revert "revert cpp testing"

d01b333

This reverts commit 6b1ef41.

cleanup

6952399

fix handling short values

b7c2314

simplify cpp tests

b4aceb9

asserts

de430c7

clean up rnd seed

7ec91db

nyurik requested a review from Copilot March 23, 2026 19:28

Copilot started reviewing on behalf of nyurik March 23, 2026 19:29

Copilot AI reviewed Mar 23, 2026

nyurik added 2 commits March 23, 2026 15:38

consolidate test helpers

3aef062

feedback

db79d66

nyurik force-pushed the block-size branch from a42d9b0 to db79d66 Compare March 23, 2026 19:46

lock down block size

fbdf500

nyurik changed the title ~~chore: massive rework of the API into blocks~~ chore: consolidate Rust and C++ API Mar 23, 2026

nyurik merged commit a2e3f0b into fast-pack:main Mar 23, 2026
15 checks passed

nyurik deleted the block-size branch March 23, 2026 20:04

nyurik merged 27 commits into fast-pack:main from nyurik:block-size

3 participants
