
chore: consolidate Rust and C++ API #73

Merged
nyurik merged 27 commits into fast-pack:main from nyurik:block-size
Mar 23, 2026

Conversation

@nyurik
Member

@nyurik nyurik commented Mar 19, 2026

  • Split the API into a cleanly defined "block" API and a "variable length" API. The block API requires the input to be blocks of a constant size that is fixed at compile time, and requires the codec to match that block size.
  • Shortened the code by a thousand lines
  • Identical C++ and Rust usage
  • Introduced a new CompositeCodec that is generic over a BlockCodec and an AnyLenCodec
pub trait BlockCodec {
    /// The fixed-size block type.  Must be plain-old-data (`Pod`).
    /// In practice this will be `[u32; 128]` or `[u32; 256]`.
    type Block: Pod;

    /// Compress a slice of complete, fixed-size blocks.
    ///
    /// No remainder is possible — the caller must split the input first using
    /// [`slice_to_blocks`] and handle any remainder separately.
    fn encode_blocks(
        &mut self,
        blocks: &[Self::Block],
        out: &mut Vec<u32>,
    ) -> Result<(), FastPForError>;

    /// Decompress exactly `n_blocks` blocks from `input`.
    fn decode_blocks(
        &mut self,
        input: &[u32],
        n_blocks: usize,
        out: &mut Vec<u32>,
    ) -> Result<(), FastPForError>;
}

/// Compresses and decompresses an arbitrary-length `&[u32]` slice.
pub trait AnyLenCodec {
    /// Compress an arbitrary-length slice of `u32` values.
    fn encode(&mut self, input: &[u32], out: &mut Vec<u32>) -> Result<(), FastPForError>;

    /// Decompress a previously compressed slice of `u32` values.
    fn decode(&mut self, input: &[u32], out: &mut Vec<u32>) -> Result<(), FastPForError>;
}

/// Split a flat `&[u32]` into `(&[Blocks::Block], &[u32])` without copying.
///
/// # Example
///
/// ```rust,ignore
/// let data: Vec<u32> = (0..600).collect(); // 2 × 256 + 88 remainder
/// let (blocks, remainder) = slice_to_blocks::<FastPFor256>(&data);
/// assert_eq!(blocks.len(), 2);    // 2 blocks of [u32; 256]
/// assert_eq!(remainder.len(), 88);
/// ```
#[must_use]
pub fn slice_to_blocks<Blocks: BlockCodec>(input: &[u32]) -> (&[Blocks::Block], &[u32]) { ... }

/// Combines a block-oriented codec with an arbitrary-length tail codec.
///
/// `CompositeCodec<Blocks, Tail>` implements [`AnyLenCodec`]: it accepts any
/// input length, encodes the aligned prefix with `Blocks`, and the
/// sub-block remainder with `Tail`.
///
/// # Wire format
///
/// ```text
/// [ n_blocks: u32 ] [ Blocks encoded data... ] [ Tail encoded data... ]
/// ```
///
/// # Example
///
/// ```rust,ignore
/// use fastpfor::{AnyLenCodec, FastPFor256WithVByte};
///
/// let data: Vec<u32> = (0..600).collect(); // 2 × 256 + 88 remainder
/// let mut codec = FastPFor256WithVByte::default(); // `encode`/`decode` take `&mut self`
///
/// let mut encoded = Vec::new();
/// codec.encode(&data, &mut encoded).unwrap();
///
/// let mut decoded = Vec::new();
/// codec.decode(&encoded, &mut decoded).unwrap();
/// assert_eq!(decoded, data);
/// ```
pub struct CompositeCodec<Blocks: BlockCodec, Tail: AnyLenCodec> {...}
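
The split-then-compose idea described above can be sketched without the crate. This is a minimal illustration under stated assumptions, not the library's implementation: `BLOCK_LEN`, `split_aligned`, `toy_encode`, and `toy_decode` are hypothetical names, and both the block stage and the tail stage are stand-ins that copy values verbatim instead of compressing them. Only the layout (`n_blocks` header, then block data, then tail data) follows the wire format quoted in the description.

```rust
/// Values per block in this sketch (hypothetical; the description mentions
/// 128 and 256 as the sizes used in practice).
const BLOCK_LEN: usize = 256;

/// Split `input` into its block-aligned prefix and the sub-block remainder.
/// No data is copied; both halves borrow from `input`.
fn split_aligned(input: &[u32]) -> (&[u32], &[u32]) {
    let aligned = input.len() / BLOCK_LEN * BLOCK_LEN;
    input.split_at(aligned)
}

/// Toy composite encoder: `[ n_blocks ] [ block data ] [ tail data ]`.
/// Both stages copy their input verbatim (assumption: no real compression).
fn toy_encode(input: &[u32], out: &mut Vec<u32>) {
    let (prefix, tail) = split_aligned(input);
    out.push((prefix.len() / BLOCK_LEN) as u32); // header: number of blocks
    out.extend_from_slice(prefix); // stand-in for the block codec
    out.extend_from_slice(tail); // stand-in for the tail codec
}

/// Toy composite decoder: reads the header, then hands the declared number
/// of blocks to the "block" stage and everything after it to the "tail" stage.
fn toy_decode(input: &[u32], out: &mut Vec<u32>) -> Result<(), String> {
    let (&n_blocks, body) = input.split_first().ok_or("missing n_blocks header")?;
    let prefix_len = (n_blocks as usize)
        .checked_mul(BLOCK_LEN)
        .ok_or("block count overflows usize")?;
    if body.len() < prefix_len {
        return Err("input is shorter than the declared block count".to_string());
    }
    let (block_data, tail_data) = body.split_at(prefix_len);
    out.extend_from_slice(block_data); // stand-in for block decoding
    out.extend_from_slice(tail_data); // stand-in for tail decoding
    Ok(())
}
```

With 600 input values, `split_aligned` yields a 512-value prefix and an 88-value remainder, and `toy_encode` writes a header of `2` followed by the 600 values. A real tail codec would have to record its own length; the verbatim copy here sidesteps that.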

@nyurik nyurik requested review from CommanderStorm and Copilot and removed request for Copilot March 19, 2026 11:25
@codecov

codecov bot commented Mar 19, 2026

Copilot AI review requested due to automatic review settings March 19, 2026 11:34

Collaborator

@CommanderStorm CommanderStorm left a comment


LGTM

Performance is also the same

nyurik added a commit that referenced this pull request Mar 22, 2026
First step of moving towards unified C++ and Rust model of codecs done
in #73

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 47 out of 47 changed files in this pull request and generated 6 comments.



Comment on lines 87 to 99
 for dir in "./" "fuzz"; do
-    pushd "$dir"
+    cd "$dir"
     if (rustup toolchain list | grep nightly && rustup component list --toolchain nightly | grep rustfmt) &> /dev/null; then
         echo "Reformatting Rust code using nightly Rust fmt to sort imports in $dir"
         cargo +nightly fmt --all -- --config imports_granularity=Module,group_imports=StdExternalCrate
     else
         echo "Reformatting Rust with the stable cargo fmt in $dir. Install nightly with \`rustup install nightly\` for better results"
         cargo fmt --all
     fi
-    popd
+    if [ -f .git ]; then
+        cd ..
+    fi
 done

Copilot AI Mar 23, 2026


The directory handling in this loop looks incorrect: cd "$dir" is not paired with a reliable cd back to the original directory, and the .git check uses -f even though .git is typically a directory. Consider using pushd/popd, saving root=$(pwd) and cd "$root" each iteration, or checking [ -d .git ] if you want to detect the repo root.

fuzz/justfile Outdated

# Run rust_decompress_oracle (uses C++ as oracle)
rust-decompress *args: (run 'rust_decompress_oracle' args)
# Run decode_oracle (parallel Rust + C++ roundtrips, cross-checks decodeed values)

Copilot AI Mar 23, 2026


Spelling typo in comment: “decodeed values” → “decoded values”.

Suggested change
# Run decode_oracle (parallel Rust + C++ roundtrips, cross-checks decodeed values)
# Run decode_oracle (parallel Rust + C++ roundtrips, cross-checks decoded values)

//!
//! Why this target is needed
//! -------------------------
//! The existing `compress_oracle` target only feeds *well-formed* data to the Rust

Copilot AI Mar 23, 2026


The docs here refer to an existing compress_oracle target, but this PR renames/introduces encode_oracle as the Rust self-roundtrip target. Updating the target name in this comment would keep the fuzz target documentation accurate.

Suggested change
//! The existing `compress_oracle` target only feeds *well-formed* data to the Rust
//! The existing `encode_oracle` target only feeds *well-formed* data to the Rust

Comment on lines +5 to +9
//! - **Library unit tests:** `crate::test_utils` via `#[cfg(test)] mod bench_utils` in `lib.rs`
//! and `extern crate self as fastpfor` so this file can `use fastpfor::...`.
//! - **Integration tests:** `#[cfg(test)] #[path = "../src/test_utils.rs"] mod bench_utils`.
//! - **Criterion benchmarks:** `#[path = "../src/test_utils.rs"] mod bench_utils` (`cfg(test)` is not
//! enabled for bench targets, so the module is included unconditionally there).

Copilot AI Mar 23, 2026


The module name in these docs is inconsistent with how this file is actually included. The consumers in this PR use mod test_utils (and lib.rs declares mod test_utils), but the bullets here say mod bench_utils, which is misleading for contributors.

Comment on lines +16 to +17
/// Default page size in number of integers (64 KiB / 4 bytes = 16 Ki integers).
const DEFAULT_PAGE_SIZE: u32 = 65536;

Copilot AI Mar 23, 2026


This comment appears incorrect: DEFAULT_PAGE_SIZE is expressed in number of integers (u32s), so 65,536 corresponds to 256 KiB of u32 data, not “64 KiB / 4 bytes = 16 Ki integers”. Consider rewording to avoid mixing bytes and element counts.
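
The arithmetic behind this remark can be checked with a short sketch. `DEFAULT_PAGE_SIZE` is copied from the quoted lines; `page_size_bytes` is a hypothetical helper introduced only for illustration and is not part of the crate.

```rust
use std::mem::size_of;

/// Same value as the quoted constant: a count of `u32` elements, not bytes.
const DEFAULT_PAGE_SIZE: u32 = 65536;

/// Bytes occupied by `elements` values of type `u32` (hypothetical helper).
const fn page_size_bytes(elements: u32) -> usize {
    elements as usize * size_of::<u32>()
}

/// 65,536 elements × 4 bytes = 262,144 bytes = 256 KiB.
const DEFAULT_PAGE_BYTES: usize = page_size_bytes(DEFAULT_PAGE_SIZE);
```

A page of 64 KiB would instead hold 16,384 elements, which is the figure the quoted doc comment seems to have had in mind.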

Comment on lines +25 to +32
+/// Sub-block-sized inputs produce no output via `BlockCodec`.
 #[test]
-fn verify_with_exceptions() {
-    const N: usize = 32;
-    const TIMES: usize = 1000;
-    let mut rng = rand::rng();
-
-    let mut data = vec![0u32; N];
-    let mut compressed = vec![0u32; N];
-    let mut uncompressed = vec![0u32; N];
-
-    for bit in 0..31 {
-        for _ in 0..TIMES {
-            for value in &mut data {
-                *value = rng.random();
-            }
-
-            fast_pack(&data, 0, &mut compressed, 0, bit);
-            fast_unpack(&compressed, 0, &mut uncompressed, 0, bit);
-
-            mask_array(&mut data, (1 << bit) - 1);
-
-            assert_eq!(
-                data, uncompressed,
-                "Data does not match uncompressed output"
-            );
-        }
+fn spurious_out_test() {
+    fn check<C: BlockCodec + Default>(len: usize) {
+        let x = vec![0u32; 1024];
+        let (blocks, _) = slice_to_blocks::<C>(&x[..len]);
+        let out = block_compress::<C>(cast_slice(blocks)).unwrap();
+        assert!(out.is_empty() || blocks.is_empty());

Copilot AI Mar 23, 2026


The comment says sub-block-sized inputs produce no output via BlockCodec, but the current block format always emits at least the length header (e.g. [0] for zero blocks). Consider updating the comment to match the actual behavior being tested (zero aligned blocks rather than “no output”).
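
The behavior described in this remark can be illustrated with a toy encoder. This is a sketch under assumptions, not the crate's block format: `ToyBlock` and `toy_encode_blocks` are hypothetical, the block size of four is chosen only to keep the example short, and the payload is copied verbatim. The only property it mirrors is that a block-count header is always written.

```rust
/// Hypothetical block type for this sketch: four `u32` values per block.
type ToyBlock = [u32; 4];

/// Toy block encoder: writes the number of blocks, then each block verbatim.
/// The header is emitted even when `blocks` is empty.
fn toy_encode_blocks(blocks: &[ToyBlock]) -> Vec<u32> {
    let mut out = Vec::with_capacity(1 + blocks.len() * 4);
    out.push(blocks.len() as u32); // length header, always present
    for block in blocks {
        out.extend_from_slice(block);
    }
    out
}
```

Under this layout, zero aligned blocks encode to `[0]` rather than to an empty vector, so a doc comment phrased as "zero aligned blocks yield only the header" would match what such a test observes.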

@nyurik nyurik changed the title from "chore: massive rework of the API into blocks" to "chore: consolidate Rust and C++ API" Mar 23, 2026
@nyurik nyurik merged commit a2e3f0b into fast-pack:main Mar 23, 2026
15 checks passed
@nyurik nyurik deleted the block-size branch March 23, 2026 20:04