chore: consolidate Rust and C++ API#73
Conversation
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
CommanderStorm
left a comment
There was a problem hiding this comment.
LGTM
Performance is also the same
First step of moving towards unified C++ and Rust model of codecs done in #73
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 47 out of 47 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for dir in "./" "fuzz"; do | ||
| pushd "$dir" | ||
| cd "$dir" | ||
| if (rustup toolchain list | grep nightly && rustup component list --toolchain nightly | grep rustfmt) &> /dev/null; then | ||
| echo "Reformatting Rust code using nightly Rust fmt to sort imports in $dir" | ||
| cargo +nightly fmt --all -- --config imports_granularity=Module,group_imports=StdExternalCrate | ||
| else | ||
| echo "Reformatting Rust with the stable cargo fmt in $dir. Install nightly with \`rustup install nightly\` for better results" | ||
| cargo fmt --all | ||
| fi | ||
| popd | ||
| if [ -f .git ]; then | ||
| cd .. | ||
| fi | ||
| done |
There was a problem hiding this comment.
The directory handling in this loop looks incorrect: cd "$dir" is not paired with a reliable cd back to the original directory, and the .git check uses -f even though .git is typically a directory. Consider using pushd/popd, saving root=$(pwd) and cd "$root" each iteration, or checking [ -d .git ] if you want to detect the repo root.
fuzz/justfile
Outdated
|
|
||
| # Run rust_decompress_oracle (uses C++ as oracle) | ||
| rust-decompress *args: (run 'rust_decompress_oracle' args) | ||
| # Run decode_oracle (parallel Rust + C++ roundtrips, cross-checks decodeed values) |
There was a problem hiding this comment.
Spelling typo in comment: “decodeed values” → “decoded values”.
| # Run decode_oracle (parallel Rust + C++ roundtrips, cross-checks decodeed values) | |
| # Run decode_oracle (parallel Rust + C++ roundtrips, cross-checks decoded values) |
| //! | ||
| //! Why this target is needed | ||
| //! ------------------------- | ||
| //! The existing `compress_oracle` target only feeds *well-formed* data to the Rust |
There was a problem hiding this comment.
The docs here refer to an existing compress_oracle target, but this PR renames/introduces encode_oracle as the Rust self-roundtrip target. Updating the target name in this comment would keep the fuzz target documentation accurate.
| //! The existing `compress_oracle` target only feeds *well-formed* data to the Rust | |
| //! The existing `encode_oracle` target only feeds *well-formed* data to the Rust |
src/test_utils.rs
Outdated
| //! - **Library unit tests:** `crate::test_utils` via `#[cfg(test)] mod bench_utils` in `lib.rs` | ||
| //! and `extern crate self as fastpfor` so this file can `use fastpfor::...`. | ||
| //! - **Integration tests:** `#[cfg(test)] #[path = "../src/test_utils.rs"] mod bench_utils`. | ||
| //! - **Criterion benchmarks:** `#[path = "../src/test_utils.rs"] mod bench_utils` (`cfg(test)` is not | ||
| //! enabled for bench targets, so the module is included unconditionally there). |
There was a problem hiding this comment.
The module name in these docs is inconsistent with how this file is actually included. The consumers in this PR use mod test_utils (and lib.rs declares mod test_utils), but the bullets here say mod bench_utils, which is misleading for contributors.
| /// Default page size in number of integers (64 KiB / 4 bytes = 16 Ki integers). | ||
| const DEFAULT_PAGE_SIZE: u32 = 65536; |
There was a problem hiding this comment.
This comment appears incorrect: DEFAULT_PAGE_SIZE is expressed in number of integers (u32s), so 65,536 corresponds to 256 KiB of u32 data, not “64 KiB / 4 bytes = 16 Ki integers”. Consider rewording to avoid mixing bytes and element counts.
tests/basic_tests.rs
Outdated
| /// Sub-block-sized inputs produce no output via `BlockCodec`. | ||
| #[test] | ||
| fn verify_with_exceptions() { | ||
| const N: usize = 32; | ||
| const TIMES: usize = 1000; | ||
| let mut rng = rand::rng(); | ||
|
|
||
| let mut data = vec![0u32; N]; | ||
| let mut compressed = vec![0u32; N]; | ||
| let mut uncompressed = vec![0u32; N]; | ||
|
|
||
| for bit in 0..31 { | ||
| for _ in 0..TIMES { | ||
| for value in &mut data { | ||
| *value = rng.random(); | ||
| } | ||
|
|
||
| fast_pack(&data, 0, &mut compressed, 0, bit); | ||
| fast_unpack(&compressed, 0, &mut uncompressed, 0, bit); | ||
|
|
||
| mask_array(&mut data, (1 << bit) - 1); | ||
|
|
||
| assert_eq!( | ||
| data, uncompressed, | ||
| "Data does not match uncompressed output" | ||
| ); | ||
| } | ||
| fn spurious_out_test() { | ||
| fn check<C: BlockCodec + Default>(len: usize) { | ||
| let x = vec![0u32; 1024]; | ||
| let (blocks, _) = slice_to_blocks::<C>(&x[..len]); | ||
| let out = block_compress::<C>(cast_slice(blocks)).unwrap(); | ||
| assert!(out.is_empty() || blocks.is_empty()); |
There was a problem hiding this comment.
The comment says sub-block-sized inputs produce no output via BlockCodec, but the current block format always emits at least the length header (e.g. [0] for zero blocks). Consider updating the comment to match the actual behavior being tested (zero aligned blocks rather than “no output”).
BlockCodecandAnyLenCodec