V0.6.0/core sketch stabilization and advanced quantiles #172
Phase 1: KLL Stabilization (CDF and PMF) - closed #155
- Added kll_cdf/3 and kll_pmf/3 callbacks to backend.ex
- Implemented in pure.ex using existing sorted view infrastructure
- Delegated in rust.ex
- Added cdf/2 and pmf/2 to kll.ex public API
- Expanded quantiles.ex facade with quantiles/2, rank/2, cdf/2, pmf/2
- 7 new tests in kll_test.exs
Phase 2: DDSketch Rank - closed #156
- Added ddsketch_rank/3 callback to backend.ex
- Implemented by walking sparse bins and accumulating weight
- Added rank/2 to ddsketch.ex public API
- Added DDSketch clause to quantiles.ex rank/2
- 7 new tests in ddsketch_test.exs
Phase 3: REQ Sketch (New -- Sketch ID 13) - closed #157
- Created lib/ex_data_sketch/req.ex with full public API
- 11 new backend callbacks, Pure implementation with biased compaction (HRA/LRA)
- Binary format: REQ1 with HRA flag
- Quantiles facade integration (all 12 dispatch functions)
- 65 tests (3 properties) + 3 merge law properties
Phase 4: Misra-Gries (New -- Sketch ID 14) - closed #158
- Created lib/ex_data_sketch/misra_gries.ex with full public API
- 8 new backend callbacks, Pure implementation with decrement-all eviction
- Binary format: MG01 with variable-length key entries
- Key encoding support (binary, int, term)
- 40 tests (2 properties) + 3 merge law properties
Phase 5: XXHash3 via Rust NIF - closed #159
- Added xxhash-rust dependency to Cargo.toml
- Created hash.rs with xxhash3_64_nif and xxhash3_64_seeded_nif
- Added xxhash3_64/1 and xxhash3_64/2 to hash.ex with fallback
- Opt-in (not default) for backwards compatibility
- 11 new tests
Totals
- 1213 tests (976 tests + 109 doctests + 128 properties), 0 failures
- 2 new modules: ExDataSketch.REQ, ExDataSketch.MisraGries
- ~30 new backend callbacks across all phases
- All files formatted, zero warnings
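For reviewers unfamiliar with the decrement-all eviction mentioned in Phase 4, here is a minimal Rust sketch of the idea. It is an illustration only: the struct, field names, and string keys are hypothetical and do not reflect the MG01 binary format or the ExDataSketch API.

```rust
use std::collections::HashMap;

/// Minimal Misra-Gries summary holding at most `k` counters.
/// Hypothetical illustration; not the library's layout or API.
pub struct MisraGries {
    k: usize,
    n: u64,
    counters: HashMap<String, u64>,
}

impl MisraGries {
    pub fn new(k: usize) -> Self {
        assert!(k > 0, "k must be positive");
        MisraGries { k, n: 0, counters: HashMap::new() }
    }

    /// Observe one item from the stream.
    pub fn update(&mut self, item: &str) {
        self.n += 1;
        if let Some(count) = self.counters.get_mut(item) {
            // Already tracked: bump its counter.
            *count += 1;
        } else if self.counters.len() < self.k {
            // Free slot: start tracking the item.
            self.counters.insert(item.to_string(), 1);
        } else {
            // Decrement-all eviction: every tracked counter loses one
            // (the new item is discarded), and counters at zero are dropped.
            self.counters.retain(|_, count| {
                *count -= 1;
                *count > 0
            });
        }
    }

    /// Lower-bound estimate of how often `item` was seen.
    pub fn estimate(&self, item: &str) -> u64 {
        self.counters.get(item).copied().unwrap_or(0)
    }

    /// Total number of items observed.
    pub fn total(&self) -> u64 {
        self.n
    }
}
```

With k = 2 and the stream a, a, a, a, a, b, c, d, the estimate for "a" ends at 4 (true count 5), because the single eviction triggered by "c" costs every tracked key one count.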
1. req_encode_state (arity 9 -> 1): Defined a REQState struct with @enforce_keys for all 9 fields (k, hra, n, min_val, max_val, num_levels,
compaction_bits, level_sizes, levels). The function now pattern-matches a single %REQState{} parameter. Construction sites (req_new, req_do_merge)
build the struct; req_decode_state returns one. Update sites use plain %{state | ...} since Elixir preserves the struct type.
2. ddsketch_rank (nesting depth 3 -> 1): Extracted logic into dds_compute_rank/2 with three pattern-matched clauses — %{n: 0} returns nil, value <
0.0 guard returns 0.0, and the default clause does the bin walk. Also removed the dead if value >= 0.0 branch (always true since value < 0.0 was
already handled by the guard).
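The three-clause shape of dds_compute_rank/2 can be mirrored in Rust with a single match, shown below as a rough sketch. The DdSketch struct, its fields, and the bucket_index mapping are assumptions made for illustration; the real state layout and index formula in pure.ex may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified DDSketch state for non-negative values.
pub struct DdSketch {
    pub gamma: f64,               // bucket growth factor, assumed > 1.0
    pub n: u64,                   // total number of recorded values
    pub zero_count: u64,          // values recorded as exactly zero
    pub bins: BTreeMap<i32, u64>, // sparse bins: bucket index -> count
}

/// Assumed bucket mapping for a strictly positive value: ceil(log_gamma(value)).
fn bucket_index(value: f64, gamma: f64) -> i32 {
    (value.ln() / gamma.ln()).ceil() as i32
}

/// Approximate fraction of recorded values that are <= `value`.
pub fn rank(sketch: &DdSketch, value: f64) -> Option<f64> {
    match (sketch.n, value) {
        // Clause 1: empty sketch has no rank.
        (0, _) => None,
        // Clause 2: nothing is stored below zero.
        (_, v) if v < 0.0 => Some(0.0),
        // Clause 3: walk the sparse bins and accumulate weight.
        (n, v) => {
            let mut below = sketch.zero_count;
            if v > 0.0 {
                let idx = bucket_index(v, sketch.gamma);
                below += sketch.bins.range(..=idx).map(|(_, c)| *c).sum::<u64>();
            }
            Some(below as f64 / n as f64)
        }
    }
}
```

Because the negative case is handled by its own arm, the final arm needs no `value >= 0.0` check, which is the same dead branch the refactor removed.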
Rust NIFs Created (6 new .rs files)
- bloom.rs -- put_many + merge (u128 arithmetic for Elixir bignum parity) - closed #163
- cuckoo.rs -- put_many with kick loop, {:error, "full", binary} return - closed #165
- quotient.rs + quotient_core.rs -- put_many + merge (shared slot arithmetic) - closed #166
- cqf.rs -- put_many + merge with counting (reuses quotient_core) - closed #167
- xor_filter.rs -- build via hypergraph peeling (HashSet-based, highest-value NIF) - closed #168
- iblt.rs -- put_many + merge (splitmix64 hash functions) - closed #164
Wiring
- lib.rs -- 6 new mod declarations
- nif.ex -- 16 new NIF stubs (normal + dirty variants)
- rust.ex -- NIF dispatch for all 6 filters with dirty thresholds, encode_iblt_pairs/1 helper
- error.rs -- Added error_full_binary for Cuckoo error return
Parity Tests (13 new, 33 total) - closed #169
- Cuckoo: put_many serialization, member? results
- Quotient: put_many + merge serialization, member? results
- CQF: put_many + merge serialization, member? + estimate_count
- XorFilter: xor8 and xor16 build + member? verification
- IBLT: put_many + merge serialization, member? results
Benchmarks (3 new files) - closed #170
- bench/req_bench.exs -- REQ operations
- bench/misra_gries_bench.exs -- Misra-Gries operations
- bench/xxhash3_bench.exs -- XXHash3 NIF vs phash2 throughput
Documentation - closed #171
- README.md -- Updated all 6 filters to "Pure + Rust", added REQ and MisraGries rows
- CHANGELOG.md -- Added v0.6.0 section
- guides/usage_guide.md -- Added REQ, MisraGries, and XXHash3 sections
Test Results
- 1226 tests (109 doctests, 128 properties, 989 tests), 0 failures
- All formatting passes (mix format --check-formatted)
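As background for the splitmix64 hash functions mentioned for iblt.rs, the standard SplitMix64 mixing step looks like the sketch below. The cell_indices helper is hypothetical: how the NIF actually derives its per-hash cell positions from the key is not stated in this PR description, so treat that part as an assumption.

```rust
/// Standard SplitMix64 step: add the golden-ratio increment, then mix.
/// Wrapping arithmetic keeps the result identical in debug and release builds.
pub fn splitmix64(x: u64) -> u64 {
    let mut z = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Hypothetical helper: derive `hash_count` cell positions for one key by
/// hashing the key offset by the hash number. Not the actual iblt.rs scheme.
pub fn cell_indices(key: u64, hash_count: u32, cell_count: u64) -> Vec<u64> {
    assert!(cell_count > 0, "cell_count must be positive");
    (0..hash_count)
        .map(|i| splitmix64(key.wrapping_add(i as u64)) % cell_count)
        .collect()
}
```

Since every operation is plain 64-bit wrapping arithmetic, the same function is straightforward to reproduce bit-for-bit on the Elixir side by masking to 64 bits after each step, which is what byte-identical parity tests rely on.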
Pull request overview
This PR advances the v0.6.0 release by expanding sketch/filter capabilities (REQ quantiles + Misra-Gries heavy hitters), adding Rust NIF acceleration for membership filter batch operations, and extending parity tests/benchmarks/docs to validate Pure vs Rust behavior and performance.
Changes:
- Add Pure implementations and public APIs for ExDataSketch.REQ and ExDataSketch.MisraGries, plus EXSK codec IDs and facade integration.
- Add Rust NIF modules + Elixir Rust-backend wiring for Bloom/Cuckoo/Quotient/CQF/XorFilter/IBLT batch operations, plus XXHash3 hashing.
- Expand tests (unit/property/parity), benchmarks, and documentation/README/changelog updates for the new capabilities.
Reviewed changes
Copilot reviewed 40 out of 41 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| test/parity_test.exs | Add Rust vs Pure parity tests for new filters |
| test/merge_laws_test.exs | Add merge-law properties for REQ/MisraGries |
| test/ex_data_sketch_req_test.exs | Add REQ unit + property tests |
| test/ex_data_sketch_misra_gries_test.exs | Add Misra-Gries unit + property tests |
| test/ex_data_sketch_kll_test.exs | Add KLL CDF/PMF tests |
| test/ex_data_sketch_hash_test.exs | Add XXHash3 hashing tests |
| test/ex_data_sketch_ddsketch_test.exs | Add DDSketch rank tests |
| test/ex_data_sketch_backend_test.exs | Update backend stub for new callbacks |
| native/ex_data_sketch_nif/src/bloom.rs | Rust NIF: Bloom put_many/merge |
| native/ex_data_sketch_nif/src/cuckoo.rs | Rust NIF: Cuckoo put_many + full return |
| native/ex_data_sketch_nif/src/quotient_core.rs | Shared slot arithmetic for QF/CQF |
| native/ex_data_sketch_nif/src/quotient.rs | Rust NIF: Quotient put_many/merge |
| native/ex_data_sketch_nif/src/cqf.rs | Rust NIF: CQF put_many/merge with counts |
| native/ex_data_sketch_nif/src/xor_filter.rs | Rust NIF: XOR filter build |
| native/ex_data_sketch_nif/src/iblt.rs | Rust NIF: IBLT put_many/merge |
| native/ex_data_sketch_nif/src/hash.rs | Rust NIF: XXHash3 64-bit |
| native/ex_data_sketch_nif/src/error.rs | Add error tuple with binary payload |
| native/ex_data_sketch_nif/src/lib.rs | Wire new NIF modules |
| native/ex_data_sketch_nif/Cargo.toml | Add xxhash-rust dependency |
| native/ex_data_sketch_nif/Cargo.lock | Lockfile updates for xxhash-rust |
| mix.exs | Add REQ + MisraGries to docs modules |
| lib/ex_data_sketch/req.ex | Public REQ API + EXSK encode/decode |
| lib/ex_data_sketch/misra_gries.ex | Public Misra-Gries API + EXSK encode/decode |
| lib/ex_data_sketch/quantiles.ex | Add REQ + new quantiles ops to facade |
| lib/ex_data_sketch/kll.ex | Add KLL CDF/PMF wrapper API |
| lib/ex_data_sketch/ddsketch.ex | Add DDSketch rank wrapper API |
| lib/ex_data_sketch/hash.ex | Add XXHash3 Elixir wrapper + docs |
| lib/ex_data_sketch/codec.ex | Add EXSK IDs 13 (REQ) / 14 (MisraGries) |
| lib/ex_data_sketch/backend.ex | Add backend callbacks for new features |
| lib/ex_data_sketch/backend/pure.ex | Implement KLL CDF/PMF, DDSketch rank, REQ, MisraGries |
| lib/ex_data_sketch/backend/rust.ex | Add NIF dispatch for membership batch ops |
| lib/ex_data_sketch/nif.ex | Add NIF stubs for new Rust functions |
| lib/ex_data_sketch.ex | Update top-level docs + update_many dispatch |
| guides/usage_guide.md | Document REQ/MisraGries/XXHash3 usage |
| README.md | Update support matrix for new features |
| CHANGELOG.md | Add v0.6.0 release notes |
| docs/after-0.5.0.md | Add long-form strategic document |
| bench/run.exs | Include new benchmarks in runner |
| bench/req_bench.exs | Add REQ benchmark |
| bench/misra_gries_bench.exs | Add Misra-Gries benchmark |
| bench/xxhash3_bench.exs | Add XXHash3 benchmark |
FilterChain coverage (30 new tests):
- deserialize/1 error paths: truncated data, unknown sketch ID, non-EXSK binary, trailing data, invalid header
- member?/2 with Quotient, CQF, 3-stage chains (Bloom+Cuckoo+Quotient), Bloom+XorFilter
- put/2 with Quotient, CQF, and 3-stage mixed chains
- delete/2 with Quotient, mixed deletable stages (Cuckoo+Quotient+CQF), not-found cuckoo, UnsupportedOperationError for Bloom+Cuckoo
- count/1 with Cuckoo+Quotient+CQF, XorFilter
- size_bytes/1 with adjunct IBLT, all 5 stage types combined
- serialize/deserialize round-trips with Quotient, CQF, Cuckoo, all dynamic stages+adjunct, Bloom+XorFilter+IBLT
Rust backend coverage (26 new tests):
- Empty list early returns for bloom_put_many, cuckoo_put_many, quotient_put_many, cqf_put_many, iblt_put_many, xor_build
- Dirty scheduler threshold paths for all 6 membership filters (bloom put_many/merge, cuckoo, quotient put_many/merge, cqf put_many/merge,
xor_build, iblt put_many/merge) plus kll, ddsketch, fi, and theta_compact
- Cuckoo {:error, :full, binary} error translation
- unwrap_ok! error path (RuntimeError on NIF error)
Issue 1: Quantiles facade DDSketch cdf/pmf (already fixed above)
- Added explicit cdf/2 and pmf/2 clauses for %DDSketch{} that raise ArgumentError with a clear message instead of FunctionClauseError
- Added 7 tests (cdf/pmf delegation for KLL/REQ, DDSketch raises, DDSketch rank)
Issue 2: Usage guide fixes
- xxhash3_64("some data", seed: 42) changed to xxhash3_64("some data", 42) (positional arg, not keyword)
- n/(k+1) frequency guarantee changed to n/k to match the module docs
- top_k(sketch) changed to top_k(sketch, 10) (requires a limit argument)
Issue 3: xxhash3_64/2 seed guard
- Tightened guard from is_integer(seed) to is_integer(seed) and seed >= 0 -- negative seeds now raise FunctionClauseError at the Elixir level
instead of silently falling back to phash2
- Narrowed rescue from _ (catches everything) to ErlangError (only catches NIF-not-loaded errors)
Issue 4: misra_gries_bench.exs top_k/1
- Fixed MisraGries.top_k(s.sketch_populated) to MisraGries.top_k(s.sketch_populated, 10)
Issues closed
- closed #173
- closed #174
- closed #175
- closed #176
Pull request overview
Copilot reviewed 42 out of 43 changed files in this pull request and generated 1 comment.
quotient.rs:
- quotient_put_many_impl -- added state_bin.len() < QOT_HEADER_SIZE check before reading state[8..12]
- quotient_merge_impl -- added a_bin.len() < QOT_HEADER_SIZE || b_bin.len() < QOT_HEADER_SIZE check before reading a[8..12]/b[8..12]
cqf.rs:
- cqf_put_many_impl -- added state_bin.len() < CQF_HEADER_SIZE check before reading state[8..12]
- cqf_merge_impl -- added a_bin.len() < CQF_HEADER_SIZE || b_bin.len() < CQF_HEADER_SIZE check before reading a[8..12]/b[8..12]
All four return {:error, "... too short for header"} instead of panicking. The other NIF files (bloom, cuckoo, iblt, xor_filter) were already safe
-- bloom/cuckoo/iblt compute expected_len from passed-in parameters and check before slicing, and xor_filter has no state binary to read.
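The pattern behind these four fixes is a length check before any fixed-offset slice, so that a malformed binary yields an error value rather than an out-of-bounds panic. A minimal sketch follows; HEADER_SIZE, the function name, and the little-endian interpretation of bytes 8..12 are assumptions for illustration, not the actual QOT/CQF header definitions.

```rust
/// Hypothetical header size; the real QOT_HEADER_SIZE / CQF_HEADER_SIZE
/// values are defined in the NIF sources.
const HEADER_SIZE: usize = 32;

/// Read the 4-byte field at offset 8 of a state binary.
/// Returns Err instead of panicking when the binary is shorter than the header.
pub fn read_header_field(state: &[u8]) -> Result<u32, String> {
    if state.len() < HEADER_SIZE {
        return Err(format!(
            "state binary too short for header: {} < {}",
            state.len(),
            HEADER_SIZE
        ));
    }
    // Safe: the check above guarantees indices 8..12 are in bounds.
    // Little-endian byte order is an assumption of this sketch.
    Ok(u32::from_le_bytes([state[8], state[9], state[10], state[11]]))
}
```

Returning a Result lets the NIF wrapper translate the failure into an {:error, reason} tuple for the Elixir caller.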
Pull request overview
Copilot reviewed 42 out of 43 changed files in this pull request and generated 10 comments.
- iblt.rs: Added cell_count vs binary length validation before the merge loop
- error.rs: Allocation failure returns error tuple instead of panicking
- bloom.rs: Parameter validation for bit_count and hash_count
- quotient_core.rs: All recursive functions converted to iterative loops
Pull request overview
Copilot reviewed 42 out of 43 changed files in this pull request and generated 3 comments.
1. xor_filter.rs - Deterministic dedup: Replaced HashSet with sort_unstable() + dedup() so the same input always produces the same hash set regardless of hasher randomization.
2. xor_filter.rs - Deterministic peeling: Changed the peel queue from Vec (LIFO pop()) to VecDeque with pop_front()/push_front() to match Pure Elixir's list head-take and prepend ordering exactly.
3. parity_test.exs - Strengthened XorFilter tests: Added byte-identical serialize assertions for both xor8 and xor16, plus member? parity checks for 200 non-members to verify false-positive agreement.
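Both determinism changes rest on standard-library behaviour that is easy to show in isolation. The sketch below is illustrative only; the function names are hypothetical and the real peeling loop in xor_filter.rs carries more state than a bare queue of integers.

```rust
use std::collections::VecDeque;

/// Deduplicate hashes with a fixed output order. Iterating a HashSet would
/// yield the same elements, but in an order that depends on the per-process
/// hasher seed; sorting first makes the order a function of the input alone.
pub fn dedup_hashes(mut hashes: Vec<u64>) -> Vec<u64> {
    hashes.sort_unstable();
    hashes.dedup();
    hashes
}

/// Take the head of the queue and prepend a new item, mirroring what
/// `[head | tail]` matching followed by `[new | tail]` does on an Elixir list.
pub fn take_head_then_prepend(queue: &mut VecDeque<u32>, new_item: u32) -> Option<u32> {
    let head = queue.pop_front();
    queue.push_front(new_item);
    head
}
```

With a plain Vec, pop() would take from the tail instead, so the peel order (and therefore the final fingerprint array) could diverge from the Pure backend even when both filters are valid.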
Pull request overview
Copilot reviewed 42 out of 43 changed files in this pull request and generated 5 comments.
Pull request overview
Copilot reviewed 41 out of 42 changed files in this pull request and generated 6 comments.
1. xor_filter.rs: seed + retry → seed.wrapping_add(retry) to prevent u32 overflow panic in debug builds
2. hash.ex: Qualified the stability claim — XXHash3 is stable only when the Rust NIF is available; the phash2 fallback is not stable across OTP
major versions
3. hash_test.exs: Renamed "known test vector" to "empty string with default seed matches explicit seed 0" since it only tests determinism, not a
known constant
4. misra_gries.ex: Added enc_byte in [0, 1, 2] guard to decode_params/1 so unknown encoding bytes fall through to the catch-all clause returning
{:error, DeserializationError}
5. CHANGELOG.md: Fixed v0.6.0 date from 2026-03-08 to 2026-03-11 so version history is monotonically ordered
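To make item 1 concrete: in Rust, `seed + retry` on u32 panics on overflow when overflow checks are enabled (the default in debug builds), whereas wrapping_add always wraps modulo 2^32. A tiny sketch, with a hypothetical function name:

```rust
/// Derive the seed for a given build attempt. `wrapping_add` wraps around
/// at u32::MAX instead of panicking, in both debug and release builds.
pub fn retry_seed(seed: u32, retry: u32) -> u32 {
    seed.wrapping_add(retry)
}

/// For comparison: `checked_add` reports the overflow as None, which is the
/// condition under which a plain `seed + retry` would panic in a debug build.
pub fn would_overflow(seed: u32, retry: u32) -> bool {
    seed.checked_add(retry).is_none()
}
```

For example, retry_seed(u32::MAX, 1) is 0, while would_overflow(u32::MAX, 1) is true.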
Pull request overview
Copilot reviewed 41 out of 42 changed files in this pull request and generated 3 comments.
1. hash.ex: Seed is now masked to 0..2^64-1 via seed &&& @max_u64 before passing to the NIF, preventing ArgumentError for seeds exceeding u64 range
2. misra_gries.ex: Added threshold < 1.0 guard to frequent/2 so out-of-range thresholds raise FunctionClauseError instead of silently returning empty lists
3. mix.exs: Bumped @version to "0.6.0", updated description to include REQ/MisraGries/XXHash3, added 3 missing bench files to the bench: alias
4. README.md: Updated dependency to ~> 0.6.0, updated roadmap to mark v0.6.0 as Released with correct description
5. CHANGELOG.md: Expanded v0.6.0 entry to include REQ sketch, Misra-Gries, XXHash3 NIF, KLL cdf/pmf, DDSketch rank, and Quantiles facade
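The seed masking in item 1 keeps only the low 64 bits of an arbitrarily large non-negative integer. The Rust analogue below uses u128 as a stand-in for an Elixir bignum; the function name is hypothetical and the actual masking happens on the Elixir side in hash.ex.

```rust
/// Reduce a wide seed to the u64 range by keeping its low 64 bits,
/// the same effect as `seed &&& @max_u64` in Elixir.
pub fn mask_seed(seed: u128) -> u64 {
    (seed & (u64::MAX as u128)) as u64
}
```

Seeds that already fit in 64 bits pass through unchanged, so the masking only alters inputs that would otherwise be rejected by the NIF argument decoder.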
v0.6.0 Rust NIF Parity + Benchmarks + Docs -- Complete