V0.6.0/core sketch stabilization n advanced quantiles #172

Merged
thanos merged 11 commits into main from
v0.6.0/Core_Sketch_Stabilization_n_Advanced_Quantiles
Mar 13, 2026
Conversation

@thanos thanos commented Mar 9, 2026

v0.6.0 Rust NIF Parity + Benchmarks + Docs -- Complete

Rust NIFs Created (6 new .rs files)

  • bloom.rs -- put_many + merge (u128 arithmetic for Elixir bignum parity)
  • cuckoo.rs -- put_many with kick loop, {:error, "full", binary} return
  • quotient.rs + quotient_core.rs -- put_many + merge (shared slot arithmetic)
  • cqf.rs -- put_many + merge with counting (reuses quotient_core)
  • xor_filter.rs -- build via hypergraph peeling (HashSet-based, highest-value NIF)
  • iblt.rs -- put_many + merge (splitmix64 hash functions)

Wiring

  • lib.rs -- 6 new mod declarations
  • nif.ex -- 16 new NIF stubs (normal + dirty variants)
  • rust.ex -- NIF dispatch for all 6 filters with dirty thresholds, encode_iblt_pairs/1 helper
  • error.rs -- Added error_full_binary for Cuckoo error return

Parity Tests (13 new, 33 total)

  • Cuckoo: put_many serialization, member? results
  • Quotient: put_many + merge serialization, member? results
  • CQF: put_many + merge serialization, member? + estimate_count
  • XorFilter: xor8 and xor16 build + member? verification
  • IBLT: put_many + merge serialization, member? results

Benchmarks (3 new files)

  • bench/req_bench.exs -- REQ operations
  • bench/misra_gries_bench.exs -- Misra-Gries operations
  • bench/xxhash3_bench.exs -- XXHash3 NIF vs phash2 throughput

Documentation

  • README.md -- Updated all 6 filters to "Pure + Rust", added REQ and MisraGries rows
  • CHANGELOG.md -- Added v0.6.0 section
  • guides/usage_guide.md -- Added REQ, MisraGries, and XXHash3 sections

Test Results

  • 1226 tests (109 doctests, 128 properties, 989 tests), 0 failures
  • All formatting passes (mix format --check-formatted)

thanos added 3 commits March 8, 2026 17:19
  Phase 1: KLL Stabilization (CDF and PMF) - closed #155

  - Added kll_cdf/3 and kll_pmf/3 callbacks to backend.ex
  - Implemented in pure.ex using existing sorted view infrastructure
  - Delegated in rust.ex
  - Added cdf/2 and pmf/2 to kll.ex public API
  - Expanded quantiles.ex facade with quantiles/2, rank/2, cdf/2, pmf/2
  - 7 new tests in kll_test.exs

  Phase 2: DDSketch Rank - closed #156

  - Added ddsketch_rank/3 callback to backend.ex
  - Implemented by walking sparse bins and accumulating weight
  - Added rank/2 to ddsketch.ex public API
  - Added DDSketch clause to quantiles.ex rank/2
  - 7 new tests in ddsketch_test.exs

  Phase 3: REQ Sketch (New -- Sketch ID 13)  - closed #157

  - Created lib/ex_data_sketch/req.ex with full public API
  - 11 new backend callbacks, Pure implementation with biased compaction (HRA/LRA)
  - Binary format: REQ1 with HRA flag
  - Quantiles facade integration (all 12 dispatch functions)
  - 65 tests (3 properties) + 3 merge law properties

  Phase 4: Misra-Gries (New -- Sketch ID 14) - closed #158

  - Created lib/ex_data_sketch/misra_gries.ex with full public API
  - 8 new backend callbacks, Pure implementation with decrement-all eviction
  - Binary format: MG01 with variable-length key entries
  - Key encoding support (binary, int, term)
  - 40 tests (2 properties) + 3 merge law properties
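
The decrement-all eviction named in Phase 4 can be sketched as follows. This is a minimal illustration in Rust, not the Pure Elixir code from this PR: the `MisraGries` struct, its `update`/`estimate` methods, and the use of string keys are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical Misra-Gries summary that tracks at most `k` counters.
pub struct MisraGries {
    k: usize,
    counters: HashMap<String, u64>,
}

impl MisraGries {
    pub fn new(k: usize) -> Self {
        assert!(k > 0, "k must be positive");
        Self { k, counters: HashMap::new() }
    }

    /// Observe one item.
    pub fn update(&mut self, item: &str) {
        if let Some(count) = self.counters.get_mut(item) {
            // Already tracked: bump its counter.
            *count += 1;
        } else if self.counters.len() < self.k {
            // Free slot: start tracking the item.
            self.counters.insert(item.to_string(), 1);
        } else {
            // Table is full: decrement every counter and evict those that
            // reach zero. The incoming item is not inserted.
            for count in self.counters.values_mut() {
                *count -= 1;
            }
            self.counters.retain(|_, count| *count > 0);
        }
    }

    /// Lower-bound estimate of how often `item` was seen (0 if untracked).
    pub fn estimate(&self, item: &str) -> u64 {
        self.counters.get(item).copied().unwrap_or(0)
    }
}
```

Feeding the stream `a, a, a, b, c` into a summary with `k = 2` leaves `a` with an estimate of 2 and evicts `b`, which shows why estimates never exceed the true count but can fall below it.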

  Phase 5: XXHash3 via Rust NIF - closed #159

  - Added xxhash-rust dependency to Cargo.toml
  - Created hash.rs with xxhash3_64_nif and xxhash3_64_seeded_nif
  - Added xxhash3_64/1 and xxhash3_64/2 to hash.ex with fallback
  - Opt-in (not default) for backwards compatibility
  - 11 new tests

  Totals

  - 1213 tests (976 tests + 109 doctests + 128 properties), 0 failures
  - 2 new modules: ExDataSketch.REQ, ExDataSketch.MisraGries
  - ~30 new backend callbacks across all phases
  - All files formatted, zero warnings
  1. req_encode_state (arity 9 -> 1): Defined a REQState struct with @enforce_keys for all 9 fields (k, hra, n, min_val, max_val, num_levels,
  compaction_bits, level_sizes, levels). The function now pattern-matches a single %REQState{} parameter. Construction sites (req_new, req_do_merge)
  build the struct; req_decode_state returns one. Update sites use plain %{state | ...} since Elixir preserves the struct type.

  2. ddsketch_rank (nesting depth 3 -> 1): Extracted logic into dds_compute_rank/2 with three pattern-matched clauses: %{n: 0} returns nil, a value <
  0.0 guard returns 0.0, and the default clause does the bin walk. Also removed the dead if value >= 0.0 branch (always true since value < 0.0 was
  already handled by the guard).
  Rust NIFs Created (6 new .rs files)

  - bloom.rs -- put_many + merge (u128 arithmetic for Elixir bignum parity) - closed #163
  - cuckoo.rs -- put_many with kick loop, {:error, "full", binary} return - closed #165
  - quotient.rs + quotient_core.rs -- put_many + merge (shared slot arithmetic) - closed #166
  - cqf.rs -- put_many + merge with counting (reuses quotient_core) - closed #167
  - xor_filter.rs -- build via hypergraph peeling (HashSet-based, highest-value NIF) - closed #168
  - iblt.rs -- put_many + merge (splitmix64 hash functions) - closed #164

  Wiring

  - lib.rs -- 6 new mod declarations
  - nif.ex -- 16 new NIF stubs (normal + dirty variants)
  - rust.ex -- NIF dispatch for all 6 filters with dirty thresholds, encode_iblt_pairs/1 helper
  - error.rs -- Added error_full_binary for Cuckoo error return

  Parity Tests (13 new, 33 total) - closed #169

  - Cuckoo: put_many serialization, member? results
  - Quotient: put_many + merge serialization, member? results
  - CQF: put_many + merge serialization, member? + estimate_count
  - XorFilter: xor8 and xor16 build + member? verification
  - IBLT: put_many + merge serialization, member? results

  Benchmarks (3 new files) - closed #170

  - bench/req_bench.exs -- REQ operations
  - bench/misra_gries_bench.exs -- Misra-Gries operations
  - bench/xxhash3_bench.exs -- XXHash3 NIF vs phash2 throughput

  Documentation - closed #171

  - README.md -- Updated all 6 filters to "Pure + Rust", added REQ and MisraGries rows
  - CHANGELOG.md -- Added v0.6.0 section
  - guides/usage_guide.md -- Added REQ, MisraGries, and XXHash3 sections

  Test Results

  - 1226 tests (109 doctests, 128 properties, 989 tests), 0 failures
  - All formatting passes (mix format --check-formatted)

Copilot AI left a comment

Pull request overview

This PR advances the v0.6.0 release by expanding sketch/filter capabilities (REQ quantiles + Misra-Gries heavy hitters), adding Rust NIF acceleration for membership filter batch operations, and extending parity tests/benchmarks/docs to validate Pure vs Rust behavior and performance.

Changes:

  • Add Pure implementations and public APIs for ExDataSketch.REQ and ExDataSketch.MisraGries, plus EXSK codec IDs and facade integration.
  • Add Rust NIF modules + Elixir Rust-backend wiring for Bloom/Cuckoo/Quotient/CQF/XorFilter/IBLT batch operations, plus XXHash3 hashing.
  • Expand tests (unit/property/parity), benchmarks, and documentation/README/changelog updates for the new capabilities.

Reviewed changes

Copilot reviewed 40 out of 41 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
test/parity_test.exs Add Rust vs Pure parity tests for new filters
test/merge_laws_test.exs Add merge-law properties for REQ/MisraGries
test/ex_data_sketch_req_test.exs Add REQ unit + property tests
test/ex_data_sketch_misra_gries_test.exs Add Misra-Gries unit + property tests
test/ex_data_sketch_kll_test.exs Add KLL CDF/PMF tests
test/ex_data_sketch_hash_test.exs Add XXHash3 hashing tests
test/ex_data_sketch_ddsketch_test.exs Add DDSketch rank tests
test/ex_data_sketch_backend_test.exs Update backend stub for new callbacks
native/ex_data_sketch_nif/src/bloom.rs Rust NIF: Bloom put_many/merge
native/ex_data_sketch_nif/src/cuckoo.rs Rust NIF: Cuckoo put_many + full return
native/ex_data_sketch_nif/src/quotient_core.rs Shared slot arithmetic for QF/CQF
native/ex_data_sketch_nif/src/quotient.rs Rust NIF: Quotient put_many/merge
native/ex_data_sketch_nif/src/cqf.rs Rust NIF: CQF put_many/merge with counts
native/ex_data_sketch_nif/src/xor_filter.rs Rust NIF: XOR filter build
native/ex_data_sketch_nif/src/iblt.rs Rust NIF: IBLT put_many/merge
native/ex_data_sketch_nif/src/hash.rs Rust NIF: XXHash3 64-bit
native/ex_data_sketch_nif/src/error.rs Add error tuple with binary payload
native/ex_data_sketch_nif/src/lib.rs Wire new NIF modules
native/ex_data_sketch_nif/Cargo.toml Add xxhash-rust dependency
native/ex_data_sketch_nif/Cargo.lock Lockfile updates for xxhash-rust
mix.exs Add REQ + MisraGries to docs modules
lib/ex_data_sketch/req.ex Public REQ API + EXSK encode/decode
lib/ex_data_sketch/misra_gries.ex Public Misra-Gries API + EXSK encode/decode
lib/ex_data_sketch/quantiles.ex Add REQ + new quantiles ops to facade
lib/ex_data_sketch/kll.ex Add KLL CDF/PMF wrapper API
lib/ex_data_sketch/ddsketch.ex Add DDSketch rank wrapper API
lib/ex_data_sketch/hash.ex Add XXHash3 Elixir wrapper + docs
lib/ex_data_sketch/codec.ex Add EXSK IDs 13 (REQ) / 14 (MisraGries)
lib/ex_data_sketch/backend.ex Add backend callbacks for new features
lib/ex_data_sketch/backend/pure.ex Implement KLL CDF/PMF, DDSketch rank, REQ, MisraGries
lib/ex_data_sketch/backend/rust.ex Add NIF dispatch for membership batch ops
lib/ex_data_sketch/nif.ex Add NIF stubs for new Rust functions
lib/ex_data_sketch.ex Update top-level docs + update_many dispatch
guides/usage_guide.md Document REQ/MisraGries/XXHash3 usage
README.md Update support matrix for new features
CHANGELOG.md Add v0.6.0 release notes
docs/after-0.5.0.md Add long-form strategic document
bench/run.exs Include new benchmarks in runner
bench/req_bench.exs Add REQ benchmark
bench/misra_gries_bench.exs Add Misra-Gries benchmark
bench/xxhash3_bench.exs Add XXHash3 benchmark

thanos added 2 commits March 8, 2026 21:32
  FilterChain coverage (30 new tests):
  - deserialize/1 error paths: truncated data, unknown sketch ID, non-EXSK binary, trailing data, invalid header
  - member?/2 with Quotient, CQF, 3-stage chains (Bloom+Cuckoo+Quotient), Bloom+XorFilter
  - put/2 with Quotient, CQF, and 3-stage mixed chains
  - delete/2 with Quotient, mixed deletable stages (Cuckoo+Quotient+CQF), not-found cuckoo, UnsupportedOperationError for Bloom+Cuckoo
  - count/1 with Cuckoo+Quotient+CQF, XorFilter
  - size_bytes/1 with adjunct IBLT, all 5 stage types combined
  - serialize/deserialize round-trips with Quotient, CQF, Cuckoo, all dynamic stages+adjunct, Bloom+XorFilter+IBLT

  Rust backend coverage (26 new tests):
  - Empty list early returns for bloom_put_many, cuckoo_put_many, quotient_put_many, cqf_put_many, iblt_put_many, xor_build
  - Dirty scheduler threshold paths for all 6 membership filters (bloom put_many/merge, cuckoo, quotient put_many/merge, cqf put_many/merge,
  xor_build, iblt put_many/merge) plus kll, ddsketch, fi, and theta_compact
  - Cuckoo {:error, :full, binary} error translation
  - unwrap_ok! error path (RuntimeError on NIF error)
  Issue 1: Quantiles facade DDSketch cdf/pmf (already fixed above)
  - Added explicit cdf/2 and pmf/2 clauses for %DDSketch{} that raise ArgumentError with a clear message instead of FunctionClauseError
  - Added 7 tests (cdf/pmf delegation for KLL/REQ, DDSketch raises, DDSketch rank)

  Issue 2: Usage guide fixes
  - xxhash3_64("some data", seed: 42) changed to xxhash3_64("some data", 42) (positional arg, not keyword)
  - n/(k+1) frequency guarantee changed to n/k to match the module docs
  - top_k(sketch) changed to top_k(sketch, 10) (requires a limit argument)

  Issue 3: xxhash3_64/2 seed guard
  - Tightened guard from is_integer(seed) to is_integer(seed) and seed >= 0 -- negative seeds now raise FunctionClauseError at the Elixir level
  instead of silently falling back to phash2
  - Narrowed rescue from _ (catches everything) to ErlangError (only catches NIF-not-loaded errors)

  Issue 4: misra_gries_bench.exs top_k/1
  - Fixed MisraGries.top_k(s.sketch_populated) to MisraGries.top_k(s.sketch_populated, 10)

Issues closed
 - closed #173
 - closed #174
 - closed #175
 - closed #176

Copilot AI left a comment

Pull request overview

Copilot reviewed 42 out of 43 changed files in this pull request and generated 1 comment.


  quotient.rs:
  - quotient_put_many_impl -- added state_bin.len() < QOT_HEADER_SIZE check before reading state[8..12]
  - quotient_merge_impl -- added a_bin.len() < QOT_HEADER_SIZE || b_bin.len() < QOT_HEADER_SIZE check before reading a[8..12]/b[8..12]

  cqf.rs:
  - cqf_put_many_impl -- added state_bin.len() < CQF_HEADER_SIZE check before reading state[8..12]
  - cqf_merge_impl -- added a_bin.len() < CQF_HEADER_SIZE || b_bin.len() < CQF_HEADER_SIZE check before reading a[8..12]/b[8..12]

  All four return {:error, "... too short for header"} instead of panicking. The other NIF files (bloom, cuckoo, iblt, xor_filter) were already safe
  -- bloom/cuckoo/iblt compute expected_len from passed-in parameters and check before slicing, and xor_filter has no state binary to read.
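
The length check described above can be sketched like this. It is an illustration only: `HEADER_SIZE`, `read_header_field`, and the little-endian `u32` layout of bytes 8..12 are assumptions, and the real `QOT_HEADER_SIZE` / `CQF_HEADER_SIZE` values live in the crate.

```rust
/// Assumed header size for the example; it only needs to cover bytes 8..12.
const HEADER_SIZE: usize = 12;

/// Reads the 4-byte field at offset 8, returning an error instead of
/// panicking when the binary is shorter than the header.
pub fn read_header_field(state: &[u8]) -> Result<u32, String> {
    // Guard first: slicing state[8..12] on a short slice would panic.
    if state.len() < HEADER_SIZE {
        return Err("state binary too short for header".to_string());
    }
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&state[8..12]);
    // Little-endian decoding is an assumption of this sketch.
    Ok(u32::from_le_bytes(buf))
}
```

Returning `Err` lets the NIF layer turn a malformed binary into an `{:error, ...}` tuple on the Elixir side rather than unwinding through a slice-index panic.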

Copilot AI left a comment

Pull request overview

Copilot reviewed 42 out of 43 changed files in this pull request and generated 10 comments.


  - iblt.rs: Added cell_count vs binary length validation before the merge loop
  - error.rs: Allocation failure returns error tuple instead of panicking
  - bloom.rs: Parameter validation for bit_count and hash_count
  - quotient_core.rs: All recursive functions converted to iterative loops
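
A cell_count versus binary length check of the kind added to iblt.rs might look roughly like the sketch below. The `CELL_SIZE` constant, the `header_len` parameter, and the `validate_cells` name are hypothetical; the actual IBLT layout is not shown in this conversation.

```rust
/// Assumed size of one serialized cell, for illustration only.
const CELL_SIZE: usize = 24;

/// Checks that `bin` holds exactly `header_len` bytes plus `cell_count`
/// cells, using checked arithmetic so a huge `cell_count` cannot wrap.
pub fn validate_cells(bin: &[u8], header_len: usize, cell_count: usize) -> Result<(), String> {
    let body_len = cell_count
        .checked_mul(CELL_SIZE)
        .ok_or_else(|| "cell_count overflows usize".to_string())?;
    let expected = header_len
        .checked_add(body_len)
        .ok_or_else(|| "expected length overflows usize".to_string())?;
    if bin.len() != expected {
        return Err(format!("expected {} bytes, got {}", expected, bin.len()));
    }
    Ok(())
}
```

Running this once per input before a merge loop means the loop body can index cells without further bounds handling.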

Copilot AI left a comment

Pull request overview

Copilot reviewed 42 out of 43 changed files in this pull request and generated 3 comments.


  1. xor_filter.rs - Deterministic dedup: Replaced HashSet with sort_unstable() + dedup() so the same input always produces the same hash set
  regardless of hasher randomization.
  2. xor_filter.rs - Deterministic peeling: Changed the peel queue from Vec (LIFO pop()) to VecDeque with pop_front()/push_front() to match Pure
  Elixir's list head-take and prepend ordering exactly.
  3. parity_test.exs - Strengthened XorFilter tests: Added byte-identical serialize assertions for both xor8 and xor16, plus member? parity checks
  for 200 non-members to verify false-positive agreement.
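
The two determinism changes above can be illustrated with a small sketch. The function names and the toy queue contents are made up for the example; only `sort_unstable()`, `dedup()`, `VecDeque`, `pop_front()`, and `push_front()` come from the description.

```rust
use std::collections::VecDeque;

/// Deduplicate hashes with a stable result: sorting first makes the output
/// independent of any hasher randomization, unlike iterating a HashSet.
pub fn dedup_sorted(mut hashes: Vec<u64>) -> Vec<u64> {
    hashes.sort_unstable();
    hashes.dedup();
    hashes
}

/// Toy peel queue: items are taken from the front, and newly discovered
/// work is prepended, mirroring head-take and prepend on a linked list.
pub fn peel_order_demo() -> Vec<u32> {
    let mut queue: VecDeque<u32> = VecDeque::from(vec![1, 2, 3]);
    let mut order = Vec::new();
    let mut injected = false;
    while let Some(v) = queue.pop_front() {
        order.push(v);
        if !injected {
            // Simulate a cell that becomes peelable while processing `1`.
            queue.push_front(9);
            injected = true;
        }
    }
    order
}
```

With a plain `Vec` and `pop()`, the initial items would come out as 3, 2, 1; taking from the front gives 1 first and then the prepended 9, which is the order a list-based implementation would visit them in.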

Copilot AI left a comment

Pull request overview

Copilot reviewed 42 out of 43 changed files in this pull request and generated 5 comments.


  - quotient.rs:66 / cqf.rs:321: Prefixed unused q params with _
  - quotient_core.rs: Removed dead code: clr_meta_bit, do_delete, and shift_left (CQF merge uses extract-all-counted + rebuild instead)

Copilot AI left a comment

Pull request overview

Copilot reviewed 41 out of 42 changed files in this pull request and generated 6 comments.


  1. xor_filter.rs: seed + retry → seed.wrapping_add(retry) to prevent u32 overflow panic in debug builds
  2. hash.ex: Qualified the stability claim — XXHash3 is stable only when the Rust NIF is available; the phash2 fallback is not stable across OTP
  major versions
  3. hash_test.exs: Renamed "known test vector" to "empty string with default seed matches explicit seed 0" since it only tests determinism, not a
  known constant
  4. misra_gries.ex: Added enc_byte in [0, 1, 2] guard to decode_params/1 so unknown encoding bytes fall through to the catch-all clause returning
  {:error, DeserializationError}
  5. CHANGELOG.md: Fixed v0.6.0 date from 2026-03-08 to 2026-03-11 so version history is monotonically ordered
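
For item 1, the difference between `seed + retry` and `seed.wrapping_add(retry)` on `u32` can be shown in isolation. The `retry_seed` helper is a hypothetical name for this sketch.

```rust
/// Derives the seed for a given retry attempt. Plain `seed + retry` panics
/// on overflow in debug builds; wrapping_add wraps modulo 2^32 in all builds.
pub fn retry_seed(seed: u32, retry: u32) -> u32 {
    seed.wrapping_add(retry)
}

/// Shows that the same addition does overflow: checked_add reports it as None.
pub fn would_overflow(seed: u32, retry: u32) -> bool {
    seed.checked_add(retry).is_none()
}
```

Wrapping keeps debug and release builds on the same seed sequence, which matters when the result must match another implementation byte for byte.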

Copilot AI left a comment

Pull request overview

Copilot reviewed 41 out of 42 changed files in this pull request and generated 3 comments.


  1. hash.ex: Seed is now masked to 0..2^64-1 via seed &&& @max_u64 before passing to the NIF, preventing ArgumentError for seeds exceeding u64 range
  2. misra_gries.ex: Added threshold < 1.0 guard to frequent/2 so out-of-range thresholds raise FunctionClauseError instead of silently returning
  empty lists
  3. mix.exs: Bumped @version to "0.6.0", updated description to include REQ/MisraGries/XXHash3, added 3 missing bench files to the bench: alias
  4. README.md: Updated dependency to ~> 0.6.0, updated roadmap to mark v0.6.0 as Released with correct description
  5. CHANGELOG.md: Expanded v0.6.0 entry to include REQ sketch, Misra-Gries, XXHash3 NIF, KLL cdf/pmf, DDSketch rank, and Quantiles facade
@thanos thanos merged commit cb619fb into main Mar 13, 2026
19 checks passed