Releases: thanos/ex_data_sketch
v0.5.0
Release date: 2026-03-11
Summary
v0.5.0 adds six new structures for advanced membership testing and set
reconciliation, plus FilterChain for composing filters into lifecycle-tier
pipelines. This is the largest feature release yet, bringing ExDataSketch to
13 sketch types across seven categories.
ExDataSketch now covers:
| Category | Algorithms |
|---|---|
| Cardinality | HyperLogLog (HLL) |
| Frequency | Count-Min Sketch (CMS) |
| Set operations | Theta Sketch |
| Quantiles | KLL, DDSketch |
| Frequency ranking | FrequentItems (SpaceSaving) |
| Membership | Bloom, Cuckoo, Quotient, CQF, XorFilter |
| Set reconciliation | IBLT |
| Composition | FilterChain |
What's new in v0.5.0
Cuckoo Filter (ExDataSketch.Cuckoo)
Membership testing with deletion support using partial-key cuckoo hashing.
Unlike Bloom filters, Cuckoo filters support safe deletion of previously
inserted items. Better space efficiency than Bloom at low false positive rates.
Key features:
- Partial-key cuckoo hashing with configurable fingerprint size (8, 12, 16 bits)
- Configurable bucket size (2 or 4 slots) and max kicks (100..2000)
- Safe deletion without false negatives on subsequent queries
- CKO1 binary state format
- EXSK serialization (sketch ID 8)
Quick start:
cuckoo = ExDataSketch.Cuckoo.new(capacity: 100_000, fingerprint_size: 8)
{:ok, cuckoo} = ExDataSketch.Cuckoo.put(cuckoo, "user_42")
ExDataSketch.Cuckoo.member?(cuckoo, "user_42") # true
{:ok, cuckoo} = ExDataSketch.Cuckoo.delete(cuckoo, "user_42")
ExDataSketch.Cuckoo.member?(cuckoo, "user_42") # falseQuotient Filter (ExDataSketch.Quotient)
Membership testing with safe deletion and merge. Splits fingerprints into
quotient (slot index) and remainder (stored value) with metadata bits for
cluster tracking. Supports merge without re-hashing.
Key features:
- Quotient/remainder fingerprint splitting with linear probing
- Metadata bits (is_occupied, is_continuation, is_shifted) for cluster tracking
- Safe deletion and merge support
- QOT1 binary state format
- EXSK serialization (sketch ID 9)
Quick start:
qf = ExDataSketch.Quotient.new(q: 16, r: 8)
qf = ExDataSketch.Quotient.put(qf, "item_a")
ExDataSketch.Quotient.member?(qf, "item_a") # true
# Merge two quotient filters
merged = ExDataSketch.Quotient.merge(qf_a, qf_b)Counting Quotient Filter (ExDataSketch.CQF)
Extends the quotient filter with variable-length counter encoding to answer
not just "is this present?" but "how many times was this inserted?" Counts are
approximate (never underestimated).
Key features:
- Variable-length counter encoding within runs
estimate_count/2for approximate multiplicity queries- Safe deletion (decrements count)
- Merge sums counts across filters
- CQF1 binary state format
- EXSK serialization (sketch ID 10)
Quick start:
cqf = ExDataSketch.CQF.new(q: 16, r: 8)
cqf = ExDataSketch.CQF.put(cqf, "event_x")
cqf = ExDataSketch.CQF.put(cqf, "event_x")
cqf = ExDataSketch.CQF.put(cqf, "event_x")
ExDataSketch.CQF.estimate_count(cqf, "event_x") # >= 3
ExDataSketch.CQF.member?(cqf, "event_x") # trueXorFilter (ExDataSketch.XorFilter)
Static, immutable membership filter with the smallest footprint and fastest
lookups. All items must be provided at construction time via build/2.
No insertion, deletion, or merge -- query only.
Key features:
- Build-once immutable construction via
build/2 - 8-bit or 16-bit fingerprints (configurable false positive rate)
- Smallest memory footprint of all membership filters
- XOR1 binary state format
- EXSK serialization (sketch ID 11)
Quick start:
items = MapSet.new(1..100_000)
{:ok, xor} = ExDataSketch.XorFilter.build(items, fingerprint_bits: 8)
ExDataSketch.XorFilter.member?(xor, 42) # true
ExDataSketch.XorFilter.member?(xor, 999_999) # false (probably)IBLT (ExDataSketch.IBLT)
Invertible Bloom Lookup Table for set reconciliation. Two parties each build
an IBLT from their sets, exchange them, and subtract to discover only the
differing items -- without transmitting the full sets.
Key features:
- Set mode (items only) and key-value mode
subtract/2for cell-wise differencelist_entries/1for peeling decoded entries (positive/negative sets)- Merge via cell-wise addition
- IBL1 binary state format
- EXSK serialization (sketch ID 12)
Quick start:
# Set reconciliation between two nodes
iblt_a = ExDataSketch.IBLT.new(cell_count: 1000) |> ExDataSketch.IBLT.put_many(set_a)
iblt_b = ExDataSketch.IBLT.new(cell_count: 1000) |> ExDataSketch.IBLT.put_many(set_b)
diff = ExDataSketch.IBLT.subtract(iblt_a, iblt_b)
{:ok, %{positive: only_in_a, negative: only_in_b}} = ExDataSketch.IBLT.list_entries(diff)FilterChain (ExDataSketch.FilterChain)
Capability-aware composition framework for chaining membership filters into
lifecycle-tier pipelines. Enforces valid chain positions based on each
filter's capabilities.
Key features:
add_stage/2with position validation (front/middle/terminal/adjunct)put/2fans out to all stages supporting insertion (skips static stages)member?/2queries stages in order, short-circuits on false- Lifecycle-tier patterns: hot Cuckoo (writes) -> cold XorFilter (snapshots)
- IBLT stages placed as adjuncts (not in query path)
- FCN1 binary state format
Quick start:
chain =
ExDataSketch.FilterChain.new()
|> ExDataSketch.FilterChain.add_stage(ExDataSketch.Cuckoo.new(capacity: 10_000))
|> ExDataSketch.FilterChain.add_stage(ExDataSketch.Bloom.new(capacity: 100_000))
{:ok, chain} = ExDataSketch.FilterChain.put(chain, "item_1")
ExDataSketch.FilterChain.member?(chain, "item_1") # trueBenchmarks
Benchmark suites added for all new structures:
bench/cuckoo_bench.exsbench/quotient_bench.exsbench/cqf_bench.exsbench/xor_filter_bench.exsbench/iblt_bench.exsbench/filter_chain_bench.exs
Run all benchmarks with mix bench.
Algorithm Matrix
| Algorithm | Purpose | EXSK ID | Backend |
|---|---|---|---|
| HLL | Cardinality estimation | 1 | Pure + Rust |
| CMS | Frequency estimation | 2 | Pure + Rust |
| Theta | Set operations | 3 | Pure + Rust |
| KLL | Rank/quantile estimation | 4 | Pure + Rust |
| DDSketch | Value-relative quantiles | 5 | Pure + Rust |
| FrequentItems | Heavy-hitter detection | 6 | Pure + Rust |
| Bloom | Membership testing | 7 | Pure |
| Cuckoo | Membership with deletion | 8 | Pure |
| Quotient | Membership with deletion/merge | 9 | Pure |
| CQF | Multiset membership/counting | 10 | Pure |
| XorFilter | Static membership testing | 11 | Pure |
| IBLT | Set reconciliation | 12 | Pure |
| FilterChain | Filter composition | -- | Pure |
Installation
def deps do
[
{:ex_data_sketch, "~> 0.5.0"}
]
endPrecompiled Rust NIF binaries are downloaded automatically on supported
platforms (macOS ARM64/x86_64, Linux x86_64/aarch64 glibc/musl). No Rust
toolchain required. The library works in pure Elixir mode on all other
platforms.
Upgrade Notes
- No breaking changes from v0.4.0.
- EXSK binaries produced by earlier v0.x releases remain fully compatible.
- The
ExDataSketch.Backendbehaviour now includes additional callbacks for
Cuckoo, Quotient, CQF, XorFilter, IBLT, and FilterChain. Custom backend
implementations must add these callbacks. - New error types:
UnsupportedOperationErrorandInvalidChainCompositionError. - All membership filter modules now export
capabilities/0returning a MapSet
of supported operations.
What's Next
- v0.6.0 - Rust NIF acceleration for v0.5.0 structures, and advanced Quantiles
- v0.7.0 - advanced Cardinality Sketches including CPC, HLL++, ULL (UltraLogLog), SetSketch
- v0.8.0 scope is under discussion. Potential directions include Rust NIF
acceleration for v0.5.0 structures, additional composition patterns, or
new sketch types.
Links
v0.4.0
Full Changelog: v0.3.0...v0.4.0
What's new in v0.4.0
Bloom filter:
- Automatic parameter derivation from capacity and false_positive_rate
- Double hashing (Kirsch-Mitzenmacher) for k bit positions from one hash
- BLM1 binary format (40-byte header + packed bitset)
- Merge via bitwise OR
- Capacity overflow validation for the u32 binary format constraints
- 40 unit tests, 5 property tests, 2 statistical FPR validation tests
| Algorithm | What it does |
|---|---|
| HyperLogLog (HLL) | Count distinct elements (~0.8% error, 16 KB) |
| Count-Min Sketch (CMS) | Estimate item frequencies |
| Theta Sketch | Set union/intersection cardinality |
| KLL | Rank-accurate quantiles (median, P99) |
| DDSketch | Value-relative quantiles (P99 latency +/- 1%) |
| FrequentItems | Top-k most frequent items (SpaceSaving) |
| Bloom Filter | "Have I seen this before?" with tunable FPR |
ExDataSketch v0.3.0 -- FrequentItems / Heavy-Hitters
ExDataSketch v0.3.0 -- FrequentItems / Heavy-Hitters
ExDataSketch v0.3.0 adds FrequentItems, a streaming heavy-hitter sketch based on the SpaceSaving algorithm. This is the sixth sketch family in the library and the first non-numeric sketch type.
What is FrequentItems?
FrequentItems tracks the approximate top-k most frequent items in a data stream using bounded memory. It maintains at most k counters, each storing an item, its estimated count, and a maximum overcount error. The SpaceSaving algorithm guarantees that any item whose true frequency exceeds N/k will always be tracked.
sketch =
ExDataSketch.FrequentItems.new(k: 10)
|> ExDataSketch.FrequentItems.update_many(stream_of_page_views)
# Get the top 5 most frequent items
ExDataSketch.FrequentItems.top_k(sketch, 5)
# [%{item: "/home", estimate: 4821, error: 12, lower: 4809, upper: 4821}, ...]
# Check a specific item
ExDataSketch.FrequentItems.estimate(sketch, "/checkout")
# {:ok, %{estimate: 312, error: 5, lower: 307, upper: 312}}Highlights
Full Pure Elixir + Rust NIF dual-backend support
Like all ExDataSketch algorithms, FrequentItems ships with a pure Elixir implementation and optional Rust NIF acceleration. Both backends produce byte-identical serialized output for the same inputs.
Canonical FI1 binary state format
FrequentItems uses a 32-byte header followed by variable-length entries sorted by item bytes. The format is deterministic and portable across backends.
Flexible key encoding
Three key encoding modes are supported:
| Encoding | Use case |
|---|---|
:binary (default) |
String keys, raw binary data |
:int |
Integer keys (signed 64-bit little-endian) |
{:term, :external} |
Arbitrary Erlang terms via :erlang.term_to_binary/1 |
Deterministic merge
FrequentItems merge is commutative. It combines counts additively across the union of keys, then replays weighted updates in sorted key order into an empty sketch. Count (n) is always exactly additive regardless of whether entries are dropped during capacity enforcement.
Smart NIF routing
Query operations (top_k, estimate) route to Rust only when k >= 256, where NIF acceleration outweighs the call boundary overhead. Header reads (count, entry_count) always use Pure Elixir since they are O(1) binary pattern matches.
Other changes
- Rust NIF for
theta_compactwith dirty scheduler support. - Mox added as a test dependency for backend contract testing.
- EXSK codec sketch ID 6 for FrequentItems.
Upgrading
def deps do
[
{:ex_data_sketch, "~> 0.3.0"}
]
endNo breaking changes from v0.2.1. All existing sketches (HLL, CMS, Theta, KLL, DDSketch) are unchanged.
Algorithm coverage
| Algorithm | Purpose | Status |
|---|---|---|
| HyperLogLog (HLL) | Cardinality estimation | Pure + Rust |
| Count-Min Sketch (CMS) | Frequency estimation | Pure + Rust |
| Theta Sketch | Set operations on cardinalities | Pure + Rust |
| KLL Quantiles | Rank and quantile estimation | Pure + Rust |
| DDSketch | Relative-error quantile estimation | Pure + Rust |
| FrequentItems (SpaceSaving) | Heavy-hitter detection | Pure + Rust |
v0.2.1
v0.2.0-alpha.1
What's Changed
- Deterministic vectors by @thanos in #76
- Add KLL quantiles sketch with Pure Elixir and Rust NIF backends by @thanos in #86
Full Changelog: v0.1.0-alpha.12...v0.2.0-alpha.1
v0.2.0
v0.1.0-alpha.9
Full Changelog: v0.1.0-alpha.8...v0.1.0-alpha.9
v0.1.0-alpha.8
Full Changelog: v0.1.0-alpha.7...v0.1.0-alpha.8
v0.1.0-alpha.7
Full Changelog: v0.1.0-alpha.3...v0.1.0-alpha.7
v0.1.0-alpha.4
Add NIF version features to Cargo.toml Required since Rustler v0.30.0 for precompiled NIF builds. The rustler-precompiled-action selects the correct feature (nif_version_2_16, nif_version_2_17) via cargo --features flag.