Skip to content

Releases: thanos/ex_data_sketch

v0.5.0

11 Mar 10:50
c15d224

Choose a tag to compare

Release date: 2026-03-11

Summary

v0.5.0 adds six new structures for advanced membership testing and set
reconciliation, plus FilterChain for composing filters into lifecycle-tier
pipelines. This is the largest feature release yet, bringing ExDataSketch to
13 sketch types across seven categories.

ExDataSketch now covers:

Category Algorithms
Cardinality HyperLogLog (HLL)
Frequency Count-Min Sketch (CMS)
Set operations Theta Sketch
Quantiles KLL, DDSketch
Frequency ranking FrequentItems (SpaceSaving)
Membership Bloom, Cuckoo, Quotient, CQF, XorFilter
Set reconciliation IBLT
Composition FilterChain

What's new in v0.5.0

Cuckoo Filter (ExDataSketch.Cuckoo)

Membership testing with deletion support using partial-key cuckoo hashing.
Unlike Bloom filters, Cuckoo filters support safe deletion of previously
inserted items. Better space efficiency than Bloom at low false positive rates.

Key features:

  • Partial-key cuckoo hashing with configurable fingerprint size (8, 12, 16 bits)
  • Configurable bucket size (2 or 4 slots) and max kicks (100..2000)
  • Safe deletion without false negatives on subsequent queries
  • CKO1 binary state format
  • EXSK serialization (sketch ID 8)

Quick start:

cuckoo = ExDataSketch.Cuckoo.new(capacity: 100_000, fingerprint_size: 8)

{:ok, cuckoo} = ExDataSketch.Cuckoo.put(cuckoo, "user_42")
ExDataSketch.Cuckoo.member?(cuckoo, "user_42")  # true

{:ok, cuckoo} = ExDataSketch.Cuckoo.delete(cuckoo, "user_42")
ExDataSketch.Cuckoo.member?(cuckoo, "user_42")  # false

Quotient Filter (ExDataSketch.Quotient)

Membership testing with safe deletion and merge. Splits fingerprints into
quotient (slot index) and remainder (stored value) with metadata bits for
cluster tracking. Supports merge without re-hashing.

Key features:

  • Quotient/remainder fingerprint splitting with linear probing
  • Metadata bits (is_occupied, is_continuation, is_shifted) for cluster tracking
  • Safe deletion and merge support
  • QOT1 binary state format
  • EXSK serialization (sketch ID 9)

Quick start:

qf = ExDataSketch.Quotient.new(q: 16, r: 8)
qf = ExDataSketch.Quotient.put(qf, "item_a")
ExDataSketch.Quotient.member?(qf, "item_a")  # true

# Merge two quotient filters
merged = ExDataSketch.Quotient.merge(qf_a, qf_b)

Counting Quotient Filter (ExDataSketch.CQF)

Extends the quotient filter with variable-length counter encoding to answer
not just "is this present?" but "how many times was this inserted?" Counts are
approximate (never underestimated).

Key features:

  • Variable-length counter encoding within runs
  • estimate_count/2 for approximate multiplicity queries
  • Safe deletion (decrements count)
  • Merge sums counts across filters
  • CQF1 binary state format
  • EXSK serialization (sketch ID 10)

Quick start:

cqf = ExDataSketch.CQF.new(q: 16, r: 8)
cqf = ExDataSketch.CQF.put(cqf, "event_x")
cqf = ExDataSketch.CQF.put(cqf, "event_x")
cqf = ExDataSketch.CQF.put(cqf, "event_x")

ExDataSketch.CQF.estimate_count(cqf, "event_x")  # >= 3
ExDataSketch.CQF.member?(cqf, "event_x")          # true

XorFilter (ExDataSketch.XorFilter)

Static, immutable membership filter with the smallest footprint and fastest
lookups. All items must be provided at construction time via build/2.
No insertion, deletion, or merge -- query only.

Key features:

  • Build-once immutable construction via build/2
  • 8-bit or 16-bit fingerprints (configurable false positive rate)
  • Smallest memory footprint of all membership filters
  • XOR1 binary state format
  • EXSK serialization (sketch ID 11)

Quick start:

items = MapSet.new(1..100_000)
{:ok, xor} = ExDataSketch.XorFilter.build(items, fingerprint_bits: 8)

ExDataSketch.XorFilter.member?(xor, 42)      # true
ExDataSketch.XorFilter.member?(xor, 999_999) # false (probably)

IBLT (ExDataSketch.IBLT)

Invertible Bloom Lookup Table for set reconciliation. Two parties each build
an IBLT from their sets, exchange them, and subtract to discover only the
differing items -- without transmitting the full sets.

Key features:

  • Set mode (items only) and key-value mode
  • subtract/2 for cell-wise difference
  • list_entries/1 for peeling decoded entries (positive/negative sets)
  • Merge via cell-wise addition
  • IBL1 binary state format
  • EXSK serialization (sketch ID 12)

Quick start:

# Set reconciliation between two nodes
iblt_a = ExDataSketch.IBLT.new(cell_count: 1000) |> ExDataSketch.IBLT.put_many(set_a)
iblt_b = ExDataSketch.IBLT.new(cell_count: 1000) |> ExDataSketch.IBLT.put_many(set_b)

diff = ExDataSketch.IBLT.subtract(iblt_a, iblt_b)
{:ok, %{positive: only_in_a, negative: only_in_b}} = ExDataSketch.IBLT.list_entries(diff)

FilterChain (ExDataSketch.FilterChain)

Capability-aware composition framework for chaining membership filters into
lifecycle-tier pipelines. Enforces valid chain positions based on each
filter's capabilities.

Key features:

  • add_stage/2 with position validation (front/middle/terminal/adjunct)
  • put/2 fans out to all stages supporting insertion (skips static stages)
  • member?/2 queries stages in order, short-circuits on false
  • Lifecycle-tier patterns: hot Cuckoo (writes) -> cold XorFilter (snapshots)
  • IBLT stages placed as adjuncts (not in query path)
  • FCN1 binary state format

Quick start:

chain =
  ExDataSketch.FilterChain.new()
  |> ExDataSketch.FilterChain.add_stage(ExDataSketch.Cuckoo.new(capacity: 10_000))
  |> ExDataSketch.FilterChain.add_stage(ExDataSketch.Bloom.new(capacity: 100_000))

{:ok, chain} = ExDataSketch.FilterChain.put(chain, "item_1")
ExDataSketch.FilterChain.member?(chain, "item_1")  # true

Benchmarks

Benchmark suites added for all new structures:

  • bench/cuckoo_bench.exs
  • bench/quotient_bench.exs
  • bench/cqf_bench.exs
  • bench/xor_filter_bench.exs
  • bench/iblt_bench.exs
  • bench/filter_chain_bench.exs

Run all benchmarks with mix bench.

Algorithm Matrix

Algorithm Purpose EXSK ID Backend
HLL Cardinality estimation 1 Pure + Rust
CMS Frequency estimation 2 Pure + Rust
Theta Set operations 3 Pure + Rust
KLL Rank/quantile estimation 4 Pure + Rust
DDSketch Value-relative quantiles 5 Pure + Rust
FrequentItems Heavy-hitter detection 6 Pure + Rust
Bloom Membership testing 7 Pure
Cuckoo Membership with deletion 8 Pure
Quotient Membership with deletion/merge 9 Pure
CQF Multiset membership/counting 10 Pure
XorFilter Static membership testing 11 Pure
IBLT Set reconciliation 12 Pure
FilterChain Filter composition -- Pure

Installation

def deps do
  [
    {:ex_data_sketch, "~> 0.5.0"}
  ]
end

Precompiled Rust NIF binaries are downloaded automatically on supported
platforms (macOS ARM64/x86_64, Linux x86_64/aarch64 glibc/musl). No Rust
toolchain required. The library works in pure Elixir mode on all other
platforms.

Upgrade Notes

  • No breaking changes from v0.4.0.
  • EXSK binaries produced by earlier v0.x releases remain fully compatible.
  • The ExDataSketch.Backend behaviour now includes additional callbacks for
    Cuckoo, Quotient, CQF, XorFilter, IBLT, and FilterChain. Custom backend
    implementations must add these callbacks.
  • New error types: UnsupportedOperationError and InvalidChainCompositionError.
  • All membership filter modules now export capabilities/0 returning a MapSet
    of supported operations.

What's Next

  • v0.6.0 - Rust NIF acceleration for v0.5.0 structures, and advanced Quantiles
  • v0.7.0 - advanced Cardinality Sketches including CPC, HLL++, ULL (UltraLogLog), SetSketch
  • v0.8.0 scope is under discussion. Potential directions include Rust NIF
    acceleration for v0.5.0 structures, additional composition patterns, or
    new sketch types.

Links

v0.4.0

08 Mar 13:48
8192717

Choose a tag to compare

Full Changelog: v0.3.0...v0.4.0

What's new in v0.4.0

Bloom filter:

  • Automatic parameter derivation from capacity and false_positive_rate
  • Double hashing (Kirsch-Mitzenmacher) for k bit positions from one hash
  • BLM1 binary format (40-byte header + packed bitset)
  • Merge via bitwise OR
  • Capacity overflow validation for the u32 binary format constraints
  • 40 unit tests, 5 property tests, 2 statistical FPR validation tests
Algorithm What it does
HyperLogLog (HLL) Count distinct elements (~0.8% error, 16 KB)
Count-Min Sketch (CMS) Estimate item frequencies
Theta Sketch Set union/intersection cardinality
KLL Rank-accurate quantiles (median, P99)
DDSketch Value-relative quantiles (P99 latency +/- 1%)
FrequentItems Top-k most frequent items (SpaceSaving)
Bloom Filter "Have I seen this before?" with tunable FPR

ExDataSketch v0.3.0 -- FrequentItems / Heavy-Hitters

06 Mar 13:20
2b9ac09

Choose a tag to compare

ExDataSketch v0.3.0 -- FrequentItems / Heavy-Hitters

ExDataSketch v0.3.0 adds FrequentItems, a streaming heavy-hitter sketch based on the SpaceSaving algorithm. This is the sixth sketch family in the library and the first non-numeric sketch type.

What is FrequentItems?

FrequentItems tracks the approximate top-k most frequent items in a data stream using bounded memory. It maintains at most k counters, each storing an item, its estimated count, and a maximum overcount error. The SpaceSaving algorithm guarantees that any item whose true frequency exceeds N/k will always be tracked.

sketch =
  ExDataSketch.FrequentItems.new(k: 10)
  |> ExDataSketch.FrequentItems.update_many(stream_of_page_views)

# Get the top 5 most frequent items
ExDataSketch.FrequentItems.top_k(sketch, 5)
# [%{item: "/home", estimate: 4821, error: 12, lower: 4809, upper: 4821}, ...]

# Check a specific item
ExDataSketch.FrequentItems.estimate(sketch, "/checkout")
# {:ok, %{estimate: 312, error: 5, lower: 307, upper: 312}}

Highlights

Full Pure Elixir + Rust NIF dual-backend support

Like all ExDataSketch algorithms, FrequentItems ships with a pure Elixir implementation and optional Rust NIF acceleration. Both backends produce byte-identical serialized output for the same inputs.

Canonical FI1 binary state format

FrequentItems uses a 32-byte header followed by variable-length entries sorted by item bytes. The format is deterministic and portable across backends.

Flexible key encoding

Three key encoding modes are supported:

Encoding Use case
:binary (default) String keys, raw binary data
:int Integer keys (signed 64-bit little-endian)
{:term, :external} Arbitrary Erlang terms via :erlang.term_to_binary/1

Deterministic merge

FrequentItems merge is commutative. It combines counts additively across the union of keys, then replays weighted updates in sorted key order into an empty sketch. Count (n) is always exactly additive regardless of whether entries are dropped during capacity enforcement.

Smart NIF routing

Query operations (top_k, estimate) route to Rust only when k >= 256, where NIF acceleration outweighs the call boundary overhead. Header reads (count, entry_count) always use Pure Elixir since they are O(1) binary pattern matches.

Other changes

  • Rust NIF for theta_compact with dirty scheduler support.
  • Mox added as a test dependency for backend contract testing.
  • EXSK codec sketch ID 6 for FrequentItems.

Upgrading

def deps do
  [
    {:ex_data_sketch, "~> 0.3.0"}
  ]
end

No breaking changes from v0.2.1. All existing sketches (HLL, CMS, Theta, KLL, DDSketch) are unchanged.

Algorithm coverage

Algorithm Purpose Status
HyperLogLog (HLL) Cardinality estimation Pure + Rust
Count-Min Sketch (CMS) Frequency estimation Pure + Rust
Theta Sketch Set operations on cardinalities Pure + Rust
KLL Quantiles Rank and quantile estimation Pure + Rust
DDSketch Relative-error quantile estimation Pure + Rust
FrequentItems (SpaceSaving) Heavy-hitter detection Pure + Rust

v0.2.1

05 Mar 19:14
113a7bd

Choose a tag to compare

What's Changed

Full Changelog: v0.2.0...v0.2.1

v0.2.0-alpha.1

05 Mar 13:44
cf32250

Choose a tag to compare

What's Changed

  • Deterministic vectors by @thanos in #76
  • Add KLL quantiles sketch with Pure Elixir and Rust NIF backends by @thanos in #86

Full Changelog: v0.1.0-alpha.12...v0.2.0-alpha.1

v0.2.0

05 Mar 13:45
cf32250

Choose a tag to compare

What's Changed

  • Deterministic vectors by @thanos in #76
  • Add KLL quantiles sketch with Pure Elixir and Rust NIF backends by @thanos in #86

Full Changelog: v0.1.0-alpha.12...v0.2.0

v0.1.0-alpha.9

03 Mar 16:10

Choose a tag to compare

v0.1.0-alpha.8

03 Mar 04:28

Choose a tag to compare

v0.1.0-alpha.7

03 Mar 03:01

Choose a tag to compare

v0.1.0-alpha.4

03 Mar 02:17

Choose a tag to compare

Add NIF version features to Cargo.toml

  Required since Rustler v0.30.0 for precompiled NIF builds. The
  rustler-precompiled-action selects the correct feature
  (nif_version_2_16, nif_version_2_17) via cargo --features flag.