Releases · thanos/ex_data_sketch

11 Mar 10:50

github-actions

v0.5.0

c15d224

v0.5.0 Latest

Latest

Release date: 2026-03-11

Summary

v0.5.0 adds six new structures for advanced membership testing and set
reconciliation, plus FilterChain for composing filters into lifecycle-tier
pipelines. This is the largest feature release yet, bringing ExDataSketch to
13 sketch types across seven categories.

ExDataSketch now covers:

Category	Algorithms
Cardinality	HyperLogLog (HLL)
Frequency	Count-Min Sketch (CMS)
Set operations	Theta Sketch
Quantiles	KLL, DDSketch
Frequency ranking	FrequentItems (SpaceSaving)
Membership	Bloom, Cuckoo, Quotient, CQF, XorFilter
Set reconciliation	IBLT
Composition	FilterChain

What's new in v0.5.0

Cuckoo Filter (`ExDataSketch.Cuckoo`)

Membership testing with deletion support using partial-key cuckoo hashing.
Unlike Bloom filters, Cuckoo filters support safe deletion of previously
inserted items. Better space efficiency than Bloom at low false positive rates.

Key features:

Partial-key cuckoo hashing with configurable fingerprint size (8, 12, 16 bits)
Configurable bucket size (2 or 4 slots) and max kicks (100..2000)
Safe deletion without false negatives on subsequent queries
CKO1 binary state format
EXSK serialization (sketch ID 8)

Quick start:

cuckoo = ExDataSketch.Cuckoo.new(capacity: 100_000, fingerprint_size: 8)

{:ok, cuckoo} = ExDataSketch.Cuckoo.put(cuckoo, "user_42")
ExDataSketch.Cuckoo.member?(cuckoo, "user_42")  # true

{:ok, cuckoo} = ExDataSketch.Cuckoo.delete(cuckoo, "user_42")
ExDataSketch.Cuckoo.member?(cuckoo, "user_42")  # false

Quotient Filter (`ExDataSketch.Quotient`)

Membership testing with safe deletion and merge. Splits fingerprints into
quotient (slot index) and remainder (stored value) with metadata bits for
cluster tracking. Supports merge without re-hashing.

Key features:

Quotient/remainder fingerprint splitting with linear probing
Metadata bits (is_occupied, is_continuation, is_shifted) for cluster tracking
Safe deletion and merge support
QOT1 binary state format
EXSK serialization (sketch ID 9)

Quick start:

qf = ExDataSketch.Quotient.new(q: 16, r: 8)
qf = ExDataSketch.Quotient.put(qf, "item_a")
ExDataSketch.Quotient.member?(qf, "item_a")  # true

# Merge two quotient filters
merged = ExDataSketch.Quotient.merge(qf_a, qf_b)

Counting Quotient Filter (`ExDataSketch.CQF`)

Extends the quotient filter with variable-length counter encoding to answer
not just "is this present?" but "how many times was this inserted?" Counts are
approximate (never underestimated).

Key features:

Variable-length counter encoding within runs
estimate_count/2 for approximate multiplicity queries
Safe deletion (decrements count)
Merge sums counts across filters
CQF1 binary state format
EXSK serialization (sketch ID 10)

Quick start:

cqf = ExDataSketch.CQF.new(q: 16, r: 8)
cqf = ExDataSketch.CQF.put(cqf, "event_x")
cqf = ExDataSketch.CQF.put(cqf, "event_x")
cqf = ExDataSketch.CQF.put(cqf, "event_x")

ExDataSketch.CQF.estimate_count(cqf, "event_x")  # >= 3
ExDataSketch.CQF.member?(cqf, "event_x")          # true

XorFilter (`ExDataSketch.XorFilter`)

Static, immutable membership filter with the smallest footprint and fastest
lookups. All items must be provided at construction time via build/2.
No insertion, deletion, or merge -- query only.

Key features:

Build-once immutable construction via build/2
8-bit or 16-bit fingerprints (configurable false positive rate)
Smallest memory footprint of all membership filters
XOR1 binary state format
EXSK serialization (sketch ID 11)

Quick start:

items = MapSet.new(1..100_000)
{:ok, xor} = ExDataSketch.XorFilter.build(items, fingerprint_bits: 8)

ExDataSketch.XorFilter.member?(xor, 42)      # true
ExDataSketch.XorFilter.member?(xor, 999_999) # false (probably)

IBLT (`ExDataSketch.IBLT`)

Invertible Bloom Lookup Table for set reconciliation. Two parties each build
an IBLT from their sets, exchange them, and subtract to discover only the
differing items -- without transmitting the full sets.

Key features:

Set mode (items only) and key-value mode
subtract/2 for cell-wise difference
list_entries/1 for peeling decoded entries (positive/negative sets)
Merge via cell-wise addition
IBL1 binary state format
EXSK serialization (sketch ID 12)

Quick start:

# Set reconciliation between two nodes
iblt_a = ExDataSketch.IBLT.new(cell_count: 1000) |> ExDataSketch.IBLT.put_many(set_a)
iblt_b = ExDataSketch.IBLT.new(cell_count: 1000) |> ExDataSketch.IBLT.put_many(set_b)

diff = ExDataSketch.IBLT.subtract(iblt_a, iblt_b)
{:ok, %{positive: only_in_a, negative: only_in_b}} = ExDataSketch.IBLT.list_entries(diff)

FilterChain (`ExDataSketch.FilterChain`)

Capability-aware composition framework for chaining membership filters into
lifecycle-tier pipelines. Enforces valid chain positions based on each
filter's capabilities.

Key features:

add_stage/2 with position validation (front/middle/terminal/adjunct)
put/2 fans out to all stages supporting insertion (skips static stages)
member?/2 queries stages in order, short-circuits on false
Lifecycle-tier patterns: hot Cuckoo (writes) -> cold XorFilter (snapshots)
IBLT stages placed as adjuncts (not in query path)
FCN1 binary state format

Quick start:

chain =
  ExDataSketch.FilterChain.new()
  |> ExDataSketch.FilterChain.add_stage(ExDataSketch.Cuckoo.new(capacity: 10_000))
  |> ExDataSketch.FilterChain.add_stage(ExDataSketch.Bloom.new(capacity: 100_000))

{:ok, chain} = ExDataSketch.FilterChain.put(chain, "item_1")
ExDataSketch.FilterChain.member?(chain, "item_1")  # true

Benchmarks

Benchmark suites added for all new structures:

bench/cuckoo_bench.exs
bench/quotient_bench.exs
bench/cqf_bench.exs
bench/xor_filter_bench.exs
bench/iblt_bench.exs
bench/filter_chain_bench.exs

Run all benchmarks with mix bench.

Algorithm Matrix

Algorithm	Purpose	EXSK ID	Backend
HLL	Cardinality estimation	1	Pure + Rust
CMS	Frequency estimation	2	Pure + Rust
Theta	Set operations	3	Pure + Rust
KLL	Rank/quantile estimation	4	Pure + Rust
DDSketch	Value-relative quantiles	5	Pure + Rust
FrequentItems	Heavy-hitter detection	6	Pure + Rust
Bloom	Membership testing	7	Pure
Cuckoo	Membership with deletion	8	Pure
Quotient	Membership with deletion/merge	9	Pure
CQF	Multiset membership/counting	10	Pure
XorFilter	Static membership testing	11	Pure
IBLT	Set reconciliation	12	Pure
FilterChain	Filter composition	--	Pure

Installation

def deps do
  [
    {:ex_data_sketch, "~> 0.5.0"}
  ]
end

Precompiled Rust NIF binaries are downloaded automatically on supported
platforms (macOS ARM64/x86_64, Linux x86_64/aarch64 glibc/musl). No Rust
toolchain required. The library works in pure Elixir mode on all other
platforms.

Upgrade Notes

No breaking changes from v0.4.0.
EXSK binaries produced by earlier v0.x releases remain fully compatible.
The ExDataSketch.Backend behaviour now includes additional callbacks for
Cuckoo, Quotient, CQF, XorFilter, IBLT, and FilterChain. Custom backend
implementations must add these callbacks.
New error types: UnsupportedOperationError and InvalidChainCompositionError.
All membership filter modules now export capabilities/0 returning a MapSet
of supported operations.

What's Next

v0.6.0 - Rust NIF acceleration for v0.5.0 structures, and advanced Quantiles
v0.7.0 - advanced Cardinality Sketches including CPC, HLL++, ULL (UltraLogLog), SetSketch
v0.8.0 scope is under discussion. Potential directions include Rust NIF
acceleration for v0.5.0 structures, additional composition patterns, or
new sketch types.

What's new in v0.4.0

Bloom filter:

Automatic parameter derivation from capacity and false_positive_rate
Double hashing (Kirsch-Mitzenmacher) for k bit positions from one hash
BLM1 binary format (40-byte header + packed bitset)
Merge via bitwise OR
Capacity overflow validation for the u32 binary format constraints
40 unit tests, 5 property tests, 2 statistical FPR validation tests

Algorithm	What it does
HyperLogLog (HLL)	Count distinct elements (~0.8% error, 16 KB)
Count-Min Sketch (CMS)	Estimate item frequencies
Theta Sketch	Set union/intersection cardinality
KLL	Rank-accurate quantiles (median, P99)
DDSketch	Value-relative quantiles (P99 latency +/- 1%)
FrequentItems	Top-k most frequent items (SpaceSaving)
Bloom Filter	"Have I seen this before?" with tunable FPR

Assets 14

06 Mar 13:20

github-actions

v0.3.0

2b9ac09

ExDataSketch v0.3.0 -- FrequentItems / Heavy-Hitters

ExDataSketch v0.3.0 adds FrequentItems, a streaming heavy-hitter sketch based on the SpaceSaving algorithm. This is the sixth sketch family in the library and the first non-numeric sketch type.

What is FrequentItems?

FrequentItems tracks the approximate top-k most frequent items in a data stream using bounded memory. It maintains at most k counters, each storing an item, its estimated count, and a maximum overcount error. The SpaceSaving algorithm guarantees that any item whose true frequency exceeds N/k will always be tracked.

sketch =
  ExDataSketch.FrequentItems.new(k: 10)
  |> ExDataSketch.FrequentItems.update_many(stream_of_page_views)

# Get the top 5 most frequent items
ExDataSketch.FrequentItems.top_k(sketch, 5)
# [%{item: "/home", estimate: 4821, error: 12, lower: 4809, upper: 4821}, ...]

# Check a specific item
ExDataSketch.FrequentItems.estimate(sketch, "/checkout")
# {:ok, %{estimate: 312, error: 5, lower: 307, upper: 312}}

Highlights

Full Pure Elixir + Rust NIF dual-backend support

Like all ExDataSketch algorithms, FrequentItems ships with a pure Elixir implementation and optional Rust NIF acceleration. Both backends produce byte-identical serialized output for the same inputs.

Canonical FI1 binary state format

FrequentItems uses a 32-byte header followed by variable-length entries sorted by item bytes. The format is deterministic and portable across backends.

Flexible key encoding

Three key encoding modes are supported:

Encoding	Use case
`:binary` (default)	String keys, raw binary data
`:int`	Integer keys (signed 64-bit little-endian)
`{:term, :external}`	Arbitrary Erlang terms via `:erlang.term_to_binary/1`

Deterministic merge

FrequentItems merge is commutative. It combines counts additively across the union of keys, then replays weighted updates in sorted key order into an empty sketch. Count (n) is always exactly additive regardless of whether entries are dropped during capacity enforcement.

Smart NIF routing

Query operations (top_k, estimate) route to Rust only when k >= 256, where NIF acceleration outweighs the call boundary overhead. Header reads (count, entry_count) always use Pure Elixir since they are O(1) binary pattern matches.

Other changes

Rust NIF for theta_compact with dirty scheduler support.
Mox added as a test dependency for backend contract testing.
EXSK codec sketch ID 6 for FrequentItems.

Upgrading

def deps do
  [
    {:ex_data_sketch, "~> 0.3.0"}
  ]
end

No breaking changes from v0.2.1. All existing sketches (HLL, CMS, Theta, KLL, DDSketch) are unchanged.

Algorithm coverage

Algorithm	Purpose	Status
HyperLogLog (HLL)	Cardinality estimation	Pure + Rust
Count-Min Sketch (CMS)	Frequency estimation	Pure + Rust
Theta Sketch	Set operations on cardinalities	Pure + Rust
KLL Quantiles	Rank and quantile estimation	Pure + Rust
DDSketch	Relative-error quantile estimation	Pure + Rust
FrequentItems (SpaceSaving)	Heavy-hitter detection	Pure + Rust

Assets 14

05 Mar 19:14

github-actions

v0.2.1

113a7bd

v0.2.1

What's Changed

V0.2.1/dd sketch quantiles by @thanos in #101

Full Changelog: v0.2.0...v0.2.1

Contributors

thanos

Assets 14

05 Mar 13:44

github-actions

v0.2.0-alpha.1

cf32250

v0.2.0-alpha.1

What's Changed

Deterministic vectors by @thanos in #76
Add KLL quantiles sketch with Pure Elixir and Rust NIF backends by @thanos in #86

Full Changelog: v0.1.0-alpha.12...v0.2.0-alpha.1

Contributors

thanos

Assets 14

05 Mar 13:45

github-actions

v0.2.0

cf32250

v0.2.0

What's Changed

Deterministic vectors by @thanos in #76
Add KLL quantiles sketch with Pure Elixir and Rust NIF backends by @thanos in #86

Full Changelog: v0.1.0-alpha.12...v0.2.0

Contributors

thanos

Assets 14

03 Mar 16:10

github-actions

v0.1.0-alpha.9

befb4cd

v0.1.0-alpha.9

Full Changelog: v0.1.0-alpha.8...v0.1.0-alpha.9

Assets 14

03 Mar 04:28

github-actions

v0.1.0-alpha.8

9a78b29

v0.1.0-alpha.8

Full Changelog: v0.1.0-alpha.7...v0.1.0-alpha.8

Assets 14

03 Mar 03:01

github-actions

v0.1.0-alpha.7

b96e279

v0.1.0-alpha.7

Full Changelog: v0.1.0-alpha.3...v0.1.0-alpha.7

Assets 14

03 Mar 02:17

github-actions

v0.1.0-alpha.4

3664bcf

v0.1.0-alpha.4

Add NIF version features to Cargo.toml

  Required since Rustler v0.30.0 for precompiled NIF builds. The
  rustler-precompiled-action selects the correct feature
  (nif_version_2_16, nif_version_2_17) via cargo --features flag.

Assets 6

Releases: thanos/ex_data_sketch

v0.5.0

Summary

What's new in v0.5.0

Cuckoo Filter (ExDataSketch.Cuckoo)

Quotient Filter (ExDataSketch.Quotient)

Counting Quotient Filter (ExDataSketch.CQF)

XorFilter (ExDataSketch.XorFilter)

IBLT (ExDataSketch.IBLT)

FilterChain (ExDataSketch.FilterChain)

Benchmarks

Algorithm Matrix

Installation

Upgrade Notes

What's Next

Links

Uh oh!

v0.4.0

What's new in v0.4.0

Uh oh!

ExDataSketch v0.3.0 -- FrequentItems / Heavy-Hitters

ExDataSketch v0.3.0 -- FrequentItems / Heavy-Hitters

What is FrequentItems?

Highlights

Full Pure Elixir + Rust NIF dual-backend support

Canonical FI1 binary state format

Flexible key encoding

Deterministic merge

Smart NIF routing

Other changes

Upgrading

Algorithm coverage

Uh oh!

v0.2.1

What's Changed

Contributors

Uh oh!

v0.2.0-alpha.1

What's Changed

Contributors

Uh oh!

v0.2.0

What's Changed

Contributors

Uh oh!

v0.1.0-alpha.9

Uh oh!

v0.1.0-alpha.8

Uh oh!

v0.1.0-alpha.7

Uh oh!

v0.1.0-alpha.4

Uh oh!

Cuckoo Filter (`ExDataSketch.Cuckoo`)

Quotient Filter (`ExDataSketch.Quotient`)

Counting Quotient Filter (`ExDataSketch.CQF`)

XorFilter (`ExDataSketch.XorFilter`)

IBLT (`ExDataSketch.IBLT`)

FilterChain (`ExDataSketch.FilterChain`)