Step 3: Pure Elixir Backend
Modify lib/ex_data_sketch/backend/pure.ex
Binary state layout (ULL1). An initial draft mirrored the HLL structure, with a 4-byte header and no magic:
Offset  Size  Field
0       4     Header: version (u8 = 1) + p (u8) + reserved (u16 LE = 0)
4       2^p   Registers (one u8 per register)
Total: 4 + 2^p bytes.
To stay consistent with the other sketches, which use 4-byte magic prefixes (BLM1, CKO1, QOT1, etc.), prefix the state with the 4-byte magic "ULL1" and use this layout instead:
Offset  Size  Field
0       4     Magic "ULL1"
4       1     Version (u8, 1)
5       1     Precision p (u8, 4..26)
6       2     Reserved (u16 LE, 0)
8       2^p   Registers (one u8 per register)
Total: 8 + 2^p bytes.
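A minimal sketch of how the 8-byte header plus registers could be built and validated with binary pattern matching. The module name `ULLStateSketch` and the functions `new/1` and `parse/1` are hypothetical and only illustrate the layout; they are not the backend's actual `ull_new/1`.

```elixir
defmodule ULLStateSketch do
  # Hypothetical module, for illustration of the layout only.
  import Bitwise

  # Builds an empty state: magic, version, precision, reserved, then 2^p zero bytes.
  def new(p) when is_integer(p) and p >= 4 and p <= 26 do
    m = 1 <<< p
    header = <<"ULL1", 1::unsigned-8, p::unsigned-8, 0::unsigned-little-16>>
    header <> :binary.copy(<<0>>, m)
  end

  # Splits a state into precision and register bytes after checking magic,
  # version, precision range, and register count. The reserved field is ignored.
  def parse(<<"ULL1", 1::unsigned-8, p::unsigned-8, _reserved::unsigned-little-16, registers::binary>>)
      when p >= 4 and p <= 26 do
    if byte_size(registers) == (1 <<< p) do
      {:ok, p, registers}
    else
      {:error, :invalid_state}
    end
  end

  def parse(_other), do: {:error, :invalid_state}
end
```

With p = 4 the state is 8 + 16 = 24 bytes, which matches the total given above.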
ULL register encoding from Ertl 2023:
- Given a 64-bit hash h and precision p:
  - bucket = h >>> (64 - p) (same as HLL)
  - w = h <<< p (remaining bits, shifted left)
  - geometric_rank = clz(w) + 1 (count leading zeros + 1, like HLL)
  - sub_bit = bit at position (p + geometric_rank) in original hash (the bit just after the leading zeros)
  - register_value = 2 * geometric_rank - sub_bit (encodes rank + sub-bucket in single byte)
- Clamp to 0..255
Update rule: new_reg = max(old_reg, register_value) (same as HLL)
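The encoding and update rule above can be sketched as follows. Two points are assumptions, not statements from the issue: bit positions are counted from 0 at the most significant bit of the 64-bit hash, and a position past bit 63 reads as 0. The module and function names (`ULLRegisterSketch`, `encode/2`, `update/3`) are hypothetical, and `update/3` here works on the bare register bytes without the 8-byte header.

```elixir
defmodule ULLRegisterSketch do
  # Hypothetical module, for illustration only.
  import Bitwise

  @mask64 0xFFFFFFFFFFFFFFFF

  # Returns {bucket, register_value} for a 64-bit hash h and precision p.
  def encode(h, p)
      when is_integer(h) and h >= 0 and h <= @mask64 and
             is_integer(p) and p >= 4 and p <= 26 do
    bucket = h >>> (64 - p)
    # Shift out the bucket bits and keep the result within 64 bits.
    w = (h <<< p) &&& @mask64
    rank = leading_zeros(<<w::unsigned-64>>, 0) + 1
    # Assumption: positions are 0-indexed from the most significant bit.
    sub_bit = bit_from_msb(h, p + rank)
    value = 2 * rank - sub_bit
    {bucket, value |> max(0) |> min(255)}
  end

  # Max-update of one register inside a register binary (no header).
  def update(registers, h, p) when is_binary(registers) do
    {bucket, value} = encode(h, p)
    <<head::binary-size(bucket), old, tail::binary>> = registers
    <<head::binary, max(old, value), tail::binary>>
  end

  # Counts zero bits from the front of a bitstring.
  defp leading_zeros(<<0::1, rest::bitstring>>, n), do: leading_zeros(rest, n + 1)
  defp leading_zeros(_bits, n), do: n

  # Assumption: positions beyond bit 63 read as 0.
  defp bit_from_msb(h, pos) when pos <= 63, do: (h >>> (63 - pos)) &&& 1
  defp bit_from_msb(_h, _pos), do: 0
end
```

For example, with p = 4 the hash 0x1200000000000000 has bucket 1, two leading zeros after the bucket bits (rank 3), and a sub-bit of 0, so the register value under these assumptions is 6.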
FGRA estimator:
- For each register value r, compute contribution q_r:
  - If r == 0: use q_0 = (2^(1-p) * C0) where C0 is the small-range correction
  - Otherwise: q_r = 2^(-floor((r+1)/2)) adjusted by the sub-bucket bit
- raw_estimate = alpha_m * m^2 / sum(q_r)
- Apply bias correction table (pre-computed for each p value)
The exact FGRA formulas from the paper:
tau(x) = (1 - x) * sum_{k>=1} (1 - x^(2^k)) / 2^k [converges quickly]
sigma(x) = x * sum_{k>=1} x^(2^k) / (1 + x^(2^k)) [converges quickly]
estimate = alpha_inf * m^2 / (m * sigma(C0/m) + sum_{r=1..q} C_r * 2^(-r) + m * tau((m - C_q)/m))
where C_r is the count of registers with value r, q = max register value, and alpha_inf = 1 / (2 * ln(2)).
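The C_r counts are a histogram over the register bytes. A minimal sketch of collecting them in one pass, assuming the input is the register section only (header already stripped); `ULLHistogramSketch.counts/1` is a hypothetical name:

```elixir
defmodule ULLHistogramSketch do
  # Hypothetical helper: returns %{r => C_r}, the number of registers holding value r.
  def counts(registers) when is_binary(registers) do
    for <<r <- registers>>, reduce: %{} do
      acc -> Map.update(acc, r, 1, &(&1 + 1))
    end
  end
end
```

Values that never occur are absent from the map, so a lookup such as `Map.get(counts, 0, 0)` would be needed for C0.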
Note: The FGRA estimator is actually the "new HLL" estimator from Ertl's earlier 2017 paper, applied to ULL register values. The sigma and tau functions converge in ~50 iterations.
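As a sketch of the helper functions, the two series can be evaluated by repeated squaring with a fixed term count. This transcribes tau and sigma exactly as they are written in this issue and truncates them at 50 terms; it has not been checked against the paper, and the module name `ULLSeriesSketch` is hypothetical.

```elixir
defmodule ULLSeriesSketch do
  # Hypothetical helpers: truncated evaluation of the series as written in this issue.
  @terms 50

  # tau(x) = (1 - x) * sum_{k=1..@terms} (1 - x^(2^k)) / 2^k
  def tau(x) when is_float(x) and x >= 0.0 and x <= 1.0 do
    {sum, _pow, _scale} =
      Enum.reduce(1..@terms, {0.0, x * x, 0.5}, fn _k, {acc, pow, scale} ->
        # pow holds x^(2^k); scale holds 1 / 2^k
        {acc + (1.0 - pow) * scale, pow * pow, scale * 0.5}
      end)

    (1.0 - x) * sum
  end

  # sigma(x) = x * sum_{k=1..@terms} x^(2^k) / (1 + x^(2^k))
  def sigma(x) when is_float(x) and x >= 0.0 and x <= 1.0 do
    {sum, _pow} =
      Enum.reduce(1..@terms, {0.0, x * x}, fn _k, {acc, pow} ->
        # pow holds x^(2^k)
        {acc + pow / (1.0 + pow), pow * pow}
      end)

    x * sum
  end
end
```

Note that the sigma series as written does not converge at x = 1 (every term is 1/2), so the truncated value there is an artifact of the term limit; the all-registers-zero case would need separate handling.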
Functions to implement:
- ull_new/1 -- allocate header + zeroed registers
- ull_update/3 -- compute register value from hash, max-update
- ull_update_many/3 -- tuple-based bulk update (same pattern as hll_update_many)
- ull_merge/3 -- register-wise max (reuse zip_max_binary helper from HLL)
- ull_estimate/3 -- FGRA estimator with sigma/tau helper functions
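For reference, a register-wise max over two equally sized register binaries could look like the sketch below. The signature of the existing zip_max_binary helper is not shown in this issue, so this uses its own hypothetical function, `ULLMergeSketch.merge_registers/2`, and operates on register bytes with the header already stripped.

```elixir
defmodule ULLMergeSketch do
  # Hypothetical helper: byte-wise maximum of two register binaries of equal size.
  def merge_registers(a, b)
      when is_binary(a) and is_binary(b) and byte_size(a) == byte_size(b) do
    do_merge(a, b, [])
  end

  defp do_merge(<<x, rest_a::binary>>, <<y, rest_b::binary>>, acc) do
    do_merge(rest_a, rest_b, [max(x, y) | acc])
  end

  # Bytes were accumulated in reverse; restore order and rebuild the binary.
  defp do_merge(<<>>, <<>>, acc) do
    acc |> Enum.reverse() |> IO.iodata_to_binary()
  end
end
```

A real ull_merge/3 would also need to check that both states carry the same precision p before merging, since the register counts differ otherwise.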