Step 3: Pure Elixir Backend

### Step 3: Pure Elixir Backend

**Modify** `lib/ex_data_sketch/backend/pure.ex`

Binary state layout (ULL1, identical structure to HLL):
```
Offset  Size    Field
0       4       Header: version(u8=1) + p(u8) + reserved(u16 LE=0)
4       2^p     Registers (one u8 per register)
Total: 4 + 2^p bytes.
Magic: "ULL1"
```

Wait -- to maintain consistency with other sketches that use 4-byte magic prefixes (BLM1, CKO1, QOT1, etc.), use a 4-byte magic "ULL1" as the state header prefix:

```
Offset  Size    Field
0       4       Magic "ULL1"
4       1       Version (u8, 1)
5       1       Precision p (u8, 4..26)
6       2       Reserved (u16 LE, 0)
8       2^p     Registers (one u8 per register)
Total: 8 + 2^p bytes.
```

ULL register encoding from Ertl 2023:
- Given 64-bit hash `h` and precision `p`:
  - `bucket = h >>> (64 - p)` (same as HLL)
  - `w = h <<< p` (remaining bits, shifted left)
  - `geometric_rank = clz(w) + 1` (count leading zeros + 1, like HLL)
  - `sub_bit = bit at position (p + geometric_rank)` in original hash (the bit just after the leading zeros)
  - `register_value = 2 * geometric_rank - sub_bit` (encodes rank + sub-bucket in single byte)
  - Clamp to 0..255

Update rule: `new_reg = max(old_reg, register_value)` (same as HLL)

FGRA estimator:
- For each register value `r`, compute contribution `q_r`:
  - If `r == 0`: use `q_0 = (2^(1-p) * C0)` where `C0` is the small-range correction
  - Otherwise: `q_r = 2^(-floor((r+1)/2))` adjusted by the sub-bucket bit
- `raw_estimate = alpha_m * m^2 / sum(q_r)`
- Apply bias correction table (pre-computed for each p value)

The exact FGRA formulas from the paper:
```
tau(x) = (1 - x) * sum_{k>=1} (1 - x^(2^k)) / 2^k   [converges quickly]
sigma(x) = x * sum_{k>=1} x^(2^k) / (1 + x^(2^k))   [converges quickly]

estimate = alpha_inf * m^2 / (m * sigma(C0/m) + sum_{r=1..q} C_r * 2^(-r) + m * tau((m - C_q)/m))
```
where `C_r` is the count of registers with value `r`, `q = max register value`, and `alpha_inf = 1 / (2 * ln(2))`.

Note: The FGRA estimator is actually the "new HLL" estimator from Ertl's earlier 2017 paper, applied to ULL register values. The sigma and tau functions converge in ~50 iterations.

**Functions to implement**:
- `ull_new/1` -- allocate header + zeroed registers
- `ull_update/3` -- compute register value from hash, max-update
- `ull_update_many/3` -- tuple-based bulk update (same pattern as `hll_update_many`)
- `ull_merge/3` -- register-wise max (reuse `zip_max_binary` helper from HLL)
- `ull_estimate/3` -- FGRA estimator with sigma/tau helper functions



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Step 3: Pure Elixir Backend #181