Milestones

  • ## Context

v0.6.0 is complete with 16 sketch types across eight categories. The README roadmap targets v0.7.0 for ULL (UltraLogLog). The user asked to evaluate CPC, HLL++, ULL, and SetSketch for v0.7.0.

**Recommendation**: Implement **ULL** (UltraLogLog) only. Defer HLL++, SetSketch, and CPC.

**Rationale**:

- **ULL** provides the highest value: better accuracy per byte than HLL, from richer register contents and the FGRA estimator (Ertl 2023). It directly improves the library's core cardinality estimation offering, and it keeps HLL's `2^p` register array layout, which makes the implementation tractable.
- **HLL++ deferred**: The Google HLL++ sparse mode optimization is valuable but adds significant complexity (dual representation, mode transitions, sorted sparse list encoding). Better as a v0.8.0 feature after ULL establishes the improved estimation baseline.
- **SetSketch deferred**: Jaccard similarity is a different use case (set similarity, not cardinality). Better suited as a standalone v0.9.0 addition.
- **CPC deferred indefinitely**: Apache DataSketches CPC has extreme implementation complexity (compressed surprising values, flavor system, sliding windows). The accuracy improvement over ULL is marginal and not worth the engineering cost.

---

## UltraLogLog (ULL) Overview

**Paper**: Ertl, "UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog" (2023)

**Key differences from HLL**:

- Uses the same `2^p` register array, but stores a different value per register
- Each register encodes the maximum update value (the geometric rank, as in HLL) plus two extra bits recording whether the two next-smaller update values were also observed, so a register carries more information than an HLL register
- The FGRA (further generalized remaining area) estimator replaces HLL's harmonic-mean estimator; the paper reports about 24% less space than HLL with 6-bit registers for the same estimation error
- Registers are 8-bit, so the memory footprint is `4 + 2^p` bytes, identical to an HLL implementation that stores one byte per register

**Operations**: `new`, `update`, `update_many`, `merge`, `estimate`, `serialize`, `deserialize`

**Merge**: Register-wise, but not a plain `max`. Because each register also carries the two history bits, merging has to combine the sets of observed update values: unpack both registers, bitwise OR, repack.
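
The register update and merge can be sketched as follows. This is a minimal illustration under stated assumptions, not the encoding from Ertl's paper or from any existing library: it assumes a register byte laid out as `4*u + flags`, where `u` is the largest update value seen and the two flag bits record whether `u-1` and `u-2` were seen. The helper names (`hash64`, `unpack`, `pack`, `update`, `merge`) and the choice of BLAKE2b as the 64-bit hash are hypothetical. The estimator is deliberately left out.

```python
import hashlib


def hash64(item: bytes) -> int:
    """Hypothetical 64-bit hash for the demo (BLAKE2b truncated to 8 bytes)."""
    return int.from_bytes(hashlib.blake2b(item, digest_size=8).digest(), "big")


def unpack(reg: int) -> int:
    """Expand a register byte into a bitmask; bit k set means update value k was seen."""
    if reg == 0:
        return 0  # empty register
    u, flags = reg >> 2, reg & 3
    # Bits u, u-1, u-2 of the mask come from the implicit top bit plus the two flags.
    return ((4 | flags) << u) >> 2


def pack(mask: int) -> int:
    """Keep only the largest seen value u and the flags for u-1 and u-2."""
    if mask == 0:
        return 0
    u = mask.bit_length() - 1
    flags = ((mask << 2) >> u) & 3
    return (u << 2) | flags


def update(registers: list, p: int, h: int) -> None:
    """Insert a 64-bit hash h into a sketch with 2**p registers (in place)."""
    bits = 64 - p
    idx = h >> bits                       # top p bits select the register
    rest = h & ((1 << bits) - 1)          # remaining bits give the geometric rank
    k = bits - rest.bit_length() + 1      # update value: leading zeros + 1
    registers[idx] = pack(unpack(registers[idx]) | (1 << k))


def merge(a: list, b: list) -> list:
    """Register-wise merge: OR the unpacked masks, then repack."""
    return [pack(unpack(x) | unpack(y)) for x, y in zip(a, b)]


if __name__ == "__main__":
    p = 4
    left, right = [0] * (1 << p), [0] * (1 << p)
    for i in range(100):
        update(left, p, hash64(f"left-{i}".encode()))
        update(right, p, hash64(f"right-{i}".encode()))
    merged = merge(left, right)
    print(len(merged), "registers,", len(bytes(merged)), "bytes")
```

With this layout a byte-wise `max` would lose information: a register that saw only value 5 is `20`, one that saw only value 4 is `16`, and the merged register should be `22` (value 5 with the `u-1` flag set), not `max(20, 16) = 20`. Going through `unpack` and `pack` keeps the merge commutative and idempotent.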

    Due by March 14, 2026
    0/9 issues closed
  • Due by March 13, 2026