Skip to content

[FEATURE] t-digest for go #96

@proost

Description

@proost

Summary

Add Go version t-digest.

Design

Double

Adding Double Precision t-digest. Because Go doesn’t support meta programming like C++ does, pairing (float32, uint32) and (float64, uint64) is difficult. In this time, i will add double precision only.

// Double provides an implementation of t-Digest double precision type for estimating quantiles and ranks.
// This implementation is based on the paper:
// Ted Dunning, Otmar Ertl. "Extremely Accurate Quantiles Using t-Digests"
// and the reference implementation: https://github.com/tdunning/t-digest
// NOTE: This implementation is similar to MergingDigest in the above Java implementation
type Double struct {
	min               float64
	max               float64
	centroids         []doublePrecisionCentroid
	buffer            []float64
	centroidsWeight   uint64
	centroidsCapacity int
	k                 uint16
	reverseMerge      bool
}

type doublePrecisionCentroid struct {
	mean   float64
	weight uint64
}

func (c *doublePrecisionCentroid) add(other doublePrecisionCentroid) {
	c.weight += other.weight
	c.mean += (other.mean - c.mean) * float64(other.weight) / float64(c.weight)
}

// Update updates a value to the TDigest
func (d *Double) Update(value float64) error

// Merge merges another Double into this one
func (d *Double) Merge(other *Double) error

// IsEmpty returns true if the TDigest has not seen any data
func (d *Double) IsEmpty() bool

// MinValue returns the minimum value seen by the TDigest
func (d *Double) MinValue() (float64, error)

// MaxValue returns the maximum value seen by the TDigest
func (d *Double) MaxValue() (float64, error)

// TotalWeight returns the total weight of all values
func (d *Double) TotalWeight() uint64

// K returns the compression parameter k
func (d *Double) K() uint16

// Rank computes the approximate normalized rank of the given value
func (d *Double) Rank(value float64) (float64, error)

// Quantile computes the approximate quantile value corresponding to the given normalized rank
func (d *Double) Quantile(rank float64) (float64, error)

// PMF returns an approximation to the Probability Mass Function (PMF)
// of the input stream.
func (d *Double) PMF(splitPoints []float64) ([]float64, error)

// CDF returns an approximation to the Cumulative Distribution Function (CDF)
// which is the cumulative analog of the PMF of the input stream.
func (d *Double) CDF(splitPoints []float64) ([]float64, error)

// String returns a human-readable summary of the TDigest
func (d *Double) String(shouldPrintCentroids bool) string

// SerializedSizeBytes computes the serialized size in bytes of the TDigest.
func (d *Double) SerializedSizeBytes(withBuffer bool) int

Serialization/Deserialization

  • I will use Encoder/Decoder pattern like it was before.

Release Schedule

I sent a PR and discussion opens. When finish the PR, I will start upload implementation.

I will upload 2 PRs.

  1. A PR for double precision t-digest.
  2. A PR for and serialization / deserialization.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions