-
Notifications
You must be signed in to change notification settings - Fork 11
Closed
Description
Summary
Add Go version t-digest.
Design
Double
Adding Double Precision t-digest. Because Go doesn’t support meta programming like C++ does, pairing (float32, uint32) and (float64, uint64) is difficult. In this time, i will add double precision only.
// Double provides an implementation of t-Digest double precision type for estimating quantiles and ranks.
// This implementation is based on the paper:
// Ted Dunning, Otmar Ertl. "Extremely Accurate Quantiles Using t-Digests"
// and the reference implementation: https://github.com/tdunning/t-digest
// NOTE: This implementation is similar to MergingDigest in the above Java implementation
type Double struct {
min float64
max float64
centroids []doublePrecisionCentroid
buffer []float64
centroidsWeight uint64
centroidsCapacity int
k uint16
reverseMerge bool
}
type doublePrecisionCentroid struct {
mean float64
weight uint64
}
func (c *doublePrecisionCentroid) add(other doublePrecisionCentroid) {
c.weight += other.weight
c.mean += (other.mean - c.mean) * float64(other.weight) / float64(c.weight)
}
// Update updates a value to the TDigest
func (d *Double) Update(value float64) error
// Merge merges another Double into this one
func (d *Double) Merge(other *Double) error
// IsEmpty returns true if the TDigest has not seen any data
func (d *Double) IsEmpty() bool
// MinValue returns the minimum value seen by the TDigest
func (d *Double) MinValue() (float64, error)
// MaxValue returns the maximum value seen by the TDigest
func (d *Double) MaxValue() (float64, error)
// TotalWeight returns the total weight of all values
func (d *Double) TotalWeight() uint64
// K returns the compression parameter k
func (d *Double) K() uint16
// Rank computes the approximate normalized rank of the given value
func (d *Double) Rank(value float64) (float64, error)
// Quantile computes the approximate quantile value corresponding to the given normalized rank
func (d *Double) Quantile(rank float64) (float64, error)
// PMF returns an approximation to the Probability Mass Function (PMF)
// of the input stream.
func (d *Double) PMF(splitPoints []float64) ([]float64, error)
// CDF returns an approximation to the Cumulative Distribution Function (CDF)
// which is the cumulative analog of the PMF of the input stream.
func (d *Double) CDF(splitPoints []float64) ([]float64, error)
// String returns a human-readable summary of the TDigest
func (d *Double) String(shouldPrintCentroids bool) string
// SerializedSizeBytes computes the serialized size in bytes of the TDigest.
func (d *Double) SerializedSizeBytes(withBuffer bool) intSerialization/Deserialization
- I will use Encoder/Decoder pattern like it was before.
Release Schedule
I sent a PR and discussion opens. When finish the PR, I will start upload implementation.
I will upload 2 PRs.
- A PR for double precision t-digest.
- A PR for and serialization / deserialization.
Metadata
Metadata
Assignees
Labels
No labels