[GLUTEN-4889][VL] feat: Support approx_percentile aggregate function#11651
[GLUTEN-4889][VL] feat: Support approx_percentile aggregate function#11651Yizhou-Yang wants to merge 17 commits intoapache:mainfrom
Conversation
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Please update get-velox.sh to test your PR, then you can verify if both can work well, you may update this line |
|
Do we need the config? Usually we offload the function to native by default |
added the 16320 and removed the config |
|
Run Gluten Clickhouse CI on x86 |
1 similar comment
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
61dc2ca to
c76cb84
Compare
|
Run Gluten Clickhouse CI on x86 |
1 similar comment
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
5 similar comments
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Please update the PR description to describe the KLL Sketch is different so that we handle fallback separately. |
done~ |
1b402de to
b0679ad
Compare
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
…rt VeloxApproximatePercentile to DeclarativeAggregate with KLL sketch
…ckend ApproximatePercentile registration (Velox PR only)
62067d6 to
d14b740
Compare
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
…ministic across Spark and Velox implementations
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
There are test failures in x86 environment non-slow modes. edit: |
|
@Yizhou-Yang Thanks for your implementation! Recently I am cherrying pick your PR and I find that the KLL implementation in Gluten has only one level and discards items in odd position when compacting. I am wondering if this implementation can meet the |
yes, this is part of the reason for the ut failures. The gluten version of kll is written rather hastily and was primarily concerned with simplicity and fallback compatibiltity with velox, and now it seems to have caused problems including but not limited to ut failures and lots of off-by-one issues. I am editing this implementation and doing a line-by-line comparison with the velox one to ensure the gluten side quality, and enhanced unit tests needs to be performed. Although after all this, there will still be some existing uts that might need to be excluded. |
What
Add Velox
approx_percentilesupport for Spark.Why
Velox uses KLL sketch while Spark uses GK algorithm — their intermediate data formats are incompatible (KLL: 9-field StructType vs GK: single BinaryType buffer). This means fallback between Velox and Spark requires separate handling.
How
VeloxApproximatePercentile: ADeclarativeAggregatewith 9aggBufferAttributesmatching Velox's KLL sketch layout.KllSketchHelper/KllSketchAdd/KllSketchMerge/KllSketchEval): Simplified KLL operations for fallback, binary-compatible with Velox's C++ accumulator.ApproxPercentileRewriteRule: Rewrites Spark'sApproximatePercentileto the Velox-compatible version.Key decisions
IntegerType(Spark's original value); Velox computesepsilon = 1.0/accuracyinternally.Velox dependency
facebookincubator/velox#16320
Related issue: #4889