[GLUTEN-10660][VL] Adding configuration for hash table build #10634
zhouyuan merged 6 commits into apache:main
We need to set spark.gluten.velox.abandonbuild.noduphashminpct=100 to verify the behavior when deduplication is enabled.
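For reference, one way to apply that setting in a test session is sketched below. Only the configuration key and the value 100 come from this thread. The application name is hypothetical, and the sketch assumes the Gluten plugin and Velox backend are already configured elsewhere (for example in spark-defaults.conf).

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: build a session with the threshold from this PR set to 100.
// Assumption: Gluten itself is enabled through other, pre-existing settings.
val spark = SparkSession.builder()
  .appName("abandon-build-dedup-check") // hypothetical name, for illustration only
  .config("spark.gluten.velox.abandonbuild.noduphashminpct", "100")
  .getOrCreate()
```

The same key can presumably also be passed with `--conf` on `spark-submit`, as with any other Spark configuration entry.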
After testing, should we enable this feature by default?
@liujiayi771 I vote for enabling this by default. The staging patch in our Velox fork has not been updated yet. We will update it after facebookincubator/velox#7066 is merged.
At least for the TPCDS test, I didn't see any performance regression with
TPCDS has very few queries that contain semi/anti joins, and all of these involve duplicate join keys. |
backends-velox/src/main/scala/org/apache/gluten/config/VeloxConfig.scala
val VELOX_HASHMAP_ABANDON_BUILD_DUPHASH_MIN_PCT =
  buildConf("spark.gluten.velox.abandonbuild.noduphashminpct")
    .internal()
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
@liujiayi771 are you suggesting it's better to disable this feature by default?
val VELOX_HASHMAP_ABANDON_BUILD_DUPHASH_MIN_PCT =
  buildConf("spark.gluten.velox.abandonbuild.noduphashminpct")
    .experimental()
    .doc("Experimental: abandon hashmap build if duplicated rows are more than this pct.")
Perhaps use "percentile" to avoid abbreviations in the doc.
Yes, for the Q95 build-side join keys, the duplication rate is very high, so the optimization is very significant. However, for production jobs without a high duplication rate, there could be a performance regression. It's necessary to adjust the abandon percentage based on the specific job conditions.
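To make "duplication rate" concrete when tuning for a given job, a rough sketch of how one might estimate it from a sample of build-side join keys follows. The object and method names are hypothetical and not part of Gluten or Velox, and this does not reproduce how Velox itself evaluates the threshold; it only illustrates the quantity being discussed.

```scala
object DuplicationRate {
  // Percentage of rows whose join key duplicates a key already seen in the sample.
  // Hypothetical helper for illustration; not a Gluten or Velox API.
  def duplicatePct[K](keys: Seq[K]): Double =
    if (keys.isEmpty) 0.0
    else 100.0 * (keys.size - keys.distinct.size) / keys.size
}

// Example: four rows, two distinct keys, so half of the rows are duplicates.
val samplePct = DuplicationRate.duplicatePct(Seq(1, 1, 1, 2))
```

A sample with a high value here resembles the Q95 case described above, while a value near zero resembles the production jobs where a regression is possible.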
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
What changes are proposed in this pull request?
Adding configurations for hash map build optimization in facebookincubator/velox#7066
How was this patch tested?