Why do we recommend to disable "preferSortMergeJoin"? #6133
@zhouyuan @z123 @PHILO-HE Do you have any information to share? For example, do you use SortMergeJoin or ShuffledHashJoin in production?
Hi @xumingming, I don't have much information about production environments, but in terms of functionality and performance in Gluten/Velox, hash join is better. We have also been improving the merge join code path in Velox recently, but it still requires more testing and validation from Gluten users. Thanks, -yuan
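For anyone who wants to compare the two settings on their own workload, here is a minimal sketch that builds a pair of spark-submit command lines, one per value of spark.sql.join.preferSortMergeJoin. The application name my_job.py and the helper spark_submit_command are hypothetical and only for illustration. As far as I understand, setting the option to false only allows Spark's planner to consider shuffled hash join; it does not force it.

```python
import shlex

# Spark SQL setting discussed in this thread (Spark's own default is "true").
PREFER_SMJ_KEY = "spark.sql.join.preferSortMergeJoin"


def spark_submit_command(app: str, prefer_sort_merge_join: bool) -> str:
    """Build a spark-submit command line with the join preference set."""
    value = "true" if prefer_sort_merge_join else "false"
    args = ["spark-submit", "--conf", f"{PREFER_SMJ_KEY}={value}", app]
    return shlex.join(args)


if __name__ == "__main__":
    # "my_job.py" is a hypothetical application, used only for illustration.
    for prefer_smj in (True, False):
        print(spark_submit_command("my_job.py", prefer_smj))
```

Running the same job once with each command and comparing the physical plans and wall-clock times would give a rough, workload-specific answer to the benchmark question.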
https://github.com/apache/incubator-gluten/blob/800cadd0f4f71d0ebedb5fbf6428442ae52b77ac/docs/Configuration.md?plain=1#L21
Just curious, why do we recommend disabling preferSortMergeJoin? Do we have any benchmark results? It would be great if you could share them 👍
The reason I ask is that Spark claims SortMergeJoin works better for large tables: