[GLUTEN-7548][VL] Optimize BHJ in velox backend#8931
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on GitHub Issues? https://github.com/apache/incubator-gluten/issues Could you also rename the commit message and pull request title to follow the required format?
In the long term, we need to implement this the Spark way: broadcast the hash table instead of the raw table data.
@FelixYBW Yes, we will support the broadcast-hash-table approach after adding serialization/deserialization support to Velox's HashTable.
@zhztheplayer Is there a memory management issue in this solution? Is the memory allocated in storage memory? @JkSelf, will this solution be helpful toward the final solution?
@FelixYBW Yes, the primary difference between Design 1 and Design 2 is the need for serialization and deserialization of Velox's hash table. Most of the remaining code can be shared between the two designs. We will evaluate the TPCH performance after addressing the result mismatch issue. If the performance does not meet expectations, we will proceed with implementing Design 2.
Is there any test code covering all of our changes to the broadcast code? The tests are mostly based on Spark local mode, where no broadcasting will actually happen. cc @zjuwangg
@zhztheplayer Thank you for your review. It appears that the existing tests cover the broadcast changes introduced in this PR. I have added logging to the native broadcast hash table build code, and these logs will be printed during the CI process.
@jinchengchenghh @liujiayi771 @zhztheplayer I'm merging this PR now to unblock our internal tests. I'll follow up with another PR to resolve your remaining comments. Thanks for the comments!
zhztheplayer
left a comment
Some non-blocking comments.
backends-velox/src/main/scala/org/apache/spark/rpc/GlutenRpcMessages.scala
    isNullAwareAntiJoin,
    bloomFilterPushdownSize,
    threadBatches,
    defaultLeafVeloxMemoryPool());
defaultLeafVeloxMemoryPool should already be counted toward the off-heap memory pool, but I suggest verifying that if you'd like to.
What changes were proposed in this pull request?
This PR ports the BHJ optimization from the CK backend to the Velox backend, ensuring that the hash table is built only once per executor. The detailed design document can be found here.
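The build-once-per-executor idea can be sketched as follows. This is an illustrative sketch only, not the actual Gluten implementation: the class name `BroadcastBuildSideCache`, the method `getOrBuild`, and the `String` placeholder standing in for the native hash table handle are all hypothetical. The key point it demonstrates is keying an executor-local cache by broadcast ID so that concurrent tasks on the same executor trigger at most one build.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch (hypothetical names, not the actual Gluten API):
// cache the built hash table per broadcast ID so that every task running
// on one executor reuses a single build instead of rebuilding it.
class BroadcastBuildSideCache {
    // broadcastId -> built hash table handle (a String stands in for the
    // real native handle here)
    private static final ConcurrentHashMap<Long, String> CACHE = new ConcurrentHashMap<>();
    static final AtomicInteger BUILD_COUNT = new AtomicInteger();

    private static String buildHashTable(long broadcastId) {
        BUILD_COUNT.incrementAndGet();          // the expensive build happens here
        return "hash-table-for-" + broadcastId; // placeholder for the native handle
    }

    // computeIfAbsent guarantees at most one build per key on this executor,
    // even when multiple tasks request the same broadcast concurrently.
    static String getOrBuild(long broadcastId) {
        return CACHE.computeIfAbsent(broadcastId, BroadcastBuildSideCache::buildHashTable);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 8; i++) {           // simulate eight tasks on one executor
            getOrBuild(42L);
        }
        System.out.println(BUILD_COUNT.get());  // prints 1: built once, reused 7 times
    }
}
```

In the real backend the cached value would be the native hash table (or its memory handle) rather than a string, and eviction would have to be tied to the broadcast variable's lifecycle so the off-heap memory is released when the broadcast is destroyed.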
This PR improves performance by 1.29x on 3TB TPC-DS Q23a when compared with broadcast thresholds of 100MB and 10MB. It also addresses the out-of-memory (OOM) issue encountered in Q24a and Q24b with a 100MB threshold.
However, there is still room for further optimization: implementing concurrent hash table building would mitigate the performance degradation observed in Q67. We will continue to pursue such optimizations in subsequent PRs.
Note: This PR eliminates the need for #5563.
How was this patch tested?
Existing tests