-
Notifications
You must be signed in to change notification settings - Fork 19
Support put KV: Add HashBucketAssigner #117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@Kelvinyu1117 @luoyuxia , would appreciate review here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds HashBucketAssigner as part of the larger Rust client put KV feature. The implementation includes a custom MurmurHash port from Apache Flink to avoid external dependencies, a new DataLakeFormat enum for format selection, and bucketing functions that support Fluss, Paimon, Lance, and Iceberg formats.
Key Changes:
- Added
HashBucketAssignerto route records to buckets based on hash-based bucketing strategies - Ported MurmurHash3 implementations (standard and Flink variants) for consistent hashing with Java implementations
- Introduced
DataLakeFormatenum andBucketingFunctiontrait with format-specific implementations
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
crates/fluss/src/util/murmur_hash.rs |
Implements MurmurHash3 algorithms (standard and Flink variants) with comprehensive test coverage |
crates/fluss/src/util/mod.rs |
Exposes the murmur_hash module |
crates/fluss/src/metadata/data_lake_format.rs |
Defines DataLakeFormat enum for supported data lake formats |
crates/fluss/src/metadata/mod.rs |
Adds and exports the data_lake_format module |
crates/fluss/src/lib.rs |
Adds the bucketing module to the library |
crates/fluss/src/client/write/bucket_assigner.rs |
Implements HashBucketAssigner for hash-based bucket assignment |
crates/fluss/src/bucketing/mod.rs |
Defines BucketingFunction trait and format-specific implementations with tests |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
luoyuxia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leekeiabstraction Thanks for the pr. Left some comments. PTAL
…ner related classes
leekeiabstraction
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed comments
luoyuxia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leekeiabstraction Thanks for update.. Only one comment.
crates/fluss/src/util/murmur_hash.rs
Outdated
|
|
||
| // Finalization mix - force all bits of a hash block to avalanche | ||
| #[inline(always)] | ||
| fn fmix(mut h1: i32, length: usize) -> i32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hope it won’t upset you.....
The tricky part is that standard murmur_hash use usize...
But the Fluss uses int32, although it's not standard, we have to follow it.
https://github.com/apache/fluss/blob/f7062eae19be72e5c8dd70373d43cf1f8d283d07/fluss-common/src/main/java/org/apache/fluss/utils/MurmurHashUtils.java#L147
So, iceberg use standard murmur_hash, fluss doesn't.. we will need to keep two..
Maybe make standard murmur_hash a mod, but fluss murmur_hash another mod?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries!
Hmm, I'm not sure if I fully understand what you mean by another mod? Also, I can add an assert to check for size smaller than i32::Max for fluss/flink variant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense to me. I'm also fine to add an assert to check for size smaller than i32::Max for fluss/flink variant.
At least it can avoid the difference of hash between rust client and java client
|
Also added test on Java side: apache/fluss#2274 |
Purpose
Linked issue: close #112
Add HashBucketAssigner as part of larger rust client put KV feature
Brief change log
Tests