fix(meta): renew operating region leases from keeper roles #7971
WenyXu wants to merge 9 commits into GreptimeTeam:main
d5460c5 to e418686
Code Review
This pull request transitions MemoryRegionKeeper from using a HashSet to a HashMap to track RegionRole for operating regions, updating DDL procedures and lease management logic accordingly. The review feedback suggests refactoring the codebase by removing redundant legacy functions and optimizing an iterator chain in MemoryRegionKeeper to avoid unnecessary allocations.
/gemini review
Code Review
This pull request updates the MemoryRegionKeeper to store and manage RegionRole information alongside region IDs, transitioning its internal storage from a HashSet to a HashMap. This allows DDL procedures and the region lease keeper to track specific states like StagingLeader or DowngradingLeader during operations. The changes include renaming registration methods to register_with_role, adding helper functions to extract roles from routes, and updating lease renewal logic to prioritize the role stored in the keeper. Review feedback identifies opportunities to optimize an iterator in region_keeper.rs by removing an intermediate collection and to simplify a redundant check in router.rs.
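The change summarized above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the type aliases, the simplified `RegionRole` enum, the `bool` return of `register_with_role`, and the `role_to_renew` helper are all assumptions made for the example.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical stand-ins for the real ID types.
type DatanodeId = u64;
type RegionId = u64;

// Simplified role enum; the variant set is an assumption for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RegionRole {
    Leader,
    Follower,
    StagingLeader,
    DowngradingLeader,
}

// Keeps the authoritative role of every operating region in memory.
#[derive(Default, Clone)]
struct MemoryRegionKeeper {
    inner: Arc<RwLock<HashMap<(DatanodeId, RegionId), RegionRole>>>,
}

impl MemoryRegionKeeper {
    // Records the role for an operating region.
    // Returns false if the region is already registered (signature assumed).
    fn register_with_role(
        &self,
        datanode_id: DatanodeId,
        region_id: RegionId,
        role: RegionRole,
    ) -> bool {
        let mut inner = self.inner.write().unwrap();
        match inner.entry((datanode_id, region_id)) {
            Entry::Occupied(_) => false,
            Entry::Vacant(slot) => {
                slot.insert(role);
                true
            }
        }
    }

    // Returns the stored role, if the region is currently operating.
    fn role(&self, datanode_id: DatanodeId, region_id: RegionId) -> Option<RegionRole> {
        self.inner
            .read()
            .unwrap()
            .get(&(datanode_id, region_id))
            .copied()
    }
}

// Hypothetical helper: prefer the keeper's role over the heartbeat-reported one.
fn role_to_renew(
    keeper: &MemoryRegionKeeper,
    datanode_id: DatanodeId,
    region_id: RegionId,
    reported: RegionRole,
) -> RegionRole {
    keeper.role(datanode_id, region_id).unwrap_or(reported)
}
```

With this shape, a region registered as `Leader` keeps being renewed as `Leader` even if a heartbeat briefly reports it as `Follower`, while regions unknown to the keeper fall back to the reported role.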
e418686 to 757947c
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 757947c041
/gemini review
Code Review
This pull request refactors the MemoryRegionKeeper and associated DDL and migration procedures to track and utilize RegionRole for operating regions. The implementation shifts from using a HashSet of region IDs to a HashMap that maps region IDs to specific roles, such as Leader, Follower, Staging, or Downgrading. While the refactoring improves role-based tracking, the review feedback identifies a regression in the extract_operating_region_roles function. Specifically, the new implementation no longer filters the input region set in-place, which prevents the RegionLeaseKeeper from optimizing metadata fetches by excluding already known operating regions.
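One way the in-place filtering described in this feedback could look is sketched below. Only the function name `extract_operating_region_roles` comes from the review; the parameter list, the plain `HashMap` standing in for the keeper's storage, and the simplified types are assumptions for illustration.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the real ID and role types.
type DatanodeId = u64;
type RegionId = u64;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RegionRole {
    Leader,
    Follower,
}

// Moves every operating region out of `region_ids` and returns its stored role.
// Regions left in `region_ids` afterwards are the only ones that still need a
// metadata fetch.
fn extract_operating_region_roles(
    operating: &HashMap<(DatanodeId, RegionId), RegionRole>,
    datanode_id: DatanodeId,
    region_ids: &mut HashSet<RegionId>,
) -> HashMap<RegionId, RegionRole> {
    let mut roles = HashMap::new();
    // `retain` filters the caller's set in place, with one map lookup per region.
    region_ids.retain(|region_id| match operating.get(&(datanode_id, *region_id)) {
        Some(role) => {
            roles.insert(*region_id, *role);
            false // known operating region: drop it from the fetch set
        }
        None => true, // unknown region: keep it for the metadata fetch
    });
    roles
}
```

Because the set is mutated through `&mut`, the caller sees the reduced set without any extra allocation or a second pass.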
086eef7 to 81e8999
Signed-off-by: WenyXu <wenymedia@gmail.com>
ab6879a to 63d057f
/gemini review
Code Review
This pull request enhances the MemoryRegionKeeper to track RegionRole for operating regions, replacing the previous HashSet implementation with a HashMap. These changes are integrated into DDL procedures, region migration, and the RegionLeaseKeeper, which now uses the tracked roles to validate and renew region leases. Feedback suggests optimizing the extract_operating_region_roles function to avoid double lookups and refactoring operating_leader_region_roles to be more idiomatic and safer by avoiding explicit unwraps.
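As a rough illustration of the "avoid explicit unwraps" suggestion, the sketch below derives leader roles from routes with `filter_map` and the `?` operator. Only the function name `operating_leader_region_roles` appears in the review; the `RegionRoute` fields, the `LeaderState` variants, and the return type are hypothetical simplifications.

```rust
// Hypothetical stand-ins for the real ID types.
type DatanodeId = u64;
type RegionId = u64;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RegionRole {
    Leader,
    StagingLeader,
    DowngradingLeader,
}

// Assumed leader states for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LeaderState {
    Staging,
    Downgrading,
}

// Simplified route; field names are assumptions.
struct RegionRoute {
    region_id: RegionId,
    leader_peer_id: Option<DatanodeId>,
    leader_state: Option<LeaderState>,
}

// Collects (datanode, region, role) for every route that has a leader.
// Routes without a leader are skipped instead of being unwrapped.
fn operating_leader_region_roles(
    routes: &[RegionRoute],
) -> Vec<(DatanodeId, RegionId, RegionRole)> {
    routes
        .iter()
        .filter_map(|route| {
            // `?` returns None from the closure when there is no leader peer.
            let datanode_id = route.leader_peer_id?;
            let role = match route.leader_state {
                Some(LeaderState::Staging) => RegionRole::StagingLeader,
                Some(LeaderState::Downgrading) => RegionRole::DowngradingLeader,
                None => RegionRole::Leader,
            };
            Some((datanode_id, route.region_id, role))
        })
        .collect()
}
```

The design choice here is that a missing leader is treated as "nothing to register" rather than a panic, which keeps the helper safe to call on partially populated routes.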
63d057f to 343fe9b
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
This PR fixes a metasrv lease-renew bug for operating regions.
Previously, MemoryRegionKeeper only recorded (datanode_id, region_id), and operating-region lease renewal reused the region role reported by datanode heartbeats. This could break leader-only operations such as drop table, drop database, repartition, and region migration: if a leader region was temporarily reported as a follower during an ongoing operation, metasrv could keep renewing the lease as follower, and the operation would get stuck in retries.
To fix this, this PR makes metasrv use an in-memory authoritative role for operating regions:
- Change MemoryRegionKeeper from a HashSet<(DatanodeId, RegionId)> to a HashMap<(DatanodeId, RegionId), RegionRole>
- Renew operating-region leases with the role stored in MemoryRegionKeeper, instead of the transient role reported by heartbeat
PR Checklist
Please convert it to a draft if some of the following conditions are not met.