Skip to content

fix(meta): renew operating region leases from keeper roles#7971

Open
WenyXu wants to merge 9 commits intoGreptimeTeam:mainfrom
WenyXu:fix/memory-keeper
Open

fix(meta): renew operating region leases from keeper roles#7971
WenyXu wants to merge 9 commits intoGreptimeTeam:mainfrom
WenyXu:fix/memory-keeper

Conversation

@WenyXu
Copy link
Copy Markdown
Member

@WenyXu WenyXu commented Apr 15, 2026

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

This PR fixes a metasrv lease-renew bug for operating regions.
Previously, MemoryRegionKeeper only recorded (datanode_id, region_id), and operating-region lease renewal reused the region role reported by datanode heartbeats. This could break leader-only operations such as drop table, drop database, repartition, and region migration: if a leader region was temporarily reported as follower during an ongoing operation, metasrv could keep renewing the lease as follower, and the operation would get stuck in retries.
To fix this, this PR makes metasrv use an in-memory authoritative role for operating regions:

  • refactor MemoryRegionKeeper from a HashSet<(DatanodeId, RegionId)> to a HashMap<(DatanodeId, RegionId), RegionRole>
  • make operating-region registration explicitly write the intended role at all related call sites
  • renew operating-region leases from the role stored in MemoryRegionKeeper, instead of the transient role reported by heartbeat

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

@WenyXu WenyXu requested a review from MichaelScofield as a code owner April 15, 2026 09:50
@github-actions github-actions bot added size/M docs-not-required This change does not impact docs. labels Apr 15, 2026
@WenyXu WenyXu force-pushed the fix/memory-keeper branch from d5460c5 to e418686 Compare April 15, 2026 09:51
@WenyXu WenyXu requested a review from fengjiachun April 15, 2026 09:51
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request transitions MemoryRegionKeeper from using a HashSet to a HashMap to track RegionRole for operating regions, updating DDL procedures and lease management logic accordingly. The review feedback suggests refactoring the codebase by removing redundant legacy functions and optimizing an iterator chain in MemoryRegionKeeper to avoid unnecessary allocations.

Comment thread src/common/meta/src/rpc/router.rs
Comment thread src/common/meta/src/region_keeper.rs Outdated
Comment thread src/common/meta/src/region_keeper.rs Outdated
@WenyXu
Copy link
Copy Markdown
Member Author

WenyXu commented Apr 15, 2026

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the MemoryRegionKeeper to store and manage RegionRole information alongside region IDs, transitioning its internal storage from a HashSet to a HashMap. This allows DDL procedures and the region lease keeper to track specific states like StagingLeader or DowngradingLeader during operations. The changes include renaming registration methods to register_with_role, adding helper functions to extract roles from routes, and updating lease renewal logic to prioritize the role stored in the keeper. Review feedback identifies opportunities to optimize an iterator in region_keeper.rs by removing an intermediate collection and to simplify a redundant check in router.rs.

Comment thread src/common/meta/src/region_keeper.rs Outdated
Comment thread src/common/meta/src/rpc/router.rs
@WenyXu WenyXu force-pushed the fix/memory-keeper branch from e418686 to 757947c Compare April 15, 2026 12:00
@WenyXu WenyXu mentioned this pull request Apr 15, 2026
5 tasks
@evenyag
Copy link
Copy Markdown
Contributor

evenyag commented Apr 15, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 757947c041

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/common/meta/src/region_keeper.rs Outdated
Comment thread src/meta-srv/src/region/lease_keeper.rs Outdated
@WenyXu
Copy link
Copy Markdown
Member Author

WenyXu commented Apr 16, 2026

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the MemoryRegionKeeper and associated DDL and migration procedures to track and utilize RegionRole for operating regions. The implementation shifts from using a HashSet of region IDs to a HashMap that maps region IDs to specific roles, such as Leader, Follower, Staging, or Downgrading. While the refactoring improves role-based tracking, the review feedback identifies a regression in the extract_operating_region_roles function. Specifically, the new implementation no longer filters the input region set in-place, which prevents the RegionLeaseKeeper from optimizing metadata fetches by excluding already known operating regions.

Comment thread src/common/meta/src/region_keeper.rs
Comment thread src/meta-srv/src/region/lease_keeper.rs
Comment thread src/common/meta/src/region_keeper.rs
@WenyXu WenyXu force-pushed the fix/memory-keeper branch 3 times, most recently from 086eef7 to 81e8999 Compare April 16, 2026 07:14
WenyXu added 8 commits April 16, 2026 07:14
Signed-off-by: WenyXu <wenymedia@gmail.com>
Signed-off-by: WenyXu <wenymedia@gmail.com>
Signed-off-by: WenyXu <wenymedia@gmail.com>
Signed-off-by: WenyXu <wenymedia@gmail.com>
Signed-off-by: WenyXu <wenymedia@gmail.com>
Signed-off-by: WenyXu <wenymedia@gmail.com>
Signed-off-by: WenyXu <wenymedia@gmail.com>
Signed-off-by: WenyXu <wenymedia@gmail.com>
@WenyXu WenyXu force-pushed the fix/memory-keeper branch 2 times, most recently from ab6879a to 63d057f Compare April 16, 2026 07:18
@WenyXu
Copy link
Copy Markdown
Member Author

WenyXu commented Apr 16, 2026

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the MemoryRegionKeeper to track RegionRole for operating regions, replacing the previous HashSet implementation with a HashMap. These changes are integrated into DDL procedures, region migration, and the RegionLeaseKeeper, which now uses the tracked roles to validate and renew region leases. Feedback suggests optimizing the extract_operating_region_roles function to avoid double lookups and refactoring operating_leader_region_roles to be more idiomatic and safer by avoiding explicit unwraps.

Comment thread src/common/meta/src/region_keeper.rs
Comment thread src/common/meta/src/rpc/router.rs Outdated
Signed-off-by: WenyXu <wenymedia@gmail.com>
@WenyXu WenyXu force-pushed the fix/memory-keeper branch from 63d057f to 343fe9b Compare April 16, 2026 07:26
Comment thread src/common/meta/src/region_keeper.rs
Copy link
Copy Markdown
Collaborator

@fengjiachun fengjiachun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs-not-required This change does not impact docs. size/M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants