Enforce subscription barrier for standby snapshot sync #436
Merged
14 commits
0cea0f6
feat(tx_service): add ActiveTxMaxTs aggregation request and candidate…
MrGuin 7bb8418
docs(cursor_knowledge): split standby snapshot barrier docs into desi…
MrGuin 179176e
feat(snapshot_manager): add subscription barrier registry APIs
MrGuin efce676
refactor(snapshot_manager): extend pending snapshot task to carry bar…
MrGuin 44ee99b
feat(snapshot_sync): enforce subscription barrier when syncing standb…
MrGuin cf6f559
fix(snapshot_sync): complete standby cleanup paths
MrGuin 2ff6a97
refactor(standby): move barrier registration to reset and drop candid…
MrGuin 1c55066
chore(standby): downgrade missing forward-entry alert to warning
MrGuin e4a6597
fix(snapshot): drop superseded pending sync tasks and ignore stale ba…
MrGuin d913057
fix(standby): rollback local follow state when reset is rejected
MrGuin 646417e
refactor(snapshot): gate standby sync by current checkpoint ts
MrGuin 408f87d
docs(standby): align snapshot barrier docs with latest implementation
MrGuin 8dc5c4f
format
MrGuin 4d63bdf
update doc
MrGuin
64 changes: 64 additions & 0 deletions
.cursor_knowledge/standby_snapshot_subscription_barrier_design.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,64 @@ | ||
| # Standby Snapshot Subscription Barrier: Design | ||
|
|
||
| ## 1. Background | ||
| Snapshot sync for standby was previously gated mostly by leader term and | ||
| subscribe-id coverage. That was not enough to ensure that snapshot content is | ||
| later than all transactions that were active when the standby subscription | ||
| took effect. | ||
|
|
||
| ## 2. Goal | ||
| Introduce a subscription barrier per standby epoch: | ||
| - `barrier_ts = max ActiveTxMaxTs across all local ccshards` | ||
| - A snapshot is valid for that epoch only when: | ||
| - `current_ckpt_ts > barrier_ts` | ||
|
|
||
| This guarantees the snapshot is after all transactions active at the | ||
| subscription success point. | ||
|
|
||
| ## 3. Key decisions | ||
| - Barrier sampling point is **`ResetStandbySequenceId` success path on primary** | ||
| (not `StandbyStartFollowing` and not `RequestStorageSnapshotSync`). | ||
| - Barrier key is `(standby_node_id, standby_term)`. | ||
| - `standby_term = (primary_term << 32) | subscribe_id`. | ||
| - Snapshot worker uses a lightweight checkpoint-ts probe before running heavy | ||
| checkpoint/flush. | ||
|
|
||
| ## 4. Runtime flow | ||
| 1. Standby calls `StandbyStartFollowing`, receives `subscribe_id` and start seq. | ||
| 2. Standby calls `ResetStandbySequenceId`. | ||
| 3. Primary marks standby as subscribed on all shards and samples | ||
| `global_active_tx_max_ts` using `ActiveTxMaxTsCc`. | ||
| 4. Primary registers barrier in `SnapshotManager`: | ||
| - `subscription_barrier_[node_id][standby_term] = barrier_ts` | ||
| 5. Standby calls `RequestStorageSnapshotSync` with `standby_term`. | ||
| 6. Primary validates barrier existence and enqueues one pending task per node | ||
| with attached `subscription_active_tx_max_ts`. | ||
| 7. `StandbySyncWorker` loop: | ||
| - Probe `current_ckpt_ts` via `GetCurrentCheckpointTs()` | ||
| - Select pending tasks that satisfy: | ||
| - same primary term | ||
| - `subscribe_id < current_subscribe_id` | ||
| - `current_ckpt_ts > barrier_ts` | ||
| - If no task is eligible, skip heavy checkpoint for this round. | ||
| - If at least one task is eligible, run `RunOneRoundCheckpoint`, build/send | ||
| snapshot, then notify standby. | ||
|
|
||
| ## 5. Pending and cleanup model | ||
| - Pending tasks blocked by subscribe-id or barrier remain queued for retry. | ||
| - Worker waits with a retry backoff when the pending queue is non-empty but | ||
| blocked, to avoid a tight loop. | ||
| - Superseded standby terms are pruned: | ||
| - registering newer barrier can drop older pending task for the same node | ||
| - older barriers are removed | ||
| - On leader loss, all pending tasks and barriers are cleared. | ||
| - On node removal, `EraseSubscriptionBarriersByNode` clears both pending and | ||
| barriers for that node. | ||
|
|
||
| ## 6. Safety properties | ||
| - Missing barrier on snapshot-sync request is rejected (safe default). | ||
| - Barrier is epoch-scoped and never shared across standby terms. | ||
| - All barrier/pending updates are under `standby_sync_mux_`. | ||
|
|
||
| ## 7. Expected effects | ||
| - Prevent early snapshots that miss writes from subscription-time active txns. | ||
| - Avoid unnecessary heavy checkpoint rounds when no task is currently eligible. | ||
| - Keep existing transport/retry semantics for snapshot send/notify. |
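The barrier key layout and the three gating conditions from the runtime flow above can be sketched as follows. This is a minimal illustration, not the PR's code: `PackStandbyTerm`, `PrimaryTermOf`, `SubscribeIdOf`, `PendingTask`, and `IsEligible` are hypothetical names, and the sketch assumes the primary term fits in 32 bits, which the design notes imply but do not state.

```cpp
#include <cstdint>

// Hypothetical helpers; only the bit layout and the three gating
// conditions are taken from the design notes.
constexpr uint64_t PackStandbyTerm(uint32_t primary_term, uint32_t subscribe_id)
{
    // Widen before shifting so the primary term lands in the high 32 bits.
    return (static_cast<uint64_t>(primary_term) << 32) | subscribe_id;
}

constexpr uint32_t PrimaryTermOf(uint64_t standby_term)
{
    return static_cast<uint32_t>(standby_term >> 32);
}

constexpr uint32_t SubscribeIdOf(uint64_t standby_term)
{
    return static_cast<uint32_t>(standby_term & 0xFFFFFFFFULL);
}

// Assumed shape of a queued task: the standby term from the request
// plus the barrier sampled at subscription time.
struct PendingTask
{
    uint64_t standby_term;
    uint64_t barrier_ts;  // subscription_active_tx_max_ts in the notes
};

// Lightweight eligibility probe: same primary term, subscribe id below
// the current one, and checkpoint ts strictly greater than the barrier.
inline bool IsEligible(const PendingTask &task,
                       uint32_t leader_term,
                       uint32_t current_subscribe_id,
                       uint64_t current_ckpt_ts)
{
    return PrimaryTermOf(task.standby_term) == leader_term &&
           SubscribeIdOf(task.standby_term) < current_subscribe_id &&
           current_ckpt_ts > task.barrier_ts;
}
```

Note that the comparison against the barrier is strict, so a checkpoint timestamp equal to `barrier_ts` keeps the task queued.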
101 changes: 101 additions & 0 deletions
.cursor_knowledge/standby_snapshot_subscription_barrier_implementation.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,101 @@ | ||
| # Standby Snapshot Subscription Barrier: Implementation | ||
|
|
||
| ## 1. Scope of code changes | ||
| - `tx_service/src/remote/cc_node_service.cpp` | ||
| - `tx_service/src/fault/cc_node.cpp` | ||
| - `tx_service/include/cc/cc_request.h` (`ActiveTxMaxTsCc`) | ||
| - `tx_service/include/store/snapshot_manager.h` | ||
| - `tx_service/src/store/snapshot_manager.cpp` | ||
|
|
||
| ## 2. New/updated state | ||
|
|
||
| ### 2.1 Barrier registry in `SnapshotManager` | ||
| Add a map keyed by standby node and standby term: | ||
| - `subscription_barrier_[standby_node_id][standby_term] = barrier_ts` | ||
|
|
||
| ### 2.2 Pending snapshot task extension | ||
| Pending value is a task struct: | ||
| - `req` (`StorageSnapshotSyncRequest`) | ||
| - `subscription_active_tx_max_ts` | ||
|
|
||
| ## 3. New APIs in `SnapshotManager` | ||
| - `RegisterSubscriptionBarrier(standby_node_id, standby_term, barrier_ts)` | ||
| - `GetSubscriptionBarrier(standby_node_id, standby_term, uint64_t* out)` | ||
| - `EraseSubscriptionBarrier(standby_node_id, standby_term)` | ||
| - `EraseSubscriptionBarriersByNode(standby_node_id)` | ||
| - `GetCurrentCheckpointTs(node_group) -> uint64_t` | ||
| - `RunOneRoundCheckpoint(node_group, leader_term) -> bool` | ||
|
|
||
| Barrier/pending updates are protected by `standby_sync_mux_`. | ||
|
|
||
| ## 4. Barrier collection in `ResetStandbySequenceId` | ||
| In `CcNodeService::ResetStandbySequenceId` on primary: | ||
| 1. Move standby from candidate to subscribed on all shards. | ||
| 2. Validate leader term. | ||
| 3. If barrier for `(node_id, standby_term)` does not exist: | ||
| - run `ActiveTxMaxTsCc` across all shards | ||
| - compute global max | ||
| - call `SnapshotManager::RegisterSubscriptionBarrier(...)` | ||
|
|
||
| This makes the sampling point aligned with "subscription success". | ||
|
|
||
| ## 5. `RequestStorageSnapshotSync` path changes | ||
| In `SnapshotManager::OnSnapshotSyncRequested`: | ||
| 1. Parse `(standby_node_id, standby_term)` from request. | ||
| 2. Query barrier by `(standby_node_id, standby_term)`. | ||
| 3. If not found: reject request. | ||
| 4. If found: enqueue task with barrier ts. | ||
|
|
||
| Dedup is still term-based per standby node. | ||
|
|
||
| ## 6. Snapshot gating logic | ||
| `SnapshotManager::SyncWithStandby` now runs in two phases: | ||
| 1. Lightweight phase: | ||
| - `current_ckpt_ts = GetCurrentCheckpointTs(node_group)` | ||
| - Select tasks that satisfy: | ||
| - term alignment | ||
| - `subscribe_id < current_subscribe_id` | ||
| - `current_ckpt_ts > subscription_active_tx_max_ts` | ||
| - If no task is eligible, return directly. | ||
| 2. Heavy phase: | ||
| - Run `RunOneRoundCheckpoint(...)` (flush) | ||
| - Create/send snapshot and notify standby for selected tasks. | ||
|
|
||
| ## 7. Worker retry pacing | ||
| `StandbySyncWorker` keeps the existing wake condition on a non-empty pending | ||
| queue and adds a short wait-for backoff (`200ms`) when requests remain pending | ||
| after a round, to avoid tight retry loops. | ||
|
|
||
| ## 8. Cleanup rules | ||
| - On successful snapshot completion for `(node_id, standby_term)`: erase barrier entry. | ||
| - On registering newer term for same node: prune older barriers and drop older | ||
| pending task. | ||
| - On node removal: `EraseSubscriptionBarriersByNode(node_id)` clears both | ||
| pending and barrier entries. | ||
| - On leader loss in sync loop: clear all pending and barriers. | ||
|
|
||
| ## 9. Failure behavior | ||
| - Missing barrier on sync request: reject request (safe default). | ||
| - Checkpoint failure: keep task queued for next rounds. | ||
| - Snapshot transfer failure: task stays pending and retries in later rounds. | ||
|
|
||
| ## 10. Standby-side rejection handling | ||
| In `CcNode::SubscribePrimaryNode`, if `ResetStandbySequenceId` is rejected by | ||
| primary, local standby following state is rolled back: | ||
| - unsubscribe per-shard standby sequence groups | ||
| - reset standby/candidate standby term if still on the failed term | ||
| - guard against clobbering newer resubscribe attempts. | ||
|
|
||
| ## 11. Tests | ||
|
|
||
| ### Unit tests | ||
| - barrier register/get/erase and supersession behavior | ||
| - pending dedup with barrier replacement | ||
| - gating boundaries (`current_ckpt_ts == / < / > barrier_ts`) | ||
|
|
||
| ### Integration tests | ||
| - long-running active tx at subscription success blocks snapshot until | ||
| `current_ckpt_ts > barrier` | ||
| - multiple standbys with independent barriers | ||
| - repeated sync-request retries with same standby term | ||
| - leader term switch cleanup correctness |
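The registry and cleanup rules described above can be sketched roughly as below. This is an illustrative stand-in, not the `SnapshotManager` code from the PR: the class name `BarrierRegistry`, the method names, the `uint32_t` node id type, and the choice to reject a registration older than an existing term are all assumptions. Pending-task handling is left out.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for the barrier map; a single mutex plays the
// role that standby_sync_mux_ has in the notes.
class BarrierRegistry
{
public:
    // Registers a barrier and prunes older terms for the same node.
    // Returns false when a newer term is already present (assumed
    // "stale registration" behavior).
    bool Register(uint32_t node_id, uint64_t standby_term, uint64_t barrier_ts)
    {
        std::lock_guard<std::mutex> lk(mux_);
        auto &terms = barriers_[node_id];
        if (!terms.empty() && terms.rbegin()->first > standby_term)
        {
            return false;
        }
        // Drop every entry with a smaller standby term.
        terms.erase(terms.begin(), terms.lower_bound(standby_term));
        // emplace keeps an existing entry, so re-registering the same
        // term does not overwrite the first sampled barrier.
        terms.emplace(standby_term, barrier_ts);
        return true;
    }

    bool Get(uint32_t node_id, uint64_t standby_term, uint64_t *out) const
    {
        std::lock_guard<std::mutex> lk(mux_);
        auto node_it = barriers_.find(node_id);
        if (node_it == barriers_.end())
        {
            return false;
        }
        auto term_it = node_it->second.find(standby_term);
        if (term_it == node_it->second.end())
        {
            return false;
        }
        *out = term_it->second;
        return true;
    }

    void Erase(uint32_t node_id, uint64_t standby_term)
    {
        std::lock_guard<std::mutex> lk(mux_);
        auto node_it = barriers_.find(node_id);
        if (node_it != barriers_.end())
        {
            node_it->second.erase(standby_term);
        }
    }

    void EraseByNode(uint32_t node_id)
    {
        std::lock_guard<std::mutex> lk(mux_);
        barriers_.erase(node_id);
    }

private:
    mutable std::mutex mux_;
    // node id -> (standby term -> barrier ts), ordered by term.
    std::unordered_map<uint32_t, std::map<uint64_t, uint64_t>> barriers_;
};
```

An ordered inner map keeps supersession cheap, since every older term sits in a contiguous range before the new key.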
`wlock_ts_` is not a safe snapshot barrier. `ActiveTxMaxTs()` records the max first-write-lock timestamp, but `SyncWithStandby()` later treats `ckpt_ts > barrier_ts` as proof that all txns active at subscribe time are represented in the snapshot. That does not hold: an older txn can still commit after a newer txn has already forced `ckpt_ts` down to `newer_tx.wlock_ts - 1`, so the snapshot can pass the barrier and still miss the older txn's commit. The barrier needs a completion watermark, not the max lock-acquisition timestamp.
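One possible reading of "completion watermark" is sketched below, purely as an illustration of the reviewer's point and not as anything proposed in the PR. The class `CompletionWatermark`, its methods, and the convention that an aborted txn reports commit timestamp `0` are all hypothetical. The idea is that the barrier only becomes passable once every txn that was active at subscribe time has finished, and the checkpoint timestamp exceeds the largest commit timestamp among them.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_set>
#include <utility>

// Hypothetical tracker: seeded with the ids of txns active at
// subscription success, updated as each one commits or aborts.
class CompletionWatermark
{
public:
    explicit CompletionWatermark(std::unordered_set<uint64_t> active_tx_ids)
        : pending_(std::move(active_tx_ids))
    {
    }

    // commit_ts is assumed to be 0 for an aborted txn.
    void OnTxFinished(uint64_t tx_id, uint64_t commit_ts)
    {
        // Only txns from the subscribe-time set affect the watermark.
        if (pending_.erase(tx_id) > 0)
        {
            max_commit_ts_ = std::max(max_commit_ts_, commit_ts);
        }
    }

    // True only when all tracked txns are done and the checkpoint is
    // strictly past the largest commit timestamp among them.
    bool SnapshotCovers(uint64_t ckpt_ts) const
    {
        return pending_.empty() && ckpt_ts > max_commit_ts_;
    }

private:
    std::unordered_set<uint64_t> pending_;
    uint64_t max_commit_ts_{0};
};
```

Under this sketch an older txn that is still running keeps the barrier closed regardless of how far `ckpt_ts` has advanced, which is the case the comment says the lock-timestamp barrier misses.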