fix(shared-log): guard persistCoordinate against TOCTOU race during shutdown #3
Open
Faolain wants to merge 1 commit into fix/pubsub-subscribe-race
During test teardown (and potentially production close), _close() can complete between the caller's !this.closed check in findLeaders and persistCoordinate's actual execution. At that point entryCoordinatesIndex is undefined, causing an unhandled promise rejection.

This adds:
- this.closed bail-out at method entry
- entryCoordinatesIndex/replicationRangeIndex existence checks
- try/catch wrapper that swallows errors if closed during execution

Validated: reduces unhandled errors from 6 to 2 in integration tests. The remaining 2 errors originate from @peerbit/program RPC layer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This was referenced Feb 6, 2026
Summary
This PR adds targeted guards to persistCoordinate in SharedLog to prevent unhandled rejections caused by a Time-of-Check-to-Time-of-Use (TOCTOU) race condition during node shutdown.

The issue: when _close() is called, it nullifies internal indices (_entryCoordinatesIndex, _replicationRangeIndex), but async operations already in the microtask/timer queue (triggered by replication events) still execute persistCoordinate. The existing !this.closed check in findLeaders passes before _close() completes, but by the time persistCoordinate runs, the indices are gone, causing TypeError: Cannot read properties of undefined.

Root Cause Analysis
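The race can be reproduced in isolation. The sketch below uses a hypothetical RacyLog class, not the real SharedLog: the Map-backed index, the method bodies, and the single await standing in for queued async work are all assumptions chosen only to show the check-then-use gap.

```typescript
// Hypothetical stand-in for SharedLog; only the shape of the race is modeled.
class RacyLog {
  closed = false;
  // Set to undefined by close(), mirroring the nullified indices.
  private coordinates: Map<string, number> | undefined = new Map();

  async close(): Promise<void> {
    this.closed = true;
    this.coordinates = undefined;
  }

  // Unguarded: assumes the index still exists.
  async persistCoordinate(hash: string, value: number): Promise<void> {
    this.coordinates!.set(hash, value); // throws TypeError once close() has run
  }

  async findLeaders(hash: string): Promise<void> {
    if (this.closed) return; // time of check
    await Promise.resolve(); // yields to the event loop; close() may run here
    await this.persistCoordinate(hash, 1); // time of use
  }
}

async function demonstrateRace(): Promise<string> {
  const log = new RacyLog();
  const pending = log.findLeaders("entry-hash"); // passes the closed check
  await log.close(); // tears down the index before the continuation runs
  try {
    await pending;
    return "no error";
  } catch (error) {
    return error instanceof TypeError ? "TypeError" : "other error";
  }
}
```

In this sketch demonstrateRace() resolves to "TypeError": the closed check passed before close() ran, and the index was gone by the time it was used. In the real code the rejection surfaces as unhandled because nothing awaits the queued operation.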
The persistCoordinate race was identified as Root Cause B in a systematic investigation of SharedLog teardown races. Three hypotheses were tested in parallel using isolated git worktrees:
- Root Cause A: rebalanceParticipation + __freqClosed flag. The @peerbit/time debouncer holds a direct closure reference, bypassing prototype patches.
- Root Cause B: persistCoordinate
- Root Cause C: handleSubscriptionChange, removeReplicator, rebalanceParticipation + unhandledRejection safety net

Changes
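A minimal sketch of the layered guard pattern this PR describes for persistCoordinate, using a hypothetical GuardedLog class rather than the real SharedLog. The Map-backed indices, the boolean return value, and the Promise.resolve() placeholder for async index work are assumptions made for illustration.

```typescript
// Hypothetical stand-in for SharedLog; names and shapes are assumptions.
class GuardedLog {
  closed = false;
  private entryCoordinatesIndex: Map<string, number> | undefined = new Map();
  private replicationRangeIndex: Map<string, number> | undefined = new Map();

  async close(): Promise<void> {
    this.closed = true;
    this.entryCoordinatesIndex = undefined;
    this.replicationRangeIndex = undefined;
  }

  // Returns true if the coordinate was written, false if skipped due to shutdown.
  async persistCoordinate(hash: string, value: number): Promise<boolean> {
    // Entry guard: bail out if close has already completed.
    if (this.closed) return false;
    // Existence guards: bail out if either index has been torn down.
    if (!this.entryCoordinatesIndex || !this.replicationRangeIndex) return false;
    try {
      await Promise.resolve(); // placeholder for real async index work
      this.entryCoordinatesIndex!.set(hash, value); // may throw if closed meanwhile
      return true;
    } catch (error) {
      // Swallow only errors caused by shutdown; re-throw anything else.
      if (this.closed) return false;
      throw error;
    }
  }
}

async function demonstrateGuard(): Promise<boolean[]> {
  const log = new GuardedLog();
  const before = await log.persistCoordinate("a", 1); // normal path
  const racing = log.persistCoordinate("b", 2); // passes the entry guards
  await log.close(); // close lands mid-execution
  const during = await racing; // error swallowed by the catch
  const after = await log.persistCoordinate("c", 3); // early bail
  return [before, during, after];
}
```

Here demonstrateGuard() resolves to [true, false, false]. The re-throw in the catch block matters: an error raised while the log is still open indicates a real bug and should not be hidden by the shutdown guard.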
Four-layer guard added to persistCoordinate (~line 3662 in index.ts):
- this.closed early bail: catches the common case where close has already completed
- Index existence checks: confirm that entryCoordinatesIndex and _replicationRangeIndex haven't been torn down by _close()
- try/catch wrapper: covers the case where _close() completes mid-execution; swallows the error if we're now closed (expected during shutdown), re-throws otherwise

Test Results
Methodology: Full vitest test suite (178 tests) run on each worktree. Integration tests (library, playlist, identity, replication.boundaries) also run 3x separately to check for flakiness.

Full Suite Results
Remaining 2 Errors
The remaining 2 unhandled errors originate from @peerbit/program's RPC layer (RPC.close → TypedEventEmitter.dispatchEvent), not from SharedLog. These require a separate fix in the RPC/program teardown path.

Pre-existing Failures (unchanged, present on all branches)
- chunkedAesGcmV1.test.ts (2 failures)
- ProfilePage.test.tsx (2 failures)
- WalletPanel.purchaseForm.test.tsx (flaky, 0-12 failures)
- RecoverySetupPanel.test.tsx (1 failure)
- generateSampleMp3.test.ts (1 failure)

Relationship to PR dao-xyz#589
This fix is complementary to the pubsub subscribe race fix in PR dao-xyz#589. PR dao-xyz#589 fixes the root cause of dropped subscription messages during topic initialization. This PR addresses a separate race condition exposed during teardown: persistCoordinate executing after _close() has torn down internal indices. Both fixes are needed for a clean test suite.

Why This Approach Over Comprehensive Guards (Root Cause C)
Root Cause C added guards to handleSubscriptionChange, removeReplicator, and rebalanceParticipation as well. Testing showed zero incremental error reduction: all 4 eliminated errors came from the persistCoordinate guard alone. The minimal approach (this PR) is preferred because: