fix(shared-log): guard persistCoordinate against TOCTOU race during shutdown #2
Closed
Faolain wants to merge 1 commit into fix/pubsub-initialize-topic-on-subscribe
…hutdown

During test teardown (and potentially production close), _close() can complete between the caller's !this.closed check in findLeaders and persistCoordinate's actual execution. At that point entryCoordinatesIndex is undefined, causing an unhandled promise rejection.

This adds:
- this.closed bail-out at method entry
- entryCoordinatesIndex/replicationRangeIndex existence checks
- try/catch wrapper that swallows errors if closed during execution

Validated: reduces unhandled errors from 6 to 2 in integration tests. The remaining 2 errors originate from @peerbit/program RPC layer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Closing duplicate -- superseded by #3, which targets the correct base branch (fix/pubsub-subscribe-race).
Problem
During test teardown (and potentially in production), `SharedLog._close()` can complete between the caller's `!this.closed` check in `findLeaders` (line ~2677) and `persistCoordinate`'s actual execution. At that point:
- `this._entryCoordinatesIndex` has been set to `undefined` by `_close()` (lines ~1742-1744)
- `this._replicationRangeIndex` has been set to `undefined`
- Calling `.put()` on `undefined` throws a `TypeError`
- With no `.catch()` attached, it becomes an unhandled promise rejection

This is a TOCTOU (Time-of-Check-to-Time-of-Use) race condition. The `!this.closed` guard in `findLeaders` passes, but `_close()` completes before `persistCoordinate` runs its body.

Call chain
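As a rough illustration of the race described above, here is a hypothetical miniature in TypeScript. It is not the actual SharedLog code: the `MiniLog` class, the `CoordinateIndex` type, the method bodies, and `demoRace` are assumptions made for the sketch. Only the names `findLeaders`, `persistCoordinate`, `_close`, and `_entryCoordinatesIndex` come from the description.

```typescript
// Hypothetical miniature of the shutdown race; not the real SharedLog code.
type CoordinateIndex = { put(key: string, value: number): Promise<void> };

class MiniLog {
  closed = false;
  // Backing field that _close() clears, as in the description above.
  _entryCoordinatesIndex: CoordinateIndex | undefined;
  private store = new Map<string, number>();

  constructor() {
    this._entryCoordinatesIndex = {
      put: async (key: string, value: number): Promise<void> => {
        this.store.set(key, value);
      },
    };
  }

  async _close(): Promise<void> {
    this.closed = true;
    this._entryCoordinatesIndex = undefined;
  }

  // Unguarded: assumes the index still exists when the body finally runs.
  async persistCoordinate(key: string, value: number): Promise<void> {
    await this._entryCoordinatesIndex!.put(key, value); // TypeError once the index is undefined
  }

  async findLeaders(key: string): Promise<void> {
    if (this.closed) return; // time of check
    await Promise.resolve(); // any await here gives _close() a chance to run
    await this.persistCoordinate(key, 1); // time of use
  }
}

async function demoRace(): Promise<string> {
  const log = new MiniLog();
  const pending = log.findLeaders("entry-1"); // passes the closed check synchronously
  await log._close(); // completes before persistCoordinate runs its body
  try {
    await pending;
    return "no error";
  } catch (err) {
    return err instanceof TypeError ? "TypeError" : "other error";
  }
}
```

In this sketch the rejection is caught by `demoRace`. If the caller never awaits or attaches a handler to the promise, the same `TypeError` surfaces as an unhandled rejection instead.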
Fix
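The layered guards this section describes can be sketched roughly as follows. This is a hypothetical miniature, not the code from `index.ts`: the `GuardedLog` class, the `makeIndex` helper, the `alwaysFail` switch, and the boolean return value are assumptions added so the behaviour can be observed.

```typescript
// Hypothetical miniature of the layered guards; not the actual SharedLog code.
type Index = { put(key: string, value: number): Promise<void> };

// Fake index: put() yields once, then fails if shouldFail() reports true.
function makeIndex(shouldFail: () => boolean): Index {
  return {
    async put(_key: string, _value: number): Promise<void> {
      await Promise.resolve();
      if (shouldFail()) throw new Error("index unavailable");
    },
  };
}

class GuardedLog {
  closed = false;
  _entryCoordinatesIndex: Index | undefined;
  _replicationRangeIndex: Index | undefined;

  constructor(alwaysFail = false) {
    const shouldFail = (): boolean => alwaysFail || this.closed;
    this._entryCoordinatesIndex = makeIndex(shouldFail);
    this._replicationRangeIndex = makeIndex(shouldFail);
  }

  async _close(): Promise<void> {
    this.closed = true;
    this._entryCoordinatesIndex = undefined;
    this._replicationRangeIndex = undefined;
  }

  // Returns true if the write happened (the return value is an assumption of this sketch).
  async persistCoordinate(key: string, value: number): Promise<boolean> {
    if (this.closed) return false; // layer 1: closed at method entry
    // Layers 2 and 3: read the backing fields directly instead of a throwing getter.
    const coordinates = this._entryCoordinatesIndex;
    const ranges = this._replicationRangeIndex;
    if (!coordinates || !ranges) return false;
    try {
      await coordinates.put(key, value);
      return true;
    } catch (err) {
      if (this.closed) return false; // layer 4: swallow only if closed meanwhile
      throw err; // otherwise keep real bugs visible
    }
  }
}

async function demoGuards(): Promise<string[]> {
  const results: string[] = [];

  const open = new GuardedLog();
  results.push(String(await open.persistCoordinate("k", 1))); // normal write

  const closedFirst = new GuardedLog();
  await closedFirst._close();
  results.push(String(await closedFirst.persistCoordinate("k", 1))); // layer 1

  const closedDuring = new GuardedLog();
  const pending = closedDuring.persistCoordinate("k", 1);
  await closedDuring._close(); // lands while put() is in flight
  results.push(String(await pending)); // layer 4 swallows the error

  const broken = new GuardedLog(true);
  try {
    await broken.persistCoordinate("k", 1);
    results.push("no error");
  } catch {
    results.push("rethrown"); // still open, so the error propagates
  }
  return results;
}
```

The last case is the point of the conditional swallow: a failure on an instance that is still open is not hidden.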
Added multi-layer guards to `persistCoordinate` in `packages/programs/data/shared-log/src/index.ts`:
- `this.closed` bail-out at method entry -- catches the TOCTOU race where `_close()` completed between the caller's check and this method's execution
- `this._entryCoordinatesIndex` existence check -- uses the private backing field directly (not the getter, which throws `ClosedError`) to silently bail if the index was torn down
- `this._replicationRangeIndex` existence check -- same pattern for the replication range index
- try/catch wrapper -- catches an error thrown when `_close()` races mid-execution, and swallows it only if the instance has since been closed (re-throws otherwise to preserve real bug visibility)

Test Results
Methodology
Tested against 4 integration test files that exercise real Peerbit peer nodes with replication:
- `library.integration.test.ts` -- replicates library items between 2 peers
- `playlist.integration.test.ts` -- replicates playlist items, enforces ACLs
- `identity.integration.test.ts` -- replicates profile updates across linked devices
- `replication.boundaries.test.ts` -- tests remote search boundary enforcement

Each was run 3 times to assess flakiness, plus a full test suite run (182 files, 1572 tests).
Integration Tests (3 runs x 4 files)
Full Test Suite Comparison
This fix eliminates 4 of 6 unhandled promise rejections. The remaining 2 errors originate from `@peerbit/program`'s RPC layer (`RPC.close` -> `TypedEventEmitter.dispatchEvent`), which is a separate upstream issue unrelated to SharedLog.

The 7 failing test files are pre-existing failures unrelated to SharedLog:
- `chunkedAesGcmV1.test.ts` (2 failures)
- `ProfilePage.test.tsx` (2 failures)
- `WalletPanel.purchaseForm.test.tsx` (flaky, 0-12 failures)
- `RecoverySetupPanel.test.tsx` (1 failure)
- `generateSampleMp3.test.ts` (1 failure)

Why This Approach
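One alternative discussed in this section relied on patching `rebalanceParticipation` on the prototype, which had no effect because the debouncer already held a reference to the original function. Here is a minimal sketch of that effect. `fakeDebounce` and `Participant` are hypothetical stand-ins: the real `@peerbit/time` debouncer is not reproduced, and timing is omitted on purpose.

```typescript
// Hypothetical stand-in for a debouncer: it keeps the function it was given.
function fakeDebounce(fn: () => void): () => void {
  return () => fn(); // a real debouncer would add timing; omitted here
}

class Participant {
  calls: string[] = [];
  debouncedRebalance: () => void;

  constructor() {
    // The function reference is captured once, at construction time.
    const original = this.rebalanceParticipation;
    this.debouncedRebalance = fakeDebounce(() => original.call(this));
  }

  rebalanceParticipation(): void {
    this.calls.push("original");
  }
}

const participant = new Participant();

// Patch applied after construction: only lookups through the prototype see it.
Participant.prototype.rebalanceParticipation = function (this: Participant): void {
  this.calls.push("patched");
};

participant.rebalanceParticipation(); // resolved via the prototype -> "patched"
participant.debouncedRebalance(); // uses the captured reference -> "original"
```

After both calls, `participant.calls` holds `"patched"` followed by `"original"`: the debounced path never reaches the patched function. A guard placed inside the method that is actually invoked does not depend on which reference was captured.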
Three alternative approaches were tested in parallel worktrees:
- `rebalanceParticipation` + `__freqClosed` flag
- `persistCoordinate` with TOCTOU protection

Root Cause A failed because the `@peerbit/time` debouncer captures a direct closure reference to the original `rebalanceParticipation` function, bypassing prototype patches. Root Cause C showed that the additional guards on `handleSubscriptionChange`, `removeReplicator`, and `rebalanceParticipation` provided no incremental benefit over this targeted fix.

Relationship to PR dao-xyz#589
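For context, the kind of window that the pubsub race in this section refers to can be sketched as below. This is a hypothetical model only: `MiniPubSub`, `flush`, `onRemoteSubscribe`, and the "topic not initialised yet" drop condition are assumptions for illustration and are not taken from the `DirectSub` implementation.

```typescript
// Hypothetical model of a debounced subscribe; not the DirectSub implementation.
class MiniPubSub {
  // topic -> remote peers known to subscribe to it
  topics = new Map<string, Set<string>>();
  dropped: string[] = [];
  private pendingTopics: string[] = [];

  // Debounced entry point: only queues the topic; _subscribe() runs on flush().
  subscribe(topic: string): void {
    this.pendingTopics.push(topic);
  }

  // Stand-in for the debounce timer firing.
  flush(): void {
    for (const topic of this.pendingTopics.splice(0)) {
      this._subscribe(topic);
    }
  }

  _subscribe(topic: string): void {
    if (!this.topics.has(topic)) this.topics.set(topic, new Set());
  }

  // Incoming Subscribe message from a remote peer.
  onRemoteSubscribe(topic: string, peer: string): void {
    const peers = this.topics.get(topic);
    if (!peers) {
      this.dropped.push(peer); // assumed drop condition: topic not initialised yet
      return;
    }
    peers.add(peer);
  }
}

const pubsub = new MiniPubSub();
pubsub.subscribe("library");
pubsub.onRemoteSubscribe("library", "peer-A"); // arrives inside the debounce window
pubsub.flush();
pubsub.onRemoteSubscribe("library", "peer-B"); // arrives after _subscribe() has run
```

In this model `peer-A` ends up in `dropped` while `peer-B` is recorded, which is the shape of the problem that the other PR addresses. The shutdown race fixed here is independent of it.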
This fix is complementary to the pubsub `subscribe()` race fix in PR dao-xyz#589. PR dao-xyz#589 correctly fixes the race where `DirectSub.subscribe()` debounces `_subscribe()`, creating a window where incoming `Subscribe` messages are silently dropped. This fix addresses a separate but related issue: the shutdown teardown race in SharedLog that surfaces as unhandled promise rejections during test cleanup.

Generated with Claude Code