fix(shared-log): guard persistCoordinate against TOCTOU race during shutdown by Faolain · Pull Request #2 · Faolain/peerbit

Faolain · 2026-02-06T03:48:38Z

Problem

During test teardown (and potentially in production), SharedLog._close() can complete between the caller's \!this.closed check in findLeaders (line ~2677) and persistCoordinate's actual execution. At that point:

this._entryCoordinatesIndex has been set to undefined by _close() (lines ~1742-1744)
this._replicationRangeIndex has been set to undefined
Calling .put() on undefined throws a TypeError
Because this is in an async continuation without .catch(), it becomes an unhandled promise rejection

This is a TOCTOU (Time-of-Check-to-Time-of-Use) race condition. The \!this.closed guard in findLeaders passes, but _close() completes before persistCoordinate runs its body.

Call chain

onChange callback (from log.join)
  -> persistCoordinate(entry)
    -> findLeaders(coordinates, entry, { persist: {} })
      -> \!this.closed check passes
      -> ... async work ...
      -> persistCoordinate({ leaders, coordinates, replicas, entry })
        -> cidifyString(properties.entry.hash)  // may be partial
        -> this._entryCoordinatesIndex.put(...)  // undefined after _close()

Fix

Added multi-layer guards to persistCoordinate in packages/programs/data/shared-log/src/index.ts:

this.closed bail-out at method entry -- catches the TOCTOU race where _close() completed between the caller's check and this method's execution
this._entryCoordinatesIndex existence check -- uses the private backing field directly (not the getter which throws ClosedError) to silently bail if the index was torn down
this._replicationRangeIndex existence check -- same pattern for the replication range index
try/catch wrapper around the entire original body -- catches any error that occurs if _close() races mid-execution, and swallows it only if the instance has since been closed (re-throws otherwise to preserve real bug visibility)

Test Results

Methodology

Tested against 4 integration test files that exercise real Peerbit peer nodes with replication:

library.integration.test.ts -- replicates library items between 2 peers
playlist.integration.test.ts -- replicates playlist items, enforces ACLs
identity.integration.test.ts -- replicates profile updates across linked devices
replication.boundaries.test.ts -- tests remote search boundary enforcement

Each was run 3 times to assess flakiness, plus a full test suite run (182 files, 1572 tests).

Integration Tests (3 runs x 4 files)

Run	Result
Run 1	4/4 passed
Run 2	4/4 passed
Run 3	4/4 passed

Full Test Suite Comparison

Metric	Baseline (no fix)	With this fix
Test Files	7 failed / 171 passed	7 failed / 171 passed
Tests	19 failed / 1544 passed	19 failed / 1544 passed
Unhandled Errors	6 errors	2 errors

This fix eliminates 4 of 6 unhandled promise rejections. The remaining 2 errors originate from @peerbit/program's RPC layer (RPC.close -> TypedEventEmitter.dispatchEvent), which is a separate upstream issue unrelated to SharedLog.

The 7 failing test files are pre-existing failures unrelated to SharedLog:

chunkedAesGcmV1.test.ts (2 failures)
ProfilePage.test.tsx (2 failures)
WalletPanel.purchaseForm.test.tsx (flaky, 0-12 failures)
RecoverySetupPanel.test.tsx (1 failure)
generateSampleMp3.test.ts (1 failure)

Why This Approach

Three alternative approaches were tested in parallel worktrees:

Approach	Strategy	Unhandled Errors	Verdict
Root Cause A	Patch `rebalanceParticipation` + `__freqClosed` flag	7 (worse)	Ineffective -- debouncer holds closure ref
Root Cause B (this PR)	Guard `persistCoordinate` with TOCTOU protection	2	Most effective, minimal change
Root Cause C	All of B + guards on 4 other methods	2	Same as B -- extra guards add no benefit

Root Cause A failed because the @peerbit/time debouncer captures a direct closure reference to the original rebalanceParticipation function, bypassing prototype patches. Root Cause C showed that the additional guards on handleSubscriptionChange, removeReplicator, and rebalanceParticipation provided no incremental benefit over this targeted fix.

Relationship to PR dao-xyz#589

This fix is complementary to the pubsub subscribe() race fix in PR dao-xyz#589. PR dao-xyz#589 correctly fixes the race where DirectSub.subscribe() debounces _subscribe(), creating a window where incoming Subscribe messages are silently dropped. This fix addresses a separate but related issue: the shutdown teardown race in SharedLog that surfaces as unhandled promise rejections during test cleanup.

Generated with Claude Code

…hutdown During test teardown (and potentially production close), _close() can complete between the caller's !this.closed check in findLeaders and persistCoordinate's actual execution. At that point entryCoordinatesIndex is undefined, causing an unhandled promise rejection. This adds: - this.closed bail-out at method entry - entryCoordinatesIndex/replicationRangeIndex existence checks - try/catch wrapper that swallows errors if closed during execution Validated: reduces unhandled errors from 6 to 2 in integration tests. The remaining 2 errors originate from @peerbit/program RPC layer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Faolain · 2026-02-06T03:50:57Z

Closing duplicate — superseded by #3 which targets the correct base branch (fix/pubsub-subscribe-race).

Faolain closed this Feb 6, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

fix(shared-log): guard persistCoordinate against TOCTOU race during shutdown#2

fix(shared-log): guard persistCoordinate against TOCTOU race during shutdown#2
Faolain wants to merge 1 commit intofix/pubsub-initialize-topic-on-subscribefrom
fix/rootcause-b-persistcoordinate-guard

Faolain commented Feb 6, 2026

Uh oh!

Faolain commented Feb 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

Faolain commented Feb 6, 2026

Problem

Call chain

Fix

Test Results

Methodology

Integration Tests (3 runs x 4 files)

Full Test Suite Comparison

Why This Approach

Relationship to PR dao-xyz#589

Uh oh!

Faolain commented Feb 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant