fix(reliability): Address 12 review issues across circuit breaker, DLQ, config merge, and CLI #3
Draft
Copilot wants to merge 2 commits into claude/improve-reliability-four-nines-K38rW from
… review - Add summary table to Nine 3 (matching Nine 1/2/2.5 format) - Add detailed explanations for all Nine 1/2/2.5 features - Extract all human review gates to dedicated HUMAN_REVIEW.md - Restructure roadmap into table + explanations https://claude.ai/code/session_012PTx7bZNNh74rAshCcuSQg
Copilot AI changed the title from "[WIP] Add four-nines reliability framework with circuit breakers, DLQ, and monitoring" to "fix(reliability): Address 12 review issues across circuit breaker, DLQ, config merge, and CLI" on Mar 11, 2026
8ee54c1 to abb0fc8
febc345 to 7601b84
Addresses all items flagged in the PR review of the four-nines reliability module.
Circuit Breaker
- `half_open` state now enforces a single in-flight probe via `probeInFlight: boolean` on `CircuitState`; subsequent `canExecute()` calls return `false` until the probe resolves or fails
DLQ / Recovery
- `SystemMessageWriter.write()` now spreads `extraFrontmatter` into the SQLite queue `payload`, so `session-id` and `resume-mesh` are readable at `nextMsg.payload`, where the dispatcher expects them
- `deadLetter()` now accepts and forwards `retryCount`/`maxRetries` from `machine.currentContext`, so retry-exhausted entries are correctly classified as `manual` (not auto-recoverable)
- `recoverAll()` is now gated behind the `WORKER_AUTO_RECOVER_DLQ=true` env var; previously it ran unconditionally on every dispatcher start
Config & Safe Mode
- `mergeSection` helper: partial nested overrides (e.g. only `circuitBreaker.cooldownMs`) no longer silently drop sibling keys
- `evaluateSLI()` now uses the per-mesh rate (`snapshot.byMesh[meshName]?.rate`) instead of the global rate, to prevent a noisy mesh from escalating safe mode in unrelated meshes
- `lastChange?.reason?.startsWith('auto:') ?? false` in `SafeMode.getState()`: previously crashed when no level change had occurred yet
CLI / Docs
- `tx mesh health` now shows a warning when `totalEvents === 0` rather than reporting a misleading 100% success rate; persisted circuit breaker and DLQ data are still rendered
- `parseFlags()` uses an indexed loop for `--rewind-to`, so the value arg is consumed correctly and duplicate flags can't cause mis-reads
- `bucketMs` from `SLIConfig`
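For reviewers, the single-probe rule from the Circuit Breaker item can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: only `probeInFlight`, `CircuitState`, `canExecute()` and the `half_open` state are named above; the `status` field, the other state names and `onProbeSettled` are assumptions made for the example.

```typescript
// Hypothetical shape: only `probeInFlight` and the `half_open` state come from the PR text.
type CircuitStatus = "closed" | "open" | "half_open";

interface CircuitState {
  status: CircuitStatus;
  probeInFlight: boolean;
}

// Returns true when a call may proceed. In half_open, only one probe is admitted.
function canExecute(state: CircuitState): boolean {
  if (state.status === "closed") return true;
  if (state.status === "open") return false;
  if (state.probeInFlight) return false; // a probe is already running
  state.probeInFlight = true; // claim the single probe slot
  return true;
}

// Assumed hook, called once the probe resolves or fails.
function onProbeSettled(state: CircuitState, succeeded: boolean): void {
  state.probeInFlight = false;
  state.status = succeeded ? "closed" : "open";
}
```

Claiming the slot inside `canExecute()` keeps the check and the claim in one synchronous step, so two callers on the same event-loop tick cannot both be admitted.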
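The `mergeSection` behaviour described under Config & Safe Mode could look something like the sketch below. The signature and the `failureThreshold` key are assumptions for illustration; the PR text only names the helper and `circuitBreaker.cooldownMs`.

```typescript
type Section = Record<string, unknown>;

function isPlainObject(v: unknown): v is Section {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Recursively merges `override` onto `base` without mutating either input.
// Keys missing from the override keep their base values at every nesting level.
function mergeSection<T extends Section>(base: T, override?: Section): T {
  const out: Section = { ...base };
  for (const [key, value] of Object.entries(override ?? {})) {
    if (value === undefined) continue; // an undefined override keeps the default
    const current = out[key];
    out[key] =
      isPlainObject(current) && isPlainObject(value)
        ? mergeSection(current, value)
        : value;
  }
  return out as T;
}

// `failureThreshold` is a made-up sibling key used to show it survives the merge.
const defaults = { circuitBreaker: { cooldownMs: 30000, failureThreshold: 5 } };
const merged = mergeSection(defaults, { circuitBreaker: { cooldownMs: 5000 } });
```

A shallow spread (`{ ...defaults, ...overrides }`) would replace the whole `circuitBreaker` object here, which is the sibling-dropping behaviour the item describes.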
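As a rough illustration of the `parseFlags()` item, an indexed loop lets the parser skip past the value it has just consumed. The return shape, the error message and the last-occurrence-wins rule below are assumptions; only `parseFlags()` and `--rewind-to` appear in this PR's description.

```typescript
// Hypothetical result shape for the example.
interface Flags {
  rewindTo?: string;
  rest: string[];
}

function parseFlags(argv: string[]): Flags {
  const flags: Flags = { rest: [] };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === undefined) continue;
    if (arg === "--rewind-to") {
      const value = argv[i + 1];
      if (value === undefined || value.startsWith("--")) {
        throw new Error("--rewind-to requires a value");
      }
      flags.rewindTo = value; // assumed policy: the last occurrence wins
      i++; // consume the value so it is not re-read as a positional arg
      continue;
    }
    flags.rest.push(arg);
  }
  return flags;
}
```

Because the value is read at `i + 1` relative to each occurrence, a repeated `--rewind-to` pairs with its own value, which a lookup such as `argv.indexOf("--rewind-to")` would not guarantee.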