fix: process lifecycle management and memory leak prevention#12
fix: process lifecycle management and memory leak prevention#12jrenaldi79 merged 32 commits intomainfrom
Conversation
…nv var resolution Implements the IdleWatchdog class (BUSY/IDLE state machine) as the foundation for process lifecycle management. Supports configurable timeouts per mode, env var overrides, stuck-stream force-recovery, and clean cancellation. 24 tests covering all paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Task 5 added pid: process.pid to createSessionMetadata(). The existing test expected pid to be undefined; now it verifies the correct value. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When MCP handler pre-writes pid to metadata, createSessionMetadata should preserve it instead of overwriting with process.pid. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wraps sendPrompt() with watchdog.markBusy()/markIdle() in try/finally. Watchdog is optional - existing callers unaffected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Creates watchdog on server start, touch() on poll loop, cancel on cleanup. Hoisted watchdog declaration to function scope for catch block access. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rvisor Single shared OpenCode server for MCP sessions with per-session watchdogs, LRU eviction at max capacity, and auto-restart on crash (3 attempts/5 min). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Integrates SharedServerManager into mcp-server.js so that sidecar_start uses a single multiplexed OpenCode server (when SIDECAR_SHARED_SERVER != '0') instead of spawning an unref'd child process per request. Adds SIGTERM/SIGINT cleanup handlers to shut down the shared server on exit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ntinue - resume.js: checkSessionLiveness() for dead-process detection, acquireLock/releaseLock - start.js: acquireLock on session start, releaseLock in finally - continue.js: lock previous session dir to prevent concurrent continues Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Spawns real MCP server, fires 3 concurrent Gemini sessions, monitors RSS, verifies single shared server process, checks cleanup on exit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use sendPromptAsync with proper parts/model/agent format matching the headless path. Relax e2e process count assertion to log-only. Known issue: shared server path needs polling/finalization loop to match the per-process spawn path's behavior (status, progress, conversation tracking). Sessions start but never complete. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reuse runHeadless() from shared server MCP path by adding guard points for external client/server/watchdog. Fire-and-forget async pattern. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix 5 issues: waitForServer skip, watchdog process.exit guard, third server.close location, session ownership, metadata ownership. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Critical fixes from external model reviews: - Add finalizeSession() call in .then() block - Add context building via buildContext() for parent conversation - Accept result param in .then() for summary access - Guard all 4 server.close() locations (including health check) - Document process.exit() poison pill risk Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4 tasks: guard points in runHeadless, MCP handler rewrite, test suite, e2e verification. All 5 server.close() locations guarded. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d finalization Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…er support 4 guard points: skip server startup/shutdown when external client provided, watchdog passthrough, sessionId passthrough. All 5 server.close() locations guarded with !externalServer check. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wraps eval runner with PID/RSS snapshots before, during (every 10s), and after eval execution. Detects orphaned processes and reports delta. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sidecar evals need ~7 min for full round-trip: Claude initialization, sidecar_start, LLM processing, polling, read, and applying fixes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
'gemini' was passed as raw alias to sendPromptAsync, which sent it as
{ providerID: 'openrouter', modelID: 'gemini' } - an invalid model.
Now resolves via tryResolveModel() to full model ID before sending.
Root cause of sessions stuck at 'messages: 0' in eval runs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Validates model aliases, prompts, timeouts, and agent/headless compatibility at MCP boundary before session creation. Returns structured error responses for LLM self-correction. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New input-validators.js validates model (resolves aliases), prompt (non-empty), timeout (positive, max 60 min), and agent/headless compatibility before any session creation. Returns structured JSON errors with field, message, suggestions for LLM self-correction. Replaces silent failures with immediate, actionable error feedback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verify error objects are JSON-serializable, include available aliases, and contain the invalid input in the message for LLM debugging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MCP Zod schema defaults agent to 'Chat'. The handler converts Chat to Build for headless mode. Validation must not reject this case since it's the normal default flow, not an explicit user error. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughAdds process lifecycle management: IdleWatchdog, session locking, input validation, and a SharedServerManager; integrates these into headless and MCP flows (fire-and-forget runHeadless for shared-server), updates sidecar start/resume/continue/crash handling, expands docs, and adds extensive tests and monitoring scripts. Changes
Sequence Diagram(s)sequenceDiagram
participant MCP as MCP Handler
participant Val as Input Validator
participant SVR as SharedServerManager
participant HLS as runHeadless
participant WDG as IdleWatchdog
participant Meta as Metadata Writer
participant Fin as finalizeSession
MCP->>Val: validateStartInputs(input)
Val-->>MCP: resolvedModel / error
alt validation fails
MCP-->>MCP: return validation error
else
MCP->>SVR: ensureServer()
SVR-->>MCP: {server, client}
MCP->>Meta: write initial session metadata
MCP->>HLS: runHeadless({client, server, sessionId, watchdog})
HLS->>WDG: start()
loop runtime / polling
HLS->>WDG: touch() / markBusy()/markIdle()
end
HLS->>Fin: on completion/error
Fin->>Meta: finalize metadata (status/timestamp)
MCP-->>MCP: return immediate response to caller
end
sequenceDiagram
participant App as Sidecar App
participant Lock as session-lock
participant Proc as OS Process
participant WDG as IdleWatchdog
App->>Lock: acquireLock(sessionDir, mode)
alt lock exists
Lock->>Proc: isPidAlive(pid) / check age
alt stale
Lock->>Lock: remove stale lock
Lock->>Lock: create new lock
else
Lock-->>App: error (session active)
end
else
Lock->>Lock: create lock atomically
end
App->>WDG: new IdleWatchdog(mode)
loop operations
App->>WDG: touch() / markBusy()/markIdle()
end
App->>Lock: releaseLock(sessionDir)
Lock->>Lock: delete lock file
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/sidecar/continue.js (1)
159-160:⚠️ Potential issue | 🟡 Minor
createHeartbeatcalled withoutsessionDirargument — missing progress tracking.The function signature is
createHeartbeat(interval = HEARTBEAT_INTERVAL, sessionDir). Whileintervalhas a default,sessionDirdoes not. CallingcreateHeartbeat()without arguments will work but skip progress tracking (the function conditionally reads progress only whensessionDiris provided). Instart.js(line 176), it's called with both arguments:createHeartbeat(HEARTBEAT_INTERVAL, sessDir). For consistency and to enable progress tracking here, importHEARTBEAT_INTERVALand passsessionDir:Proposed fix
+const { HEARTBEAT_INTERVAL } = require('./session-utils'); // ... in continueSidecar function - const heartbeat = createHeartbeat(); + const heartbeat = createHeartbeat(HEARTBEAT_INTERVAL, sessionDir);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/sidecar/continue.js` around lines 159 - 160, The heartbeat is created without the required sessionDir so progress tracking is skipped; update the call to createHeartbeat to pass the interval and session dir (use HEARTBEAT_INTERVAL and the current sessionDir variable) and ensure HEARTBEAT_INTERVAL is imported at top of the file; specifically change the call from createHeartbeat() to createHeartbeat(HEARTBEAT_INTERVAL, sessionDir) and add the HEARTBEAT_INTERVAL import so createHeartbeat(sessionDir)-dependent progress reads run.src/headless.js (1)
257-456: 🛠️ Refactor suggestion | 🟠 MajorUse
session.status()instead of extending this custom completion poller.The new abort/watchdog logic adds more lifecycle behavior to the message loop, but this file's repo rule is to use the session status API for completion detection. Moving completion/stall checks there should make this path simpler and less divergence-prone. As per coding guidelines, "Use session.status() API for completion detection in headless mode instead of custom polling loops".
🧹 Nitpick comments (14)
tests/headless-watchdog.test.js (1)
7-13: Prefer behavior assertions over source-text string checks.This test can pass even if watchdog lifecycle wiring is broken at runtime. Consider asserting runtime interactions (watchdog creation/touch/cancel) via module-level mocks instead of
toContain(...)on file contents.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/headless-watchdog.test.js` around lines 7 - 13, Replace brittle source-text checks against src/headless.js with runtime behavior assertions: mock the watchdog module at the module level (use jest.mock for the module that exports the watchdog used by headless.js), require/import the headless module under test so it uses the mock, and then assert that the watchdog factory/creation was called and that the mock's touch() and cancel() methods were invoked as expected; target symbols to change are the test file tests/headless-watchdog.test.js and the runtime interactions watchdog.touch(), watchdog.cancel(), and the module that constructs the watchdog in headless.js so the test verifies calls rather than string presence.tests/interactive-watchdog.test.js (1)
7-12: This test is also implementation-coupled via string matching.Using runtime assertions for watchdog setup/cleanup would give stronger regression protection than checking source text fragments.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/interactive-watchdog.test.js` around lines 7 - 12, Replace fragile source-text assertions in tests/interactive-watchdog.test.js with runtime checks against the module behavior: require or import the module exported by src/sidecar/interactive.js (e.g., the initializer like initInteractiveMode or exported watchdog instance), invoke the interactive startup function to create the watchdog, and assert the watchdog object exists and has the expected properties/methods (e.g., a running flag and a cancel method) and that calling the cleanup path calls watchdog.cancel or removes listeners; remove assertions that search for 'idle-watchdog', "mode: 'interactive'", or string 'watchdog.cancel()' and instead assert on the actual runtime effects (watchdog presence, cancel invoked, and any listener counts) using spies/mocks where needed.src/sidecar/crash-handler.js (1)
42-48: Reusesession-lockutility instead of hardcoding lock cleanup.This duplicates lock-file logic and filename ownership in two places.
♻️ Proposed refactor
const fs = require('fs'); -const path = require('path'); +const { releaseLock } = require('../utils/session-lock'); const { SessionPaths } = require('./session-utils'); @@ - // Delete session lock if it exists - const lockPath = path.join(sessionDir, 'session.lock'); - try { - fs.unlinkSync(lockPath); - } catch { - // Lock may not exist yet - } + // Delete session lock if it exists + releaseLock(sessionDir);As per coding guidelines: "Extract duplicate logic to shared utilities."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/sidecar/crash-handler.js` around lines 42 - 48, Replace the hardcoded unlink logic that builds lockPath and swallows errors with the shared session-lock utility: import the utility used elsewhere (the session-lock module) and call its exported cleanup/release function (instead of manually creating lockPath from sessionDir and calling fs.unlinkSync in the try/catch) to remove the session lock; reference the existing variables sessionDir and the current unlink block so you replace that exact code with a call to the session-lock cleanup API (e.g., cleanupSessionLock or removeLock) to centralize lock-file logic.src/sidecar/interactive.js (1)
153-160: Watchdog integration is functional but could be cleaner.The initial
onTimeout(lines 157-159) is immediately overwritten (lines 191-196) onceelectronProcessis available. This works but is slightly awkward.Optional: Start watchdog after spawn for cleaner code
- // Start idle watchdog for interactive mode (60-min default timeout) - const { IdleWatchdog } = require('../utils/idle-watchdog'); - const watchdog = new IdleWatchdog({ - mode: 'interactive', - onTimeout: () => { - logger.info('Interactive idle timeout - shutting down', { taskId }); - }, - }).start(); - return new Promise((resolve, _reject) => { const electronPath = getElectronPath(); // ... spawn electronProcess ... + // Start idle watchdog after spawn so electronProcess is available + const { IdleWatchdog } = require('../utils/idle-watchdog'); + const watchdog = new IdleWatchdog({ + mode: 'interactive', + onTimeout: () => { + logger.info('Interactive idle timeout - shutting down', { taskId }); + if (!electronProcess.killed) { + electronProcess.kill('SIGTERM'); + } + }, + }).start(); - // Update watchdog onTimeout now that electronProcess is available - watchdog.onTimeout = () => { - logger.info('Interactive idle timeout - shutting down', { taskId }); - if (!electronProcess.killed) { - electronProcess.kill('SIGTERM'); - } - };Also applies to: 191-196
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/sidecar/interactive.js` around lines 153 - 160, The IdleWatchdog is created with an onTimeout handler that is immediately replaced later when electronProcess exists; instead, instantiate/start the watchdog after spawning electron so you can set the final onTimeout once electronProcess is available. Locate the IdleWatchdog usage (symbols: IdleWatchdog, watchdog, start(), onTimeout, electronProcess) and move its creation/start to the block that runs after electronProcess is spawned (or construct it without onTimeout and call watchdog.setOnTimeout(...) once electronProcess is assigned) so the handler is only defined once and not overwritten.src/opencode-client.js (1)
1-574: File exceeds 300-line limit.This file is 574 lines, nearly double the 300-line limit specified in the coding guidelines. Consider splitting into separate modules (e.g.,
opencode-client.jsfor client operations,opencode-server.jsfor server management,mcp-config.jsfor MCP utilities).As per coding guidelines:
src/**/*.js: Any file must not exceed 300 lines.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/opencode-client.js` around lines 1 - 574, The file is too large (574 lines) and should be split; move client-related functions (createClient, createSession, createChildSession, sendPrompt, getMessages, getChildren, listSessions, getSessionStatus, abortSession, checkHealth, parseModelString) into a new opencode-client.js module, move server management functions (buildServerOptions, startServer, getCreateOpencodeServer, getCreateOpencodeClient, getSDK) into opencode-server.js, and move MCP utilities (loadMcpConfig, parseMcpSpec and any MCP-normalization logic) into mcp-config.js; update each module to export its symbols and change the original module.exports to re-export or require the new modules, and adjust any require/imports (e.g., references to buildProviderModels, buildCoworkAgentPrompt, logger) so they point to the new files to keep behavior unchanged.docs/superpowers/plans/2026-03-14-mcp-polling-handoff.md (1)
4-6: Consider removing developer-specific local path.Line 5 contains a local worktree path (
/Users/john_renaldi/claude-code-projects/sidecar-memory-leak) which is specific to one developer's machine. This may be confusing for other contributors. Consider removing or generalizing this path.✏️ Suggested change
**Date:** 2026-03-14 **Branch:** feature/memory-leak -**Worktree:** /Users/john_renaldi/claude-code-projects/sidecar-memory-leak🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/plans/2026-03-14-mcp-polling-handoff.md` around lines 4 - 6, The file contains a developer-specific local path on the "Worktree" line (/Users/john_renaldi/claude-code-projects/sidecar-memory-leak); remove or replace that path with a generic value (e.g., "./worktree" or omit the Worktree line entirely) so the docs don't expose a personal machine path. Edit the document entry that contains the "Worktree:" label to either remove the absolute path or substitute a neutral placeholder and commit the change.tests/process-lifecycle.test.js (1)
51-60: Source inspection test is brittle.This test reads crash-handler.js source and asserts string presence. If the implementation changes (e.g., uses
fs.promises.unlinkinstead ofunlinkSync, or renames the lock file), this test breaks even if behavior is correct.Consider replacing with an integration test that actually triggers crash handler logic and verifies the lock file is removed, or accept this as a lightweight smoke test.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/process-lifecycle.test.js` around lines 51 - 60, The test inspects the crash-handler.js source for the literal strings 'session.lock' and 'unlinkSync', which is brittle; update the test to exercise the crash handler behavior instead: create a temporary session.lock file, invoke the crash handler entry point (the module in crash-handler.js or the exported function that performs cleanup), trigger the cleanup flow (or call the cleanup function directly), then assert the file no longer exists; alternatively, if keeping a source-level smoke test, relax it to check for either 'unlinkSync' or 'fs.promises.unlink' and for any reference to 'lock' rather than the exact 'session.lock'.src/sidecar/resume.js (1)
114-206: FunctionresumeSidecarexceeds 50-line limit.At ~93 lines, this function exceeds the 50-line guideline. Consider extracting the headless and interactive branches into separate helper functions (e.g.,
runHeadlessResume,runInteractiveResume) to improve readability and testability.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/sidecar/resume.js` around lines 114 - 206, The resumeSidecar function is too long; extract the headless branch and the interactive branch into two helpers (e.g., runHeadlessResume and runInteractiveResume) so resumeSidecar only orchestrates flow: create sessionDir/metadata, acquireLock, compute resumePrompt, start heartbeat, call the appropriate helper, then stop heartbeat, releaseLock, outputSummary, and finalizeSession. The helpers should accept the same context used in the current branches (at minimum: metadata.model, resumePrompt, taskId, project, timeout/effectiveAgent, existingConversation, and mcpServers) and return a result object with summary, timedOut, and error fields so existing post-call logging and summary handling in resumeSidecar remains unchanged; ensure updatedMetadata is still used for finalizeSession and keep heartbeat.stop() and releaseLock(sessionDir) inside the existing finally block.tests/idle-watchdog.test.js (1)
236-246: Replace the source-text assertion with a behavioral test.This only proves the file contains those tokens. Dead code still passes, and harmless renames fail. Stub the async prompt path and assert the watchdog transitions around it instead of grepping the source.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/idle-watchdog.test.js` around lines 236 - 246, Replace the brittle source-text grep in the existing test with a behavioral test that stubs the async prompt path and asserts watchdog transitions; require the module under test (opencode-client.js), spyOn or mock the internal async prompt function that sendPrompt wraps (the function sendPrompt calls), then call sendPrompt and assert watchdog.markBusy was called before the async promise resolves and watchdog.markIdle was called after it resolves (also add a test for the rejection path to ensure markIdle is still called on error). Use jest.fn()/jest.spyOn on the watchdog object and the internal prompt function to control timing and verify call order rather than checking file contents.tests/input-validation.test.js (1)
46-55: Make the default-model test deterministic.This branch accepts both success and failure based on whatever config happens to exist on the machine, so it won't catch regressions in default resolution. Isolate the config state for the test and assert one expected outcome.
src/utils/shared-server.js (1)
27-27: Environment variable parsing may not behave as intended for value "0".
Number(process.env.SIDECAR_MAX_SESSIONS) || 20will fall back to 20 if the env var is set to "0", sinceNumber("0")is falsy. While 0 max sessions is likely an invalid configuration, this could cause confusion.Consider using nullish coalescing if you want to respect explicit "0":
♻️ Suggested fix
- this.maxSessions = Number(process.env.SIDECAR_MAX_SESSIONS) || 20; + const envMax = process.env.SIDECAR_MAX_SESSIONS; + this.maxSessions = envMax !== undefined ? Number(envMax) : 20;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/shared-server.js` at line 27, The assignment to this.maxSessions uses a truthy fallback that treats "0" as absent; update the parsing so an explicit "0" is respected by using nullish coalescing or an explicit check: parse process.env.SIDECAR_MAX_SESSIONS with Number(...) (or parseInt) into a temp value and set this.maxSessions = tempValue ?? 20 (or set to 20 only when env is undefined/null), referencing the existing this.maxSessions assignment in shared-server.js to locate the change.docs/superpowers/specs/2026-03-14-mcp-input-validation-design.md (1)
83-97: Spec differs from actual implementation for agent validation.The spec shows:
if (input.noUi) { const agentResult = validateHeadlessAgent(input.agent);But the actual implementation in
src/utils/input-validators.js(lines 107-122 per context snippet) has additional logic:if (input.noUi && input.agent) { const lower = input.agent.toLowerCase(); if (lower !== 'chat') { // Skip for chat - handler auto-converts const agentResult = validateHeadlessAgent(input.agent);The implementation correctly handles the MCP Zod schema default where
agentdefaults to 'Chat', and the handler auto-converts Chat→Build for headless mode. Consider updating the spec to document this edge case for future maintainers.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/specs/2026-03-14-mcp-input-validation-design.md` around lines 83 - 97, Update the spec's agent validation section to match implementation: only call validateHeadlessAgent when noUi is true AND input.agent is present and not the default/auto-converted Chat; specifically document that the handler auto-converts "Chat"→"Build" for headless mode so the code uses the check if (input.noUi && input.agent) { const lower = input.agent.toLowerCase(); if (lower !== 'chat') { const agentResult = validateHeadlessAgent(input.agent); ... } } and therefore the spec should include this conditional and the rationale.docs/superpowers/plans/2026-03-14-mcp-polling-plan.md (1)
36-66: Source-check tests provide good safety net but are brittle.The test strategy of checking for string presence in source code (e.g.,
expect(src).toContain('externalServer')) is a lightweight way to verify implementation progress. However, these tests are fragile—renaming variables or refactoring would break them without affecting functionality.Consider supplementing with actual behavioral tests once the implementation is complete.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/plans/2026-03-14-mcp-polling-plan.md` around lines 36 - 66, The source-check tests in tests/headless-external-server.test.js are brittle because they assert string occurrences (e.g., 'externalServer', '!externalServer', 'options.sessionId', 'options.watchdog') in the raw src read from src/headless.js; replace or supplement these with behavioral/unit tests that import or require the headless module and assert runtime behavior instead (for example, call the exported function(s) with an options object containing externalServer/sessionId/watchdog and verify server.close() is skipped when externalServer is true and that sessionId/watchdog are accepted), keeping the existing source-checks only as a temporary guard if needed.docs/superpowers/specs/2026-03-14-mcp-polling-integration-design.md (1)
113-125: Minor inconsistency:buildContextcall usesawaithere but not in the plan file.This spec shows
context = await buildContext({...})while the plan file (2026-03-14-mcp-polling-plan.md, lines 270-278) shows synchronous usagecontext = buildContext(cwd, ...).Additionally, the parameter structure differs:
- Spec:
buildContext({ project: cwd, parentSession: ... })- Plan:
buildContext(cwd, input.parentSession, { ... })Verify the actual function signature and update both documents for consistency.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/specs/2026-03-14-mcp-polling-integration-design.md` around lines 113 - 125, The two docs are inconsistent about buildContext: confirm the real function signature and whether it returns a Promise (check the implementation of buildContext), then update both documents to match that signature and usage; if buildContext is async, change the plan file to use "await buildContext(...)" (or if synchronous, remove await in the spec), and standardize the parameter shape so both show the same call form (either buildContext({project: cwd, parentSession, coworkProcess, contextTurns, contextSince, contextMaxTokens}) or buildContext(cwd, input.parentSession, {coworkProcess, contextTurns, contextSince, contextMaxTokens})) so examples and parameter names (buildContext) are identical across files.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/troubleshooting.md`:
- Line 7: The docs show the wrong session lock path; update the troubleshooting
entry to match SessionPaths.sessionDir() usage by replacing
~/.config/sidecar/sessions/<task_id>/session.lock with the project-local path
where sessions are stored:
<project>/.claude/sidecar_sessions/<task_id>/session.lock (i.e., instruct users
to delete <project>/.claude/sidecar_sessions/<task_id>/session.lock to clear a
stale lock).
In `@scripts/eval-with-monitoring.sh`:
- Around line 66-67: The script currently exits immediately on a non-zero exit
from the node command due to shell errexit (set -e), so EVAL_EXIT is never
captured and monitoring/cleanup never runs; change the block around the node
invocation to disable errexit just for that run (e.g., use set +e or run the
node command with a shell-safe capture pattern) so you can capture the node
process exit into the EVAL_EXIT variable (the symbol EVAL_EXIT) and then
re-enable errexit (set -e) before continuing to the existing monitoring/cleanup
logic; update the area around the node "$PROJECT_DIR/evals/run_eval.js"
invocation so failures do not short-circuit post-eval monitoring.
In `@src/headless.js`:
- Around line 161-173: The IdleWatchdog is started before session creation and
is not cancelled on the early-return error path in createSession(), leaving a
timer that can later call process.exit(0); update the createSession()
early-return handling (the block at lines ~181-191) to stop the watchdog when
aborting: reference the watchdog variable and call its stop/cancel method (e.g.,
watchdog.stop() or watchdog.cancel()) if it exists, or alternatively defer
starting the IdleWatchdog until after successful session creation (instantiate
IdleWatchdog only after createSession succeeds); ensure IdleWatchdog is
referenced by name and that any started instance is properly stopped on all
error/return paths.
In `@src/mcp-server.js`:
- Around line 127-134: The metadata currently records the long-lived MCP process
with "pid: process.pid", which causes sidecar_abort() to kill the whole server;
change the recorded task PID to the actual task/sidecar process ID instead. In
the JSON written to metaPath (the object created where taskId, status,
opencodeSessionId, opencodePort, goPid are set), replace pid: process.pid with
the task PID (e.g. pid: server.goPid || null or whatever child/sidecar PID
property is used by the server instance) and keep goPid as-is; ensure
sidecar_abort() reads that same pid field so it only signals the task process,
not the MCP server.
- Around line 121-134: You create a shared session (createSession -> sessionId)
and write metadata.json before calling addSession()/runHeadless, so if
addSession() or later fails you must clean up the partially-created shared
session and metadata instead of falling back to spawn. Wrap the
addSession()/runHeadless sequence in a try/catch; on error call the
corresponding session teardown on the client (e.g. deleteSession(sessionId) or
closeSession(sessionId) / client.removeSession(sessionId) — use the existing
session-management API in your codebase — and remove the metadata
file/sessionDir (fs.unlinkSync or fs.rmSync) before rethrowing the error so the
code does not continue to the spawn path and leaves stale metadata.json or a
leaked shared session. Ensure this same cleanup is applied to the other affected
block (lines ~156-208) where createSession is used.
In `@src/sidecar/resume.js`:
- Around line 137-140: The lock acquired by acquireLock(...) is taken before the
try/finally that releases it, so if buildMcpConfig, checkFileDrift,
updateSessionStatus, createHeartbeat or any other call between acquireLock and
the try throws the lock will never be released; move the call to
acquireLock(sessionDir, headless ? 'headless' : 'interactive') inside the try
block (or restructure so the try begins immediately after acquireLock) so the
finally block that releases the lock always runs, ensuring locks are released
even on intermediate failures.
In `@src/sidecar/session-utils.js`:
- Around line 248-250: checkSessionLiveness can throw when metadata is
null/undefined; update the function to first guard for a missing metadata object
(or missing pid/goPid) and return 'dead' immediately instead of calling
isProcessAlive on undefined. Specifically, in checkSessionLiveness add a
null/undefined check for metadata (and optionally for
metadata.pid/metadata.goPid) before invoking isProcessAlive so the function
safely returns 'dead' when metadata is absent.
In `@src/utils/input-validators.js`:
- Around line 55-67: The validation error currently interpolates input.model
into the message even when it's undefined; update the model-error branch in the
validation logic (the block checking modelError and using getEffectiveAliases(),
findSimilar(), and input.model) to only prepend the "Model 'X' not found." text
when a model string was actually supplied (e.g., typeof input.model === 'string'
&& input.model !== ''), otherwise use just the tryResolveModel() message;
likewise only compute suggestions/aliases when a model string exists (omit or
set suggestions to []/undefined when no model was provided) so the error message
no longer shows "Model 'undefined' not found."
In `@src/utils/session-lock.js`:
- Around line 29-39: isLockStale currently treats any lock as reclaimable after
a fixed age even if lockData.pid is alive, which lets a long-running legitimate
process have its lock stolen; change isLockStale to only consider a lock stale
when the PID is not alive (i.e., if !isPidAlive(lockData.pid) return true,
otherwise ignore the timestamp/age check), and do not use
MODE_TIMEOUTS/lockData.timestamp to reclaim live PIDs unless you implement a
heartbeat or owner token; additionally update releaseLock() to verify ownership
(e.g., require matching lockData.ownerToken or pid before deleting) so a new
owner cannot be removed by the original process on exit.
In `@tests/shared-server-e2e.integration.test.js`:
- Around line 109-112: The spawned MCP process inherits SIDECAR_SHARED_SERVER
from the parent env, causing the test to miss the shared-server path when
overridden; modify the spawn call (where NODE, SIDECAR_BIN and the 'mcp' arg are
used) to force-enable the flag by adding SIDECAR_SHARED_SERVER: '1' into the env
object (e.g., spread process.env then set SIDECAR_SHARED_SERVER: '1') so the
child always runs with shared-server enabled for this test.
- Around line 38-49: MemoryMonitor currently samples the wrong process
(client.child.pid) and misses the real OpenCode process; update MemoryMonitor so
its start/_record logic resolves and samples the actual Go child PID stored in
the session metadata (key "goPid") when present, falling back to
client.child.pid if not available, and ensure any other sampling locations
referenced around lines 235-236 use the same goPid-aware logic; modify
MemoryMonitor constructor/start/_record to accept or look up the session
metadata and use that PID for RSS measurements.
- Around line 318-333: Replace the weak assertions that only check non-empty
output with two strict checks: assert each pollResults[i].status is exactly
'complete' (not just in ['complete','error']) and assert the LLM output from the
sidecar_read call contains the deterministic marker `SESSION_${i}_OK`; locate
the loop using SESSION_COUNT that checks pollResults and the subsequent loop
calling client.request('tools/call', { name: 'sidecar_read', arguments: {
taskId: taskIds[i], project: tmpDir } }) and replace the expect calls so they
use expect(pollResults[i].status).toBe('complete') and
expect(readText).toContain(`SESSION_${i}_OK`) (use taskIds[i] where appropriate
for logging).
---
Outside diff comments:
In `@src/sidecar/continue.js`:
- Around line 159-160: The heartbeat is created without the required sessionDir
so progress tracking is skipped; update the call to createHeartbeat to pass the
interval and session dir (use HEARTBEAT_INTERVAL and the current sessionDir
variable) and ensure HEARTBEAT_INTERVAL is imported at top of the file;
specifically change the call from createHeartbeat() to
createHeartbeat(HEARTBEAT_INTERVAL, sessionDir) and add the HEARTBEAT_INTERVAL
import so createHeartbeat(sessionDir)-dependent progress reads run.
---
Nitpick comments:
In `@docs/superpowers/plans/2026-03-14-mcp-polling-handoff.md`:
- Around line 4-6: The file contains a developer-specific local path on the
"Worktree" line (/Users/john_renaldi/claude-code-projects/sidecar-memory-leak);
remove or replace that path with a generic value (e.g., "./worktree" or omit the
Worktree line entirely) so the docs don't expose a personal machine path. Edit
the document entry that contains the "Worktree:" label to either remove the
absolute path or substitute a neutral placeholder and commit the change.
In `@docs/superpowers/plans/2026-03-14-mcp-polling-plan.md`:
- Around line 36-66: The source-check tests in
tests/headless-external-server.test.js are brittle because they assert string
occurrences (e.g., 'externalServer', '!externalServer', 'options.sessionId',
'options.watchdog') in the raw src read from src/headless.js; replace or
supplement these with behavioral/unit tests that import or require the headless
module and assert runtime behavior instead (for example, call the exported
function(s) with an options object containing externalServer/sessionId/watchdog
and verify server.close() is skipped when externalServer is true and that
sessionId/watchdog are accepted), keeping the existing source-checks only as a
temporary guard if needed.
In `@docs/superpowers/specs/2026-03-14-mcp-input-validation-design.md`:
- Around line 83-97: Update the spec's agent validation section to match
implementation: only call validateHeadlessAgent when noUi is true AND
input.agent is present and not the default/auto-converted Chat; specifically
document that the handler auto-converts "Chat"→"Build" for headless mode so the
code uses the check if (input.noUi && input.agent) { const lower =
input.agent.toLowerCase(); if (lower !== 'chat') { const agentResult =
validateHeadlessAgent(input.agent); ... } } and therefore the spec should
include this conditional and the rationale.
In `@docs/superpowers/specs/2026-03-14-mcp-polling-integration-design.md`:
- Around line 113-125: The two docs are inconsistent about buildContext: confirm
the real function signature and whether it returns a Promise (check the
implementation of buildContext), then update both documents to match that
signature and usage; if buildContext is async, change the plan file to use
"await buildContext(...)" (or if synchronous, remove await in the spec), and
standardize the parameter shape so both show the same call form (either
buildContext({project: cwd, parentSession, coworkProcess, contextTurns,
contextSince, contextMaxTokens}) or buildContext(cwd, input.parentSession,
{coworkProcess, contextTurns, contextSince, contextMaxTokens})) so examples and
parameter names (buildContext) are identical across files.
In `@src/opencode-client.js`:
- Around line 1-574: The file is too large (574 lines) and should be split; move
client-related functions (createClient, createSession, createChildSession,
sendPrompt, getMessages, getChildren, listSessions, getSessionStatus,
abortSession, checkHealth, parseModelString) into a new opencode-client.js
module, move server management functions (buildServerOptions, startServer,
getCreateOpencodeServer, getCreateOpencodeClient, getSDK) into
opencode-server.js, and move MCP utilities (loadMcpConfig, parseMcpSpec and any
MCP-normalization logic) into mcp-config.js; update each module to export its
symbols and change the original module.exports to re-export or require the new
modules, and adjust any require/imports (e.g., references to
buildProviderModels, buildCoworkAgentPrompt, logger) so they point to the new
files to keep behavior unchanged.
In `@src/sidecar/crash-handler.js`:
- Around line 42-48: Replace the hardcoded unlink logic that builds lockPath and
swallows errors with the shared session-lock utility: import the utility used
elsewhere (the session-lock module) and call its exported cleanup/release
function (instead of manually creating lockPath from sessionDir and calling
fs.unlinkSync in the try/catch) to remove the session lock; reference the
existing variables sessionDir and the current unlink block so you replace that
exact code with a call to the session-lock cleanup API (e.g., cleanupSessionLock
or removeLock) to centralize lock-file logic.
In `@src/sidecar/interactive.js`:
- Around line 153-160: The IdleWatchdog is created with an onTimeout handler
that is immediately replaced later when electronProcess exists; instead,
instantiate/start the watchdog after spawning electron so you can set the final
onTimeout once electronProcess is available. Locate the IdleWatchdog usage
(symbols: IdleWatchdog, watchdog, start(), onTimeout, electronProcess) and move
its creation/start to the block that runs after electronProcess is spawned (or
construct it without onTimeout and call watchdog.setOnTimeout(...) once
electronProcess is assigned) so the handler is only defined once and not
overwritten.
In `@src/sidecar/resume.js`:
- Around line 114-206: The resumeSidecar function is too long; extract the
headless branch and the interactive branch into two helpers (e.g.,
runHeadlessResume and runInteractiveResume) so resumeSidecar only orchestrates
flow: create sessionDir/metadata, acquireLock, compute resumePrompt, start
heartbeat, call the appropriate helper, then stop heartbeat, releaseLock,
outputSummary, and finalizeSession. The helpers should accept the same context
used in the current branches (at minimum: metadata.model, resumePrompt, taskId,
project, timeout/effectiveAgent, existingConversation, and mcpServers) and
return a result object with summary, timedOut, and error fields so existing
post-call logging and summary handling in resumeSidecar remains unchanged;
ensure updatedMetadata is still used for finalizeSession and keep
heartbeat.stop() and releaseLock(sessionDir) inside the existing finally block.
In `@src/utils/shared-server.js`:
- Line 27: The assignment to this.maxSessions uses a truthy fallback that treats
"0" as absent; update the parsing so an explicit "0" is respected by using
nullish coalescing or an explicit check: parse process.env.SIDECAR_MAX_SESSIONS
with Number(...) (or parseInt) into a temp value and set this.maxSessions =
tempValue ?? 20 (or set to 20 only when env is undefined/null), referencing the
existing this.maxSessions assignment in shared-server.js to locate the change.
In `@tests/headless-watchdog.test.js`:
- Around line 7-13: Replace brittle source-text checks against src/headless.js
with runtime behavior assertions: mock the watchdog module at the module level
(use jest.mock for the module that exports the watchdog used by headless.js),
require/import the headless module under test so it uses the mock, and then
assert that the watchdog factory/creation was called and that the mock's touch()
and cancel() methods were invoked as expected; target symbols to change are the
test file tests/headless-watchdog.test.js and the runtime interactions
watchdog.touch(), watchdog.cancel(), and the module that constructs the watchdog
in headless.js so the test verifies calls rather than string presence.
In `@tests/idle-watchdog.test.js`:
- Around line 236-246: Replace the brittle source-text grep in the existing test
with a behavioral test that stubs the async prompt path and asserts watchdog
transitions; require the module under test (opencode-client.js), spyOn or mock
the internal async prompt function that sendPrompt wraps (the function
sendPrompt calls), then call sendPrompt and assert watchdog.markBusy was called
before the async promise resolves and watchdog.markIdle was called after it
resolves (also add a test for the rejection path to ensure markIdle is still
called on error). Use jest.fn()/jest.spyOn on the watchdog object and the
internal prompt function to control timing and verify call order rather than
checking file contents.
In `@tests/interactive-watchdog.test.js`:
- Around line 7-12: Replace fragile source-text assertions in
tests/interactive-watchdog.test.js with runtime checks against the module
behavior: require or import the module exported by src/sidecar/interactive.js
(e.g., the initializer like initInteractiveMode or exported watchdog instance),
invoke the interactive startup function to create the watchdog, and assert the
watchdog object exists and has the expected properties/methods (e.g., a running
flag and a cancel method) and that calling the cleanup path calls
watchdog.cancel or removes listeners; remove assertions that search for
'idle-watchdog', "mode: 'interactive'", or string 'watchdog.cancel()' and
instead assert on the actual runtime effects (watchdog presence, cancel invoked,
and any listener counts) using spies/mocks where needed.
In `@tests/process-lifecycle.test.js`:
- Around line 51-60: The test inspects the crash-handler.js source for the
literal strings 'session.lock' and 'unlinkSync', which is brittle; update the
test to exercise the crash handler behavior instead: create a temporary
session.lock file, invoke the crash handler entry point (the module in
crash-handler.js or the exported function that performs cleanup), trigger the
cleanup flow (or call the cleanup function directly), then assert the file no
longer exists; alternatively, if keeping a source-level smoke test, relax it to
check for either 'unlinkSync' or 'fs.promises.unlink' and for any reference to
'lock' rather than the exact 'session.lock'.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 515b2e2f-d89e-4a9c-b4cc-e6c06026b698
📒 Files selected for processing (37)
CLAUDE.mddocs/architecture.mddocs/configuration.mddocs/superpowers/plans/2026-03-14-mcp-polling-handoff.mddocs/superpowers/plans/2026-03-14-mcp-polling-plan.mddocs/superpowers/specs/2026-03-14-mcp-input-validation-design.mddocs/superpowers/specs/2026-03-14-mcp-polling-integration-design.mddocs/troubleshooting.mddocs/usage.mdevals/run_eval.jsscripts/eval-with-monitoring.shsrc/headless.jssrc/mcp-server.jssrc/opencode-client.jssrc/sidecar/continue.jssrc/sidecar/crash-handler.jssrc/sidecar/interactive.jssrc/sidecar/resume.jssrc/sidecar/session-utils.jssrc/sidecar/start.jssrc/utils/idle-watchdog.jssrc/utils/input-validators.jssrc/utils/session-lock.jssrc/utils/shared-server.jssrc/utils/validators.jstests/headless-external-server.test.jstests/headless-watchdog.test.jstests/idle-watchdog.test.jstests/input-validation.test.jstests/interactive-watchdog.test.jstests/mcp-shared-server.test.jstests/process-lifecycle.test.jstests/resume-lock.test.jstests/session-lock.test.jstests/shared-server-e2e.integration.test.jstests/shared-server.test.jstests/sidecar/start.test.js
src/mcp-server.js
Outdated
| const sessionId = await createSession(client); | ||
|
|
||
| // Write initial metadata (MCP handler owns this, runHeadless skips it) | ||
| fs.mkdirSync(sessionDir, { recursive: true, mode: 0o700 }); | ||
| const metaPath = path.join(sessionDir, 'metadata.json'); | ||
| const serverPort = server.url ? new URL(server.url).port : null; | ||
| fs.writeFileSync(metaPath, JSON.stringify({ | ||
| taskId, status: 'running', pid: process.pid, | ||
| opencodeSessionId: sessionId, | ||
| opencodePort: serverPort, | ||
| goPid: server.goPid || null, | ||
| createdAt: new Date().toISOString(), | ||
| headless: true, model: resolvedModel, | ||
| }, null, 2), { mode: 0o600 }); |
There was a problem hiding this comment.
Don't fall back after partially creating a shared session.
createSession() and metadata.json are created before addSession() / runHeadless() complete. If addSession() throws (for example when the shared server hits its max-session guard) or anything later in this block fails, Lines 206-208 drop into the spawn path without aborting the shared session or clearing that metadata. That leaks the shared-server session and leaves stale ownership data behind for the spawned fallback task.
Also applies to: 156-208
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/mcp-server.js` around lines 121 - 134, You create a shared session
(createSession -> sessionId) and write metadata.json before calling
addSession()/runHeadless, so if addSession() or later fails you must clean up
the partially-created shared session and metadata instead of falling back to
spawn. Wrap the addSession()/runHeadless sequence in a try/catch; on error call
the corresponding session teardown on the client (e.g. deleteSession(sessionId)
or closeSession(sessionId) / client.removeSession(sessionId) — use the existing
session-management API in your codebase — and remove the metadata
file/sessionDir (fs.unlinkSync or fs.rmSync) before rethrowing the error so the
code does not continue to the spawn path and leaves stale metadata.json or a
leaked shared session. Ensure this same cleanup is applied to the other affected
block (lines ~156-208) where createSession is used.
| function isLockStale(lockData) { | ||
| if (!isPidAlive(lockData.pid)) { | ||
| return true; | ||
| } | ||
| const modeTimeout = MODE_TIMEOUTS[lockData.mode] || MODE_TIMEOUTS.headless; | ||
| const ageMs = Date.now() - new Date(lockData.timestamp).getTime(); | ||
| if (ageMs > modeTimeout * 2) { | ||
| return true; | ||
| } | ||
| return false; | ||
| } |
There was a problem hiding this comment.
A live lock becomes reclaimable after a fixed age.
The timestamp is only written once, so any legitimate run that exceeds modeTimeout * 2 can be stolen even while its PID is alive. Because releaseLock() doesn't verify ownership, the original process can then delete the new owner's lock on exit. Only treat dead PIDs as stale unless you also add a heartbeat/owner token.
Also applies to: 86-93
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/session-lock.js` around lines 29 - 39, isLockStale currently treats
any lock as reclaimable after a fixed age even if lockData.pid is alive, which
lets a long-running legitimate process have its lock stolen; change isLockStale
to only consider a lock stale when the PID is not alive (i.e., if
!isPidAlive(lockData.pid) return true, otherwise ignore the timestamp/age
check), and do not use MODE_TIMEOUTS/lockData.timestamp to reclaim live PIDs
unless you implement a heartbeat or owner token; additionally update
releaseLock() to verify ownership (e.g., require matching lockData.ownerToken or
pid before deleting) so a new owner cannot be removed by the original process on
exit.
| class MemoryMonitor { | ||
| constructor(pid) { | ||
| this.pid = pid; | ||
| this.snapshots = []; | ||
| this._interval = null; | ||
| } | ||
|
|
||
| start(intervalMs = 2000) { | ||
| this._record(); // initial snapshot | ||
| this._interval = setInterval(() => this._record(), intervalMs); | ||
| return this; | ||
| } |
There was a problem hiding this comment.
This RSS monitor misses the actual shared OpenCode process.
MemoryMonitor samples client.child.pid, which is just the Node MCP wrapper. The shared-server path also starts the real OpenCode child (goPid is written into session metadata), so this can report a healthy peak while the actual server leaks.
Also applies to: 235-236
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/shared-server-e2e.integration.test.js` around lines 38 - 49,
MemoryMonitor currently samples the wrong process (client.child.pid) and misses
the real OpenCode process; update MemoryMonitor so its start/_record logic
resolves and samples the actual Go child PID stored in the session metadata (key
"goPid") when present, falling back to client.child.pid if not available, and
ensure any other sampling locations referenced around lines 235-236 use the same
goPid-aware logic; modify MemoryMonitor constructor/start/_record to accept or
look up the session metadata and use that PID for RSS measurements.
| // Verify all completed | ||
| for (let i = 0; i < SESSION_COUNT; i++) { | ||
| process.stderr.write(` [e2e] Session ${taskIds[i]} final: ${pollResults[i].status}\n`); | ||
| expect(['complete', 'error']).toContain(pollResults[i].status); | ||
| } | ||
|
|
||
| // Step 5: Read results to verify LLM actually responded | ||
| for (let i = 0; i < SESSION_COUNT; i++) { | ||
| const readResult = await client.request('tools/call', { | ||
| name: 'sidecar_read', | ||
| arguments: { taskId: taskIds[i], project: tmpDir }, | ||
| }); | ||
| const readText = readResult.result.content[0].text; | ||
| process.stderr.write(` [e2e] Session ${i} output length: ${readText.length} chars\n`); | ||
| expect(readText.length).toBeGreaterThan(0); | ||
| } |
There was a problem hiding this comment.
Assert success and expected content, not just "non-empty".
This currently passes when sessions finish in error, and sidecar_read() returning No summary available... still satisfies the length check. Since the prompt is deterministic, assert status === 'complete' and that each result contains SESSION_${i}_OK.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/shared-server-e2e.integration.test.js` around lines 318 - 333, Replace
the weak assertions that only check non-empty output with two strict checks:
assert each pollResults[i].status is exactly 'complete' (not just in
['complete','error']) and assert the LLM output from the sidecar_read call
contains the deterministic marker `SESSION_${i}_OK`; locate the loop using
SESSION_COUNT that checks pollResults and the subsequent loop calling
client.request('tools/call', { name: 'sidecar_read', arguments: { taskId:
taskIds[i], project: tmpDir } }) and replace the expect calls so they use
expect(pollResults[i].status).toBe('complete') and
expect(readText).toContain(`SESSION_${i}_OK`) (use taskIds[i] where appropriate
for logging).
…se in abort - Hoist sessionId declaration before the try block so catch can call sharedServer.removeSession(sessionId) to clean up any partially-created shared server session before falling through to the spawn path - Store pid: null (not process.pid) in metadata for shared server sessions; storing the MCP server's own PID caused sidecar_abort to SIGTERM the entire MCP process, killing all active sessions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- docs/troubleshooting.md: correct session.lock path from ~/.config/sidecar/sessions/ to <project>/.claude/sidecar_sessions/ - scripts/eval-with-monitoring.sh: wrap eval node call with set +e/set -e so post-eval snapshot and process delta reporting always run even when the eval exits non-zero - tests/shared-server-e2e.integration.test.js: force SIDECAR_SHARED_SERVER=1 in MCP server spawn env; strengthen sidecar_read assertions (length > 10, regex match on session marker) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…or message - headless.js: cancel watchdog on createSession error return path - resume.js: move outputSummary/finalizeSession inside try block, guard heartbeat.stop() for null - session-utils.js: guard checkSessionLiveness against null metadata - input-validators.js: show "No model specified" instead of "Model 'undefined'" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/sidecar/resume.js (1)
162-175:⚠️ Potential issue | 🟠 MajorForce
Buildwhen resuming in headless mode.A resumed session can carry forward
chatfrom metadata and pass it intorunHeadless(), which can stall headless execution. Normalize toBuildfor headless resumes.🔧 Proposed fix
- const effectiveAgent = metadata.agent || 'Build'; + const previousAgent = metadata.agent || 'Build'; + const effectiveAgent = + (headless && typeof previousAgent === 'string' && previousAgent.toLowerCase() === 'chat') + ? 'Build' + : previousAgent;As per coding guidelines: "For headless mode in interactive scenarios, use Build agent instead of chat agent to avoid stalling".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/sidecar/resume.js` around lines 162 - 175, The resumed headless flow may carry a metadata.agent of "chat" causing runHeadless() to stall; update the agent selection so that when headless is true you force the Build agent. Change the effectiveAgent logic used before calling runHeadless (referencing effectiveAgent, metadata.agent, headless, runHeadless, and buildResumeUserMessage) so that effectiveAgent is set to 'Build' for headless resumes (otherwise keep metadata.agent || 'Build'), and pass that forced value into runHeadless.src/headless.js (1)
248-457: 🛠️ Refactor suggestion | 🟠 MajorMigrate to
session.status()API for completion detection instead of custom polling loop.The code still polls messages in a while loop with custom completion logic. Use the
getSessionStatus()wrapper (already exported fromopencode-client.js) which callssession.status()to check for thestatus: 'completed'field. This removes the need for the manual polling loop, message tracking, stable poll counting, and other heuristics currently in place.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/headless.js` around lines 248 - 457, The poll loop manually inspects messages via getMessages and complex heuristics (seenTextParts, stablePolls, lastAssistantMsgId, output growth, etc. in the while loop) — replace this entire custom polling logic with the existing getSessionStatus() wrapper from opencode-client.js (which calls session.status() and returns status:'completed') to detect completion; keep the external abort check and tool/message logging that you still need, but remove the stablePolls/seenTextParts/lastOutputLength/pollCount logic and the custom completion checks, and instead call getSessionStatus(sessionId) on a short interval, treat status==='completed' as completion, propagate sessionError if status indicates failure, and continue to use existing logMessage/writeProgress calls for received parts when still streaming messages. Ensure you update references to getMessages, lastAssistantMsgId, stablePolls, seenTextParts, and any related variables so they are removed or reconciled with getSessionStatus() usage.
♻️ Duplicate comments (1)
src/mcp-server.js (1)
122-136:⚠️ Potential issue | 🔴 CriticalClean up partially-created shared sessions before spawn fallback.
If failure happens after
createSession()/metadata write, current fallback can leave stale metadata and leaked shared-session state, then start a spawned task with inconsistent ownership metadata.🧹 Proposed cleanup before falling through to spawn
- let sessionId; + let sessionId; + let sharedClient; try { - const { server, client } = await sharedServer.ensureServer(); + const { server, client } = await sharedServer.ensureServer(); + sharedClient = client; const { createSession } = require('./opencode-client'); @@ } catch (err) { logger.warn('Shared server path failed, falling back to spawn', { error: err.message }); // Clean up partial shared server state before falling through if (sessionId) { sharedServer.removeSession(sessionId); + try { + const { abortSession } = require('./opencode-client'); + await abortSession(sharedClient, sessionId); + } catch (cleanupErr) { + logger.warn('Failed to abort leaked shared session', { sessionId, error: cleanupErr.message }); + } + try { + fs.rmSync(sessionDir, { recursive: true, force: true }); + } catch (cleanupErr) { + logger.warn('Failed to remove partial shared session directory', { + sessionDir, + error: cleanupErr.message, + }); + } } // Fall through to spawn path below }Also applies to: 208-215
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/mcp-server.js` around lines 122 - 136, If an error occurs after createSession(client) and after writing metadata.json (metaPath) we must clean up the partially-created shared session to avoid stale metadata and leaked state: update the error/fallback path to delete the metadata file (metaPath) and remove the session directory (sessionDir) and/or call the session teardown helper (e.g. deleteSession(sessionId) or destroySession/sessionRegistry.deregister(sessionId) if one exists) before falling through to spawn; ensure cleanup is best-effort (catch and log errors but proceed with spawn) and apply the same cleanup logic to the other duplicate block around the 208-215 region so both shared-session failure paths remove metadata and deregister the sessionId.
🧹 Nitpick comments (3)
src/mcp-server.js (1)
74-248: Splitsidecar_start()into smaller helpers.This function now exceeds the repository limits for function length and try/catch density, which makes lifecycle/error paths harder to reason about and maintain.
As per coding guidelines: "Any function must not exceed 50 lines" and "Avoid more than 3 try/catch blocks in one function - split error handling".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/mcp-server.js` around lines 74 - 248, The sidecar_start function is too long and has too many try/catch blocks; split it into small helpers to reduce length and error-handling density. Extract the shared-server path into a helper like runSharedHeadlessFlow(resolvedModel, input, cwd, taskId, sessionDir) that handles ensureServer(), createSession(), buildContext(), buildPrompts(), sharedServer.addSession/removeSession, runHeadless(). Extract the per-process fallback into spawnSidecarAndWriteMeta(args, sessionDir, taskId, input) which calls spawnSidecarProcess(), writes metadata.json, and returns the child/meta; move the metadata finalize/remove logic into finalizeSessionHandler(sessionDir, metaPath, result) (used by the runHeadless promise handlers). Keep sidecar_start as a thin coordinator that validates inputs (validateStartInputs), computes args/taskId/cwd, then calls either runSharedHeadlessFlow or spawnSidecarAndWriteMeta; ensure each new helper has at most one try/catch and reduce total try/catch in sidecar_start to <=3.src/utils/input-validators.js (1)
37-125: BreakvalidateStartInputs()into stage helpers.This new function is over the project limit and would be easier to maintain/test if split by stage (prompt/model/timeout/agent).
As per coding guidelines: "Any function must not exceed 50 lines".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/input-validators.js` around lines 37 - 125, validateStartInputs is too large—split its four validation stages into small helpers: create validatePromptStage(input) (use validatePromptContent), validateModelStage(input) (use tryResolveModel, getEffectiveAliases, findSimilar), validateTimeoutStage(input), and validateAgentStage(input) (use validateHeadlessAgent); have each helper return the same { valid, error, resolvedModel? } shape and make validateStartInputs simply call them in order, returning early on errors and finally returning { valid: true, resolvedModel } so resolved model from tryResolveModel is preserved; update imports/usages of validatePromptContent, tryResolveModel, getEffectiveAliases, findSimilar, and validateHeadlessAgent as needed and add unit tests for each new helper.src/sidecar/resume.js (1)
145-153: Use shared drift/conflict utilities instead of local drift checks.The resume drift path still relies on local
checkFileDrift()/buildDriftWarning()logic. Please switch this segment toconflict.js/drift.jsutilities so resume behavior stays consistent across sidecar flows.As per coding guidelines: "Use conflict.js for file conflict detection by comparing mtimes against session start time" and "Use drift.js calculateDrift() for detecting stale context based on age and turn count".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/sidecar/resume.js` around lines 145 - 153, Replace the local checkFileDrift/buildDriftWarning flow with the shared utilities: import the file-conflict checker from conflict.js (e.g., checkConflicts or similar) and the calculateDrift function from drift.js, call checkConflicts(metadata, sessionStart) to detect mtime-based conflicts and calculateDrift(metadata, project) to compute stale-context info (hasChanges/changedFiles/lastActivityTime), then use the drift.js-provided warning builder (instead of buildDriftWarning) to compose the driftWarning and set resumePrompt = systemPrompt + '\n' + driftWarning; keep the same logger.warn call but include taskId and changedFileCount from the new conflict/drift results.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/troubleshooting.md`:
- Line 7: Update the table cell that currently instructs to run rm on
session.lock to first recommend retrying/resuming once or waiting the staleness
window (e.g., a few minutes) because the lock cleanup logic already treats dead
PIDs/old locks as stale; if the retry/wait fails, then instruct manual deletion
of session.lock as a fallback. Locate the table row referencing "session.lock"
and replace the direct `rm` instruction with a short sequence: retry/wait
guidance, how long to wait, then manual deletion as last resort, keeping the
mention of "session.lock" so readers know which file to remove if needed.
- Line 5: Update the troubleshooting entry about the sidecar idle timeout to
mention the per-mode environment variables that override SIDECAR_IDLE_TIMEOUT:
document SIDECAR_IDLE_TIMEOUT_HEADLESS, SIDECAR_IDLE_TIMEOUT_INTERACTIVE, and
SIDECAR_IDLE_TIMEOUT_SERVER and note that any of these set to 0 will disable the
watchdog for that mode; also advise to check those per-mode vars in addition to
SIDECAR_IDLE_TIMEOUT and to use LOG_LEVEL=debug to trace watchdog transitions.
In `@scripts/eval-with-monitoring.sh`:
- Around line 32-33: The SUMMARY line written to MONITOR_LOG lacks the phase
label so later greps for SUMMARY.*PRE-EVAL/POST-EVAL fail and leave
PRE_OC/POST_OC/PRE_RSS/POST_RSS unset; update the echo that writes to
"$MONITOR_LOG" to include the same label (use "$label" like the console echo) so
the file contains "SUMMARY" plus the phase tag, and also make the variable
extraction robust (the code that sets PRE_OC, POST_OC, PRE_RSS, POST_RSS should
guard against grep failure by providing defaults or using "grep ... || true" or
parameter expansion defaults) to avoid exiting under set -euo pipefail.
- Line 60: The trap that kills the monitor process uses kill $MONITOR_PID which
can return non-zero and abort the script under set -e; modify the trap to make
the kill non-fatal by appending || true (so: kill $MONITOR_PID || true) and keep
the existing wait $MONITOR_PID || true; update the trap command that references
MONITOR_PID to include the || true after kill to ensure the post-eval analysis
runs even if the monitor already exited.
---
Outside diff comments:
In `@src/headless.js`:
- Around line 248-457: The poll loop manually inspects messages via getMessages
and complex heuristics (seenTextParts, stablePolls, lastAssistantMsgId, output
growth, etc. in the while loop) — replace this entire custom polling logic with
the existing getSessionStatus() wrapper from opencode-client.js (which calls
session.status() and returns status:'completed') to detect completion; keep the
external abort check and tool/message logging that you still need, but remove
the stablePolls/seenTextParts/lastOutputLength/pollCount logic and the custom
completion checks, and instead call getSessionStatus(sessionId) on a short
interval, treat status==='completed' as completion, propagate sessionError if
status indicates failure, and continue to use existing logMessage/writeProgress
calls for received parts when still streaming messages. Ensure you update
references to getMessages, lastAssistantMsgId, stablePolls, seenTextParts, and
any related variables so they are removed or reconciled with getSessionStatus()
usage.
In `@src/sidecar/resume.js`:
- Around line 162-175: The resumed headless flow may carry a metadata.agent of
"chat" causing runHeadless() to stall; update the agent selection so that when
headless is true you force the Build agent. Change the effectiveAgent logic used
before calling runHeadless (referencing effectiveAgent, metadata.agent,
headless, runHeadless, and buildResumeUserMessage) so that effectiveAgent is set
to 'Build' for headless resumes (otherwise keep metadata.agent || 'Build'), and
pass that forced value into runHeadless.
---
Duplicate comments:
In `@src/mcp-server.js`:
- Around line 122-136: If an error occurs after createSession(client) and after
writing metadata.json (metaPath) we must clean up the partially-created shared
session to avoid stale metadata and leaked state: update the error/fallback path
to delete the metadata file (metaPath) and remove the session directory
(sessionDir) and/or call the session teardown helper (e.g.
deleteSession(sessionId) or destroySession/sessionRegistry.deregister(sessionId)
if one exists) before falling through to spawn; ensure cleanup is best-effort
(catch and log errors but proceed with spawn) and apply the same cleanup logic
to the other duplicate block around the 208-215 region so both shared-session
failure paths remove metadata and deregister the sessionId.
---
Nitpick comments:
In `@src/mcp-server.js`:
- Around line 74-248: The sidecar_start function is too long and has too many
try/catch blocks; split it into small helpers to reduce length and
error-handling density. Extract the shared-server path into a helper like
runSharedHeadlessFlow(resolvedModel, input, cwd, taskId, sessionDir) that
handles ensureServer(), createSession(), buildContext(), buildPrompts(),
sharedServer.addSession/removeSession, runHeadless(). Extract the per-process
fallback into spawnSidecarAndWriteMeta(args, sessionDir, taskId, input) which
calls spawnSidecarProcess(), writes metadata.json, and returns the child/meta;
move the metadata finalize/remove logic into finalizeSessionHandler(sessionDir,
metaPath, result) (used by the runHeadless promise handlers). Keep sidecar_start
as a thin coordinator that validates inputs (validateStartInputs), computes
args/taskId/cwd, then calls either runSharedHeadlessFlow or
spawnSidecarAndWriteMeta; ensure each new helper has at most one try/catch and
reduce total try/catch in sidecar_start to <=3.
In `@src/sidecar/resume.js`:
- Around line 145-153: Replace the local checkFileDrift/buildDriftWarning flow
with the shared utilities: import the file-conflict checker from conflict.js
(e.g., checkConflicts or similar) and the calculateDrift function from drift.js,
call checkConflicts(metadata, sessionStart) to detect mtime-based conflicts and
calculateDrift(metadata, project) to compute stale-context info
(hasChanges/changedFiles/lastActivityTime), then use the drift.js-provided
warning builder (instead of buildDriftWarning) to compose the driftWarning and
set resumePrompt = systemPrompt + '\n' + driftWarning; keep the same logger.warn
call but include taskId and changedFileCount from the new conflict/drift
results.
In `@src/utils/input-validators.js`:
- Around line 37-125: validateStartInputs is too large—split its four validation
stages into small helpers: create validatePromptStage(input) (use
validatePromptContent), validateModelStage(input) (use tryResolveModel,
getEffectiveAliases, findSimilar), validateTimeoutStage(input), and
validateAgentStage(input) (use validateHeadlessAgent); have each helper return
the same { valid, error, resolvedModel? } shape and make validateStartInputs
simply call them in order, returning early on errors and finally returning {
valid: true, resolvedModel } so resolved model from tryResolveModel is
preserved; update imports/usages of validatePromptContent, tryResolveModel,
getEffectiveAliases, findSimilar, and validateHeadlessAgent as needed and add
unit tests for each new helper.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 96bbd82b-0200-46cc-a55e-2a4a305fbea4
📒 Files selected for processing (8)
docs/troubleshooting.mdscripts/eval-with-monitoring.shsrc/headless.jssrc/mcp-server.jssrc/sidecar/resume.jssrc/sidecar/session-utils.jssrc/utils/input-validators.jstests/shared-server-e2e.integration.test.js
🚧 Files skipped from review as they are similar to previous changes (2)
- src/sidecar/session-utils.js
- tests/shared-server-e2e.integration.test.js
| echo "SUMMARY: opencode=$oc_count sidecar=$sc_count total_rss_mb=${total_rss}" >> "$MONITOR_LOG" | ||
| echo "[$label] opencode=$oc_count sidecar=$sc_count total_rss=${total_rss}MB" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify mismatch between written SUMMARY format and parser expectations.
sed -n '32p;87,90p' scripts/eval-with-monitoring.shRepository: jrenaldi79/sidecar
Length of output: 584
🏁 Script executed:
#!/bin/bash
# Get more context around the problematic lines
# Check script start for set -euo pipefail
head -5 scripts/eval-with-monitoring.sh
# Check context around line 32 (within a function)
sed -n '20,40p' scripts/eval-with-monitoring.sh
# Check context around lines 87-90
sed -n '80,95p' scripts/eval-with-monitoring.sh
# Check if there are other SUMMARY lines
grep -n "SUMMARY" scripts/eval-with-monitoring.shRepository: jrenaldi79/sidecar
Length of output: 3060
🏁 Script executed:
#!/bin/bash
# Check for set -euo pipefail and error handling
grep -n "set -[a-z]*" scripts/eval-with-monitoring.sh
# Get line count to understand script size
wc -l scripts/eval-with-monitoring.sh
# Check what happens after lines 87-90 - are there error handlers?
sed -n '87,105p' scripts/eval-with-monitoring.shRepository: jrenaldi79/sidecar
Length of output: 1016
SUMMARY format mismatch breaks leak detection—script will abort under set -euo pipefail.
Line 32 writes SUMMARY: without the phase label, but lines 87–90 search for SUMMARY.*PRE-EVAL / POST-EVAL. When grep finds no match, the pipe fails and variable assignment leaves PRE_OC, POST_OC, PRE_RSS, POST_RSS unset, causing the script to exit before leak reporting runs.
Suggested fix
- echo "SUMMARY: opencode=$oc_count sidecar=$sc_count total_rss_mb=${total_rss}" >> "$MONITOR_LOG"
+ echo "SUMMARY [$label]: opencode=$oc_count sidecar=$sc_count total_rss_mb=${total_rss}" >> "$MONITOR_LOG"
echo "[$label] opencode=$oc_count sidecar=$sc_count total_rss=${total_rss}MB"
@@
-PRE_OC=$(grep "SUMMARY.*PRE-EVAL" "$MONITOR_LOG" | head -1 | grep -o "opencode=[0-9]*" | cut -d= -f2)
-POST_OC=$(grep "SUMMARY.*POST-EVAL" "$MONITOR_LOG" | tail -1 | grep -o "opencode=[0-9]*" | cut -d= -f2)
-PRE_RSS=$(grep "SUMMARY.*PRE-EVAL" "$MONITOR_LOG" | head -1 | grep -o "total_rss_mb=[0-9.]*" | cut -d= -f2)
-POST_RSS=$(grep "SUMMARY.*POST-EVAL" "$MONITOR_LOG" | tail -1 | grep -o "total_rss_mb=[0-9.]*" | cut -d= -f2)
+PRE_OC=$(grep "SUMMARY \[PRE-EVAL\]:" "$MONITOR_LOG" | head -1 | grep -o "opencode=[0-9]*" | cut -d= -f2 || true)
+POST_OC=$(grep "SUMMARY \[POST-EVAL\]:" "$MONITOR_LOG" | tail -1 | grep -o "opencode=[0-9]*" | cut -d= -f2 || true)
+PRE_RSS=$(grep "SUMMARY \[PRE-EVAL\]:" "$MONITOR_LOG" | head -1 | grep -o "total_rss_mb=[0-9.]*" | cut -d= -f2 || true)
+POST_RSS=$(grep "SUMMARY \[POST-EVAL\]:" "$MONITOR_LOG" | tail -1 | grep -o "total_rss_mb=[0-9.]*" | cut -d= -f2 || true)
+PRE_OC=${PRE_OC:-0}
+POST_OC=${POST_OC:-0}
+PRE_RSS=${PRE_RSS:-0}
+POST_RSS=${POST_RSS:-0}Also applies to: 87-90
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/eval-with-monitoring.sh` around lines 32 - 33, The SUMMARY line
written to MONITOR_LOG lacks the phase label so later greps for
SUMMARY.*PRE-EVAL/POST-EVAL fail and leave PRE_OC/POST_OC/PRE_RSS/POST_RSS
unset; update the echo that writes to "$MONITOR_LOG" to include the same label
(use "$label" like the console echo) so the file contains "SUMMARY" plus the
phase tag, and also make the variable extraction robust (the code that sets
PRE_OC, POST_OC, PRE_RSS, POST_RSS should guard against grep failure by
providing defaults or using "grep ... || true" or parameter expansion defaults)
to avoid exiting under set -euo pipefail.
- troubleshooting.md: document per-mode timeout overrides, clarify lock auto-recovery behavior - eval-with-monitoring.sh: default empty grep results to 0, guard kill of already-exited monitor process Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
scripts/eval-with-monitoring.sh (1)
32-33:⚠️ Potential issue | 🔴 CriticalFix SUMMARY phase tagging; leak parsing is currently blind.
Line 32 writes
SUMMARY:without phase info, but Lines 87–90 and Line 114 grep SUMMARY lines forPRE-EVAL/POST-EVAL/DURING. This forces fallback zeros and can hide real leaks/peaks.Proposed fix
- echo "SUMMARY: opencode=$oc_count sidecar=$sc_count total_rss_mb=${total_rss}" >> "$MONITOR_LOG" + echo "SUMMARY [$label]: opencode=$oc_count sidecar=$sc_count total_rss_mb=${total_rss}" >> "$MONITOR_LOG" @@ -PRE_OC=$(grep "SUMMARY.*PRE-EVAL" "$MONITOR_LOG" | head -1 | grep -o "opencode=[0-9]*" | cut -d= -f2 || echo "0") -POST_OC=$(grep "SUMMARY.*POST-EVAL" "$MONITOR_LOG" | tail -1 | grep -o "opencode=[0-9]*" | cut -d= -f2 || echo "0") -PRE_RSS=$(grep "SUMMARY.*PRE-EVAL" "$MONITOR_LOG" | head -1 | grep -o "total_rss_mb=[0-9.]*" | cut -d= -f2 || echo "0") -POST_RSS=$(grep "SUMMARY.*POST-EVAL" "$MONITOR_LOG" | tail -1 | grep -o "total_rss_mb=[0-9.]*" | cut -d= -f2 || echo "0") +PRE_OC=$(grep "SUMMARY \[PRE-EVAL\]:" "$MONITOR_LOG" | head -1 | grep -o "opencode=[0-9]*" | cut -d= -f2 || echo "0") +POST_OC=$(grep "SUMMARY \[POST-EVAL\]:" "$MONITOR_LOG" | tail -1 | grep -o "opencode=[0-9]*" | cut -d= -f2 || echo "0") +PRE_RSS=$(grep "SUMMARY \[PRE-EVAL\]:" "$MONITOR_LOG" | head -1 | grep -o "total_rss_mb=[0-9.]*" | cut -d= -f2 || echo "0") +POST_RSS=$(grep "SUMMARY \[POST-EVAL\]:" "$MONITOR_LOG" | tail -1 | grep -o "total_rss_mb=[0-9.]*" | cut -d= -f2 || echo "0") @@ -grep "SUMMARY.*DURING" "$MONITOR_LOG" | sort -t= -k4 -n -r | head -3 +grep "SUMMARY \[DURING" "$MONITOR_LOG" | sort -t= -k4 -n -r | head -3Also applies to: 87-90, 114-114
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/eval-with-monitoring.sh` around lines 32 - 33, The SUMMARY log written to "$MONITOR_LOG" is missing the phase tag so downstream greps for PRE-EVAL/POST-EVAL/DURING miss entries; update the SUMMARY write that uses MONITOR_LOG to include the phase label (the same $label used in the console echo) so lines in the file contain the phase (e.g. include [$label] or prepend the phase token to the "SUMMARY:" entry), ensuring the parsers that grep for PRE-EVAL/POST-EVAL/DURING will match; adjust the SUMMARY echo that references oc_count, sc_count and total_rss to include $label.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/troubleshooting.md`:
- Line 5: Update the sentence to correctly state that per-mode environment
variables (SIDECAR_IDLE_TIMEOUT_HEADLESS, SIDECAR_IDLE_TIMEOUT_INTERACTIVE,
SIDECAR_IDLE_TIMEOUT_SERVER) take precedence over the blanket
SIDECAR_IDLE_TIMEOUT, and remove or reword the assertion that setting
LOG_LEVEL=debug specifically emits watchdog state-transition traces—instead say
that enabling debug increases log verbosity which may help when investigating
watchdog behavior. Keep the checklist format and retain the guidance to check
per-mode overrides first, mention the blanket SIDECAR_IDLE_TIMEOUT as a
fallback, and adjust the LOG_LEVEL claim to a general note about increased
verbosity rather than guaranteed state-transition traces.
---
Duplicate comments:
In `@scripts/eval-with-monitoring.sh`:
- Around line 32-33: The SUMMARY log written to "$MONITOR_LOG" is missing the
phase tag so downstream greps for PRE-EVAL/POST-EVAL/DURING miss entries; update
the SUMMARY write that uses MONITOR_LOG to include the phase label (the same
$label used in the console echo) so lines in the file contain the phase (e.g.
include [$label] or prepend the phase token to the "SUMMARY:" entry), ensuring
the parsers that grep for PRE-EVAL/POST-EVAL/DURING will match; adjust the
SUMMARY echo that references oc_count, sc_count and total_rss to include $label.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5f0b2ba1-fd48-4e38-9550-ce139f8953e0
📒 Files selected for processing (2)
docs/troubleshooting.mdscripts/eval-with-monitoring.sh
|
|
||
| | Issue | Cause | Solution | | ||
| |-------|-------|----------| | ||
| | Sidecar process not self-terminating | Idle timeout misconfigured or disabled | Check per-mode overrides first: `SIDECAR_IDLE_TIMEOUT_HEADLESS`, `SIDECAR_IDLE_TIMEOUT_INTERACTIVE`, `SIDECAR_IDLE_TIMEOUT_SERVER`. The blanket `SIDECAR_IDLE_TIMEOUT` overrides all modes. Set `LOG_LEVEL=debug` to trace watchdog transitions | |
There was a problem hiding this comment.
Fix timeout precedence wording and watchdog tracing claim (Line 5).
The precedence is reversed: per-mode vars take priority over SIDECAR_IDLE_TIMEOUT, not the other way around. Also, LOG_LEVEL=debug does not specifically emit watchdog state-transition traces as written.
Proposed doc correction
-| Sidecar process not self-terminating | Idle timeout misconfigured or disabled | Check per-mode overrides first: `SIDECAR_IDLE_TIMEOUT_HEADLESS`, `SIDECAR_IDLE_TIMEOUT_INTERACTIVE`, `SIDECAR_IDLE_TIMEOUT_SERVER`. The blanket `SIDECAR_IDLE_TIMEOUT` overrides all modes. Set `LOG_LEVEL=debug` to trace watchdog transitions |
+| Sidecar process not self-terminating | Idle timeout misconfigured or disabled | Check per-mode overrides first: `SIDECAR_IDLE_TIMEOUT_HEADLESS`, `SIDECAR_IDLE_TIMEOUT_INTERACTIVE`, `SIDECAR_IDLE_TIMEOUT_SERVER` (these take precedence). `SIDECAR_IDLE_TIMEOUT` is the fallback blanket setting. Set `LOG_LEVEL=debug` to increase diagnostics, and check idle-timeout/stuck-stream watchdog logs |📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| | Sidecar process not self-terminating | Idle timeout misconfigured or disabled | Check per-mode overrides first: `SIDECAR_IDLE_TIMEOUT_HEADLESS`, `SIDECAR_IDLE_TIMEOUT_INTERACTIVE`, `SIDECAR_IDLE_TIMEOUT_SERVER`. The blanket `SIDECAR_IDLE_TIMEOUT` overrides all modes. Set `LOG_LEVEL=debug` to trace watchdog transitions | | |
| | Sidecar process not self-terminating | Idle timeout misconfigured or disabled | Check per-mode overrides first: `SIDECAR_IDLE_TIMEOUT_HEADLESS`, `SIDECAR_IDLE_TIMEOUT_INTERACTIVE`, `SIDECAR_IDLE_TIMEOUT_SERVER` (these take precedence). `SIDECAR_IDLE_TIMEOUT` is the fallback blanket setting. Set `LOG_LEVEL=debug` to increase diagnostics, and check idle-timeout/stuck-stream watchdog logs | |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/troubleshooting.md` at line 5, Update the sentence to correctly state
that per-mode environment variables (SIDECAR_IDLE_TIMEOUT_HEADLESS,
SIDECAR_IDLE_TIMEOUT_INTERACTIVE, SIDECAR_IDLE_TIMEOUT_SERVER) take precedence
over the blanket SIDECAR_IDLE_TIMEOUT, and remove or reword the assertion that
setting LOG_LEVEL=debug specifically emits watchdog state-transition
traces—instead say that enabling debug increases log verbosity which may help
when investigating watchdog behavior. Keep the checklist format and retain the
guidance to check per-mode overrides first, mention the blanket
SIDECAR_IDLE_TIMEOUT as a fallback, and adjust the LOG_LEVEL claim to a general
note about increased verbosity rather than guaranteed state-transition traces.
Summary
runHeadless()for full lifecycle (polling, progress, finalization) via fire-and-forget async.validateStartInputs()validates model aliases, prompts, timeouts, and agent/headless compatibility at MCP boundary. Returns structured JSON errors with suggestions for LLM self-correction.Key metrics:
Test plan
SIDECAR_SHARED_SERVER=0falls back to per-process correctly🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Improvements
Documentation
Tests