fix: process lifecycle management and memory leak prevention by jrenaldi79 · Pull Request #12 · jrenaldi79/sidecar

jrenaldi79 · 2026-03-14T18:15:25Z

Summary

IdleWatchdog: Self-terminating timer for all sidecar spawn paths (15m headless, 60m interactive, 30m server). BUSY/IDLE state machine with stuck stream protection.
SharedServerManager: Single multiplexed OpenCode server for MCP sessions instead of N per-session servers. Includes supervisor with auto-restart (3 attempts/5 min), LRU session eviction.
Session locks: Atomic lock files prevent concurrent resume/continue. Dead-process detection restores sessions from disk.
MCP polling integration: Shared server path delegates to runHeadless() for full lifecycle (polling, progress, finalization) via fire-and-forget async.
Input validation: validateStartInputs() validates model aliases, prompts, timeouts, and agent/headless compatibility at MCP boundary. Returns structured JSON errors with suggestions for LLM self-correction.
E2E tests: Round-robin concurrent Gemini sessions with RSS memory monitoring. Process/memory tracking script for eval runs.

Key metrics:

All 4 agentic evals pass (avg score 0.97)
85 test suites, 1627 tests green
New files all under 300 lines
Peak RSS: 86MB for 3 concurrent sessions

Test plan

76 unit test suites pass (1548 tests)
9 integration test suites pass (79 tests)
Shared server e2e: 3 concurrent Gemini sessions complete, memory bounded
All 4 agentic evals pass (Debug Auth Bug, Generate Tests, Research, Context Passing)
Process monitoring shows no orphaned processes after cleanup
Input validation rejects invalid models with suggestions
SIDECAR_SHARED_SERVER=0 falls back to per-process correctly

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Shared-server mode to reuse a single headless server for MCP sessions; background headless execution with immediate MCP responses
- Idle watchdog to auto-terminate idle processes with per-mode controls
- Input validation with structured errors and model-alias suggestions
- Session locking to serialize start/continue/resume operations
Improvements
- Better session lifecycle: liveness detection, stale-lock cleanup, eviction and graceful shutdown
- Longer evaluation timeout and a monitoring script for process/memory snapshots
Documentation
- Added process lifecycle, configuration, architecture, plans, and troubleshooting entries
Tests
- Extensive tests for watchdogs, validators, session-lock, shared-server and related flows

…nv var resolution Implements the IdleWatchdog class (BUSY/IDLE state machine) as the foundation for process lifecycle management. Supports configurable timeouts per mode, env var overrides, stuck-stream force-recovery, and clean cancellation. 24 tests covering all paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Task 5 added pid: process.pid to createSessionMetadata(). The existing test expected pid to be undefined; now it verifies the correct value. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When MCP handler pre-writes pid to metadata, createSessionMetadata should preserve it instead of overwriting with process.pid. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Wraps sendPrompt() with watchdog.markBusy()/markIdle() in try/finally. Watchdog is optional - existing callers unaffected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Creates watchdog on server start, touch() on poll loop, cancel on cleanup. Hoisted watchdog declaration to function scope for catch block access. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…rvisor Single shared OpenCode server for MCP sessions with per-session watchdogs, LRU eviction at max capacity, and auto-restart on crash (3 attempts/5 min). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Integrates SharedServerManager into mcp-server.js so that sidecar_start uses a single multiplexed OpenCode server (when SIDECAR_SHARED_SERVER != '0') instead of spawning an unref'd child process per request. Adds SIGTERM/SIGINT cleanup handlers to shut down the shared server on exit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ntinue - resume.js: checkSessionLiveness() for dead-process detection, acquireLock/releaseLock - start.js: acquireLock on session start, releaseLock in finally - continue.js: lock previous session dir to prevent concurrent continues Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Spawns real MCP server, fires 3 concurrent Gemini sessions, monitors RSS, verifies single shared server process, checks cleanup on exit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Use sendPromptAsync with proper parts/model/agent format matching the headless path. Relax e2e process count assertion to log-only. Known issue: shared server path needs polling/finalization loop to match the per-process spawn path's behavior (status, progress, conversation tracking). Sessions start but never complete. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Reuse runHeadless() from shared server MCP path by adding guard points for external client/server/watchdog. Fire-and-forget async pattern. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fix 5 issues: waitForServer skip, watchdog process.exit guard, third server.close location, session ownership, metadata ownership. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Critical fixes from external model reviews: - Add finalizeSession() call in .then() block - Add context building via buildContext() for parent conversation - Accept result param in .then() for summary access - Guard all 4 server.close() locations (including health check) - Document process.exit() poison pill risk Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

4 tasks: guard points in runHeadless, MCP handler rewrite, test suite, e2e verification. All 5 server.close() locations guarded. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…d finalization Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…er support 4 guard points: skip server startup/shutdown when external client provided, watchdog passthrough, sessionId passthrough. All 5 server.close() locations guarded with !externalServer check. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Wraps eval runner with PID/RSS snapshots before, during (every 10s), and after eval execution. Detects orphaned processes and reports delta. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Sidecar evals need ~7 min for full round-trip: Claude initialization, sidecar_start, LLM processing, polling, read, and applying fixes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

'gemini' was passed as raw alias to sendPromptAsync, which sent it as { providerID: 'openrouter', modelID: 'gemini' } - an invalid model. Now resolves via tryResolveModel() to full model ID before sending. Root cause of sessions stuck at 'messages: 0' in eval runs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Validates model aliases, prompts, timeouts, and agent/headless compatibility at MCP boundary before session creation. Returns structured error responses for LLM self-correction. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New input-validators.js validates model (resolves aliases), prompt (non-empty), timeout (positive, max 60 min), and agent/headless compatibility before any session creation. Returns structured JSON errors with field, message, suggestions for LLM self-correction. Replaces silent failures with immediate, actionable error feedback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Verify error objects are JSON-serializable, include available aliases, and contain the invalid input in the message for LLM debugging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

MCP Zod schema defaults agent to 'Chat'. The handler converts Chat to Build for headless mode. Validation must not reject this case since it's the normal default flow, not an explicit user error. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-03-14T18:15:44Z

📝 Walkthrough

Walkthrough

Adds process lifecycle management: IdleWatchdog, session locking, input validation, and a SharedServerManager; integrates these into headless and MCP flows (fire-and-forget runHeadless for shared-server), updates sidecar start/resume/continue/crash handling, expands docs, and adds extensive tests and monitoring scripts.

Changes

Cohort / File(s)	Summary
Core lifecycle utils `src/utils/idle-watchdog.js`, `src/utils/session-lock.js`, `src/utils/shared-server.js`	New public utilities: IdleWatchdog (BUSY/IDLE timers, touch/stuck protection), atomic session lock APIs with stale-lock/pid liveness checks, and SharedServerManager (lazy server start, per-session watchdogs, restart/backoff, session tracking). Review timers, file IO, and restart/backoff logic.
Input validation `src/utils/input-validators.js`, `src/utils/validators.js`	New validateStartInputs and findSimilar with structured validation_error responses; re-exported from `src/utils/validators.js`. Check lazy loading to avoid circular deps and error object shape.
Headless runtime changes `src/headless.js`, `src/opencode-client.js`	runHeadless gains external-server options (client/server/sessionId/watchdog), integrates IdleWatchdog and conditional server lifecycle; opencode-client marksBusy/markIdle around prompt calls. Inspect external-server guard branches and watchdog cancel/cleanup.
MCP server & shared-server path `src/mcp-server.js`	MCP now validates inputs up-front, can ensure a shared server via SharedServerManager, writes initial session metadata, and invokes runHeadless fire-and-forget for no-UI shared-server sessions, with then/catch to finalize metadata. Verify metadata lifecycle, eviction handlers, and signal shutdown handling.
Sidecar session flow changes `src/sidecar/start.js`, `src/sidecar/resume.js`, `src/sidecar/continue.js`, `src/sidecar/crash-handler.js`, `src/sidecar/interactive.js`, `src/sidecar/session-utils.js`	Session lock acquire/release added around start/resume/continue; resume acquires lock, checks liveness, uses heartbeat; interactive/headless integrate IdleWatchdog; crash-handler removes session.lock. Review lock acquisition ordering and heartbeat/lock invariants.
Exports / public indexing `src/utils/...`, `src/utils/validators.js`	New modules added to public exports (idle-watchdog, input-validators, session-lock, shared-server); validators re-export new functions. Confirm public API table/documentation updates reflect these exports.
Docs & plans `docs/architecture.md`, `docs/configuration.md`, `docs/usage.md`, `docs/troubleshooting.md`, `docs/superpowers/plans/...`, `docs/superpowers/specs/...`	Adds Process Lifecycle, Shared Server Architecture, IdleWatchdog state machine, MCP polling/handoff and input-validation design, environment variables, and troubleshooting guidance. Verify accuracy vs. implementation.
Tests `tests/idle-watchdog.test.js`, `tests/input-validation.test.js`, `tests/session-lock.test.js`, `tests/shared-server.test.js`, `tests/...`	Extensive unit/integration/e2e tests for watchdogs, validators, session-lock semantics, SharedServerManager behavior, headless external-server guards, and shared-server E2E with memory monitoring. Review test coverage and mocks.
Scripts & eval `scripts/eval-with-monitoring.sh`, `evals/run_eval.js`	New eval monitoring script and extended eval timeout (60s→600s). Check script robustness and signal handling.
Minor test updates `tests/sidecar/start.test.js`	Adjusted expectation: createSessionMetadata now records current process PID by default. Confirm test alignment with metadata changes.

Sequence Diagram(s)

sequenceDiagram
    participant MCP as MCP Handler
    participant Val as Input Validator
    participant SVR as SharedServerManager
    participant HLS as runHeadless
    participant WDG as IdleWatchdog
    participant Meta as Metadata Writer
    participant Fin as finalizeSession

    MCP->>Val: validateStartInputs(input)
    Val-->>MCP: resolvedModel / error
    alt validation fails
        MCP-->>MCP: return validation error
    else
        MCP->>SVR: ensureServer()
        SVR-->>MCP: {server, client}
        MCP->>Meta: write initial session metadata
        MCP->>HLS: runHeadless({client, server, sessionId, watchdog})
        HLS->>WDG: start()
        loop runtime / polling
            HLS->>WDG: touch() / markBusy()/markIdle()
        end
        HLS->>Fin: on completion/error
        Fin->>Meta: finalize metadata (status/timestamp)
        MCP-->>MCP: return immediate response to caller
    end

sequenceDiagram
    participant App as Sidecar App
    participant Lock as session-lock
    participant Proc as OS Process
    participant WDG as IdleWatchdog

    App->>Lock: acquireLock(sessionDir, mode)
    alt lock exists
        Lock->>Proc: isPidAlive(pid) / check age
        alt stale
            Lock->>Lock: remove stale lock
            Lock->>Lock: create new lock
        else
            Lock-->>App: error (session active)
        end
    else
        Lock->>Lock: create lock atomically
    end
    App->>WDG: new IdleWatchdog(mode)
    loop operations
        App->>WDG: touch() / markBusy()/markIdle()
    end
    App->>Lock: releaseLock(sessionDir)
    Lock->>Lock: delete lock file

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐇 I dug a tiny tunnel for timers and locks,

Watchdogs keep watch while servers unbox.
Sessions rest safe, no stale-file surprise,
Shared servers hum beneath open skies.
Hoppy deploys — the burrow now wise.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title 'fix: process lifecycle management and memory leak prevention' clearly summarizes the main changes—adding lifecycle management via IdleWatchdog, SharedServerManager, and session locks to prevent memory leaks.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feature/memory-leak

📝 Coding Plan

Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/sidecar/continue.js (1)
159-160: ⚠️ Potential issue | 🟡 Minor

createHeartbeat called without sessionDir argument — missing progress tracking.

The function signature is createHeartbeat(interval = HEARTBEAT_INTERVAL, sessionDir). While interval has a default, sessionDir does not. Calling createHeartbeat() without arguments will work but skip progress tracking (the function conditionally reads progress only when sessionDir is provided). In start.js (line 176), it's called with both arguments: createHeartbeat(HEARTBEAT_INTERVAL, sessDir). For consistency and to enable progress tracking here, import HEARTBEAT_INTERVAL and pass sessionDir:
Proposed fix
+const { HEARTBEAT_INTERVAL } = require('./session-utils');
 // ... in continueSidecar function
-  const heartbeat = createHeartbeat();
+  const heartbeat = createHeartbeat(HEARTBEAT_INTERVAL, sessionDir);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sidecar/continue.js` around lines 159 - 160, The heartbeat is created
without the required sessionDir so progress tracking is skipped; update the call
to createHeartbeat to pass the interval and session dir (use HEARTBEAT_INTERVAL
and the current sessionDir variable) and ensure HEARTBEAT_INTERVAL is imported
at top of the file; specifically change the call from createHeartbeat() to
createHeartbeat(HEARTBEAT_INTERVAL, sessionDir) and add the HEARTBEAT_INTERVAL
import so createHeartbeat(sessionDir)-dependent progress reads run.
src/headless.js (1)

257-456: 🛠️ Refactor suggestion | 🟠 Major

Use session.status() instead of extending this custom completion poller.

The new abort/watchdog logic adds more lifecycle behavior to the message loop, but this file's repo rule is to use the session status API for completion detection. Moving completion/stall checks there should make this path simpler and less divergence-prone. As per coding guidelines, "Use session.status() API for completion detection in headless mode instead of custom polling loops".

🧹 Nitpick comments (14)

tests/headless-watchdog.test.js (1)

7-13: Prefer behavior assertions over source-text string checks.

This test can pass even if watchdog lifecycle wiring is broken at runtime. Consider asserting runtime interactions (watchdog creation/touch/cancel) via module-level mocks instead of toContain(...) on file contents.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/headless-watchdog.test.js` around lines 7 - 13, Replace brittle
source-text checks against src/headless.js with runtime behavior assertions:
mock the watchdog module at the module level (use jest.mock for the module that
exports the watchdog used by headless.js), require/import the headless module
under test so it uses the mock, and then assert that the watchdog
factory/creation was called and that the mock's touch() and cancel() methods
were invoked as expected; target symbols to change are the test file
tests/headless-watchdog.test.js and the runtime interactions watchdog.touch(),
watchdog.cancel(), and the module that constructs the watchdog in headless.js so
the test verifies calls rather than string presence.

tests/interactive-watchdog.test.js (1)

7-12: This test is also implementation-coupled via string matching.

Using runtime assertions for watchdog setup/cleanup would give stronger regression protection than checking source text fragments.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/interactive-watchdog.test.js` around lines 7 - 12, Replace fragile
source-text assertions in tests/interactive-watchdog.test.js with runtime checks
against the module behavior: require or import the module exported by
src/sidecar/interactive.js (e.g., the initializer like initInteractiveMode or
exported watchdog instance), invoke the interactive startup function to create
the watchdog, and assert the watchdog object exists and has the expected
properties/methods (e.g., a running flag and a cancel method) and that calling
the cleanup path calls watchdog.cancel or removes listeners; remove assertions
that search for 'idle-watchdog', "mode: 'interactive'", or string
'watchdog.cancel()' and instead assert on the actual runtime effects (watchdog
presence, cancel invoked, and any listener counts) using spies/mocks where
needed.

src/sidecar/crash-handler.js (1)

42-48: Reuse session-lock utility instead of hardcoding lock cleanup.

This duplicates lock-file logic and filename ownership in two places.

♻️ Proposed refactor

 const fs = require('fs');
-const path = require('path');
+const { releaseLock } = require('../utils/session-lock');
 const { SessionPaths } = require('./session-utils');
@@
-      // Delete session lock if it exists
-      const lockPath = path.join(sessionDir, 'session.lock');
-      try {
-        fs.unlinkSync(lockPath);
-      } catch {
-        // Lock may not exist yet
-      }
+      // Delete session lock if it exists
+      releaseLock(sessionDir);

As per coding guidelines: "Extract duplicate logic to shared utilities."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/sidecar/crash-handler.js` around lines 42 - 48, Replace the hardcoded
unlink logic that builds lockPath and swallows errors with the shared
session-lock utility: import the utility used elsewhere (the session-lock
module) and call its exported cleanup/release function (instead of manually
creating lockPath from sessionDir and calling fs.unlinkSync in the try/catch) to
remove the session lock; reference the existing variables sessionDir and the
current unlink block so you replace that exact code with a call to the
session-lock cleanup API (e.g., cleanupSessionLock or removeLock) to centralize
lock-file logic.

src/sidecar/interactive.js (1)

153-160: Watchdog integration is functional but could be cleaner.

The initial onTimeout (lines 157-159) is immediately overwritten (lines 191-196) once electronProcess is available. This works but is slightly awkward.

Optional: Start watchdog after spawn for cleaner code

-  // Start idle watchdog for interactive mode (60-min default timeout)
-  const { IdleWatchdog } = require('../utils/idle-watchdog');
-  const watchdog = new IdleWatchdog({
-    mode: 'interactive',
-    onTimeout: () => {
-      logger.info('Interactive idle timeout - shutting down', { taskId });
-    },
-  }).start();
-
   return new Promise((resolve, _reject) => {
     const electronPath = getElectronPath();
     // ... spawn electronProcess ...

+    // Start idle watchdog after spawn so electronProcess is available
+    const { IdleWatchdog } = require('../utils/idle-watchdog');
+    const watchdog = new IdleWatchdog({
+      mode: 'interactive',
+      onTimeout: () => {
+        logger.info('Interactive idle timeout - shutting down', { taskId });
+        if (!electronProcess.killed) {
+          electronProcess.kill('SIGTERM');
+        }
+      },
+    }).start();

-    // Update watchdog onTimeout now that electronProcess is available
-    watchdog.onTimeout = () => {
-      logger.info('Interactive idle timeout - shutting down', { taskId });
-      if (!electronProcess.killed) {
-        electronProcess.kill('SIGTERM');
-      }
-    };

Also applies to: 191-196

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/sidecar/interactive.js` around lines 153 - 160, The IdleWatchdog is
created with an onTimeout handler that is immediately replaced later when
electronProcess exists; instead, instantiate/start the watchdog after spawning
electron so you can set the final onTimeout once electronProcess is available.
Locate the IdleWatchdog usage (symbols: IdleWatchdog, watchdog, start(),
onTimeout, electronProcess) and move its creation/start to the block that runs
after electronProcess is spawned (or construct it without onTimeout and call
watchdog.setOnTimeout(...) once electronProcess is assigned) so the handler is
only defined once and not overwritten.

src/opencode-client.js (1)

1-574: File exceeds 300-line limit.

This file is 574 lines, nearly double the 300-line limit specified in the coding guidelines. Consider splitting into separate modules (e.g., opencode-client.js for client operations, opencode-server.js for server management, mcp-config.js for MCP utilities).

As per coding guidelines: src/**/*.js: Any file must not exceed 300 lines.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/opencode-client.js` around lines 1 - 574, The file is too large (574
lines) and should be split; move client-related functions (createClient,
createSession, createChildSession, sendPrompt, getMessages, getChildren,
listSessions, getSessionStatus, abortSession, checkHealth, parseModelString)
into a new opencode-client.js module, move server management functions
(buildServerOptions, startServer, getCreateOpencodeServer,
getCreateOpencodeClient, getSDK) into opencode-server.js, and move MCP utilities
(loadMcpConfig, parseMcpSpec and any MCP-normalization logic) into
mcp-config.js; update each module to export its symbols and change the original
module.exports to re-export or require the new modules, and adjust any
require/imports (e.g., references to buildProviderModels,
buildCoworkAgentPrompt, logger) so they point to the new files to keep behavior
unchanged.

docs/superpowers/plans/2026-03-14-mcp-polling-handoff.md (1)

4-6: Consider removing developer-specific local path.

Line 5 contains a local worktree path (/Users/john_renaldi/claude-code-projects/sidecar-memory-leak) which is specific to one developer's machine. This may be confusing for other contributors. Consider removing or generalizing this path.
✏️ Suggested change
 **Date:** 2026-03-14
 **Branch:** feature/memory-leak
-**Worktree:** /Users/john_renaldi/claude-code-projects/sidecar-memory-leak
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/plans/2026-03-14-mcp-polling-handoff.md` around lines 4 - 6,
The file contains a developer-specific local path on the "Worktree" line
(/Users/john_renaldi/claude-code-projects/sidecar-memory-leak); remove or
replace that path with a generic value (e.g., "./worktree" or omit the Worktree
line entirely) so the docs don't expose a personal machine path. Edit the
document entry that contains the "Worktree:" label to either remove the absolute
path or substitute a neutral placeholder and commit the change.

tests/process-lifecycle.test.js (1)

51-60: Source inspection test is brittle.

This test reads crash-handler.js source and asserts string presence. If the implementation changes (e.g., uses fs.promises.unlink instead of unlinkSync, or renames the lock file), this test breaks even if behavior is correct.

Consider replacing with an integration test that actually triggers crash handler logic and verifies the lock file is removed, or accept this as a lightweight smoke test.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/process-lifecycle.test.js` around lines 51 - 60, The test inspects the
crash-handler.js source for the literal strings 'session.lock' and 'unlinkSync',
which is brittle; update the test to exercise the crash handler behavior
instead: create a temporary session.lock file, invoke the crash handler entry
point (the module in crash-handler.js or the exported function that performs
cleanup), trigger the cleanup flow (or call the cleanup function directly), then
assert the file no longer exists; alternatively, if keeping a source-level smoke
test, relax it to check for either 'unlinkSync' or 'fs.promises.unlink' and for
any reference to 'lock' rather than the exact 'session.lock'.

src/sidecar/resume.js (1)

114-206: Function resumeSidecar exceeds 50-line limit.

At ~93 lines, this function exceeds the 50-line guideline. Consider extracting the headless and interactive branches into separate helper functions (e.g., runHeadlessResume, runInteractiveResume) to improve readability and testability.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/sidecar/resume.js` around lines 114 - 206, The resumeSidecar function is
too long; extract the headless branch and the interactive branch into two
helpers (e.g., runHeadlessResume and runInteractiveResume) so resumeSidecar only
orchestrates flow: create sessionDir/metadata, acquireLock, compute
resumePrompt, start heartbeat, call the appropriate helper, then stop heartbeat,
releaseLock, outputSummary, and finalizeSession. The helpers should accept the
same context used in the current branches (at minimum: metadata.model,
resumePrompt, taskId, project, timeout/effectiveAgent, existingConversation, and
mcpServers) and return a result object with summary, timedOut, and error fields
so existing post-call logging and summary handling in resumeSidecar remains
unchanged; ensure updatedMetadata is still used for finalizeSession and keep
heartbeat.stop() and releaseLock(sessionDir) inside the existing finally block.

tests/idle-watchdog.test.js (1)

236-246: Replace the source-text assertion with a behavioral test.

This only proves the file contains those tokens. Dead code still passes, and harmless renames fail. Stub the async prompt path and assert the watchdog transitions around it instead of grepping the source.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/idle-watchdog.test.js` around lines 236 - 246, Replace the brittle
source-text grep in the existing test with a behavioral test that stubs the
async prompt path and asserts watchdog transitions; require the module under
test (opencode-client.js), spyOn or mock the internal async prompt function that
sendPrompt wraps (the function sendPrompt calls), then call sendPrompt and
assert watchdog.markBusy was called before the async promise resolves and
watchdog.markIdle was called after it resolves (also add a test for the
rejection path to ensure markIdle is still called on error). Use
jest.fn()/jest.spyOn on the watchdog object and the internal prompt function to
control timing and verify call order rather than checking file contents.

tests/input-validation.test.js (1)

46-55: Make the default-model test deterministic.

This branch accepts both success and failure based on whatever config happens to exist on the machine, so it won't catch regressions in default resolution. Isolate the config state for the test and assert one expected outcome.

src/utils/shared-server.js (1)

27-27: Environment variable parsing may not behave as intended for value "0".

Number(process.env.SIDECAR_MAX_SESSIONS) || 20 will fall back to 20 if the env var is set to "0", since Number("0") is falsy. While 0 max sessions is likely an invalid configuration, this could cause confusion.

Consider using nullish coalescing if you want to respect explicit "0":
♻️ Suggested fix
-    this.maxSessions = Number(process.env.SIDECAR_MAX_SESSIONS) || 20;
+    const envMax = process.env.SIDECAR_MAX_SESSIONS;
+    this.maxSessions = envMax !== undefined ? Number(envMax) : 20;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/shared-server.js` at line 27, The assignment to this.maxSessions
uses a truthy fallback that treats "0" as absent; update the parsing so an
explicit "0" is respected by using nullish coalescing or an explicit check:
parse process.env.SIDECAR_MAX_SESSIONS with Number(...) (or parseInt) into a
temp value and set this.maxSessions = tempValue ?? 20 (or set to 20 only when
env is undefined/null), referencing the existing this.maxSessions assignment in
shared-server.js to locate the change.

docs/superpowers/specs/2026-03-14-mcp-input-validation-design.md (1)

83-97: Spec differs from actual implementation for agent validation.

The spec shows:
if (input.noUi) {
  const agentResult = validateHeadlessAgent(input.agent);
But the actual implementation in src/utils/input-validators.js (lines 107-122 per context snippet) has additional logic:
if (input.noUi && input.agent) {
  const lower = input.agent.toLowerCase();
  if (lower !== 'chat') {  // Skip for chat - handler auto-converts
    const agentResult = validateHeadlessAgent(input.agent);
The implementation correctly handles the MCP Zod schema default where agent defaults to 'Chat', and the handler auto-converts Chat→Build for headless mode. Consider updating the spec to document this edge case for future maintainers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/specs/2026-03-14-mcp-input-validation-design.md` around
lines 83 - 97, Update the spec's agent validation section to match
implementation: only call validateHeadlessAgent when noUi is true AND
input.agent is present and not the default/auto-converted Chat; specifically
document that the handler auto-converts "Chat"→"Build" for headless mode so the
code uses the check if (input.noUi && input.agent) { const lower =
input.agent.toLowerCase(); if (lower !== 'chat') { const agentResult =
validateHeadlessAgent(input.agent); ... } } and therefore the spec should
include this conditional and the rationale.

docs/superpowers/plans/2026-03-14-mcp-polling-plan.md (1)

36-66: Source-check tests provide good safety net but are brittle.

The test strategy of checking for string presence in source code (e.g., expect(src).toContain('externalServer')) is a lightweight way to verify implementation progress. However, these tests are fragile—renaming variables or refactoring would break them without affecting functionality.

Consider supplementing with actual behavioral tests once the implementation is complete.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/plans/2026-03-14-mcp-polling-plan.md` around lines 36 - 66,
The source-check tests in tests/headless-external-server.test.js are brittle
because they assert string occurrences (e.g., 'externalServer',
'!externalServer', 'options.sessionId', 'options.watchdog') in the raw src read
from src/headless.js; replace or supplement these with behavioral/unit tests
that import or require the headless module and assert runtime behavior instead
(for example, call the exported function(s) with an options object containing
externalServer/sessionId/watchdog and verify server.close() is skipped when
externalServer is true and that sessionId/watchdog are accepted), keeping the
existing source-checks only as a temporary guard if needed.

docs/superpowers/specs/2026-03-14-mcp-polling-integration-design.md (1)

113-125: Minor inconsistency: buildContext call uses await here but not in the plan file.

This spec shows context = await buildContext({...}) while the plan file (2026-03-14-mcp-polling-plan.md, lines 270-278) shows synchronous usage context = buildContext(cwd, ...).

Additionally, the parameter structure differs:

Spec: buildContext({ project: cwd, parentSession: ... })

Plan: buildContext(cwd, input.parentSession, { ... })

Verify the actual function signature and update both documents for consistency.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/specs/2026-03-14-mcp-polling-integration-design.md` around
lines 113 - 125, The two docs are inconsistent about buildContext: confirm the
real function signature and whether it returns a Promise (check the
implementation of buildContext), then update both documents to match that
signature and usage; if buildContext is async, change the plan file to use
"await buildContext(...)" (or if synchronous, remove await in the spec), and
standardize the parameter shape so both show the same call form (either
buildContext({project: cwd, parentSession, coworkProcess, contextTurns,
contextSince, contextMaxTokens}) or buildContext(cwd, input.parentSession,
{coworkProcess, contextTurns, contextSince, contextMaxTokens})) so examples and
parameter names (buildContext) are identical across files.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/troubleshooting.md`:
- Line 7: The docs show the wrong session lock path; update the troubleshooting
entry to match SessionPaths.sessionDir() usage by replacing
~/.config/sidecar/sessions/<task_id>/session.lock with the project-local path
where sessions are stored:
<project>/.claude/sidecar_sessions/<task_id>/session.lock (i.e., instruct users
to delete <project>/.claude/sidecar_sessions/<task_id>/session.lock to clear a
stale lock).

In `@scripts/eval-with-monitoring.sh`:
- Around line 66-67: The script currently exits immediately on a non-zero exit
from the node command due to shell errexit (set -e), so EVAL_EXIT is never
captured and monitoring/cleanup never runs; change the block around the node
invocation to disable errexit just for that run (e.g., use set +e or run the
node command with a shell-safe capture pattern) so you can capture the node
process exit into the EVAL_EXIT variable (the symbol EVAL_EXIT) and then
re-enable errexit (set -e) before continuing to the existing monitoring/cleanup
logic; update the area around the node "$PROJECT_DIR/evals/run_eval.js"
invocation so failures do not short-circuit post-eval monitoring.

In `@src/headless.js`:
- Around line 161-173: The IdleWatchdog is started before session creation and
is not cancelled on the early-return error path in createSession(), leaving a
timer that can later call process.exit(0); update the createSession()
early-return handling (the block at lines ~181-191) to stop the watchdog when
aborting: reference the watchdog variable and call its stop/cancel method (e.g.,
watchdog.stop() or watchdog.cancel()) if it exists, or alternatively defer
starting the IdleWatchdog until after successful session creation (instantiate
IdleWatchdog only after createSession succeeds); ensure IdleWatchdog is
referenced by name and that any started instance is properly stopped on all
error/return paths.

In `@src/mcp-server.js`:
- Around line 127-134: The metadata currently records the long-lived MCP process
with "pid: process.pid", which causes sidecar_abort() to kill the whole server;
change the recorded task PID to the actual task/sidecar process ID instead. In
the JSON written to metaPath (the object created where taskId, status,
opencodeSessionId, opencodePort, goPid are set), replace pid: process.pid with
the task PID (e.g. pid: server.goPid || null or whatever child/sidecar PID
property is used by the server instance) and keep goPid as-is; ensure
sidecar_abort() reads that same pid field so it only signals the task process,
not the MCP server.
- Around line 121-134: You create a shared session (createSession -> sessionId)
and write metadata.json before calling addSession()/runHeadless, so if
addSession() or later fails you must clean up the partially-created shared
session and metadata instead of falling back to spawn. Wrap the
addSession()/runHeadless sequence in a try/catch; on error call the
corresponding session teardown on the client (e.g. deleteSession(sessionId) or
closeSession(sessionId) / client.removeSession(sessionId) — use the existing
session-management API in your codebase — and remove the metadata
file/sessionDir (fs.unlinkSync or fs.rmSync) before rethrowing the error so the
code does not continue to the spawn path and leaves stale metadata.json or a
leaked shared session. Ensure this same cleanup is applied to the other affected
block (lines ~156-208) where createSession is used.

In `@src/sidecar/resume.js`:
- Around line 137-140: The lock acquired by acquireLock(...) is taken before the
try/finally that releases it, so if buildMcpConfig, checkFileDrift,
updateSessionStatus, createHeartbeat or any other call between acquireLock and
the try throws the lock will never be released; move the call to
acquireLock(sessionDir, headless ? 'headless' : 'interactive') inside the try
block (or restructure so the try begins immediately after acquireLock) so the
finally block that releases the lock always runs, ensuring locks are released
even on intermediate failures.

In `@src/sidecar/session-utils.js`:
- Around line 248-250: checkSessionLiveness can throw when metadata is
null/undefined; update the function to first guard for a missing metadata object
(or missing pid/goPid) and return 'dead' immediately instead of calling
isProcessAlive on undefined. Specifically, in checkSessionLiveness add a
null/undefined check for metadata (and optionally for
metadata.pid/metadata.goPid) before invoking isProcessAlive so the function
safely returns 'dead' when metadata is absent.

In `@src/utils/input-validators.js`:
- Around line 55-67: The validation error currently interpolates input.model
into the message even when it's undefined; update the model-error branch in the
validation logic (the block checking modelError and using getEffectiveAliases(),
findSimilar(), and input.model) to only prepend the "Model 'X' not found." text
when a model string was actually supplied (e.g., typeof input.model === 'string'
&& input.model !== ''), otherwise use just the tryResolveModel() message;
likewise only compute suggestions/aliases when a model string exists (omit or
set suggestions to []/undefined when no model was provided) so the error message
no longer shows "Model 'undefined' not found."

In `@src/utils/session-lock.js`:
- Around line 29-39: isLockStale currently treats any lock as reclaimable after
a fixed age even if lockData.pid is alive, which lets a long-running legitimate
process have its lock stolen; change isLockStale to only consider a lock stale
when the PID is not alive (i.e., if !isPidAlive(lockData.pid) return true,
otherwise ignore the timestamp/age check), and do not use
MODE_TIMEOUTS/lockData.timestamp to reclaim live PIDs unless you implement a
heartbeat or owner token; additionally update releaseLock() to verify ownership
(e.g., require matching lockData.ownerToken or pid before deleting) so a new
owner cannot be removed by the original process on exit.

In `@tests/shared-server-e2e.integration.test.js`:
- Around line 109-112: The spawned MCP process inherits SIDECAR_SHARED_SERVER
from the parent env, causing the test to miss the shared-server path when
overridden; modify the spawn call (where NODE, SIDECAR_BIN and the 'mcp' arg are
used) to force-enable the flag by adding SIDECAR_SHARED_SERVER: '1' into the env
object (e.g., spread process.env then set SIDECAR_SHARED_SERVER: '1') so the
child always runs with shared-server enabled for this test.
- Around line 38-49: MemoryMonitor currently samples the wrong process
(client.child.pid) and misses the real OpenCode process; update MemoryMonitor so
its start/_record logic resolves and samples the actual Go child PID stored in
the session metadata (key "goPid") when present, falling back to
client.child.pid if not available, and ensure any other sampling locations
referenced around lines 235-236 use the same goPid-aware logic; modify
MemoryMonitor constructor/start/_record to accept or look up the session
metadata and use that PID for RSS measurements.
- Around line 318-333: Replace the weak assertions that only check non-empty
output with two strict checks: assert each pollResults[i].status is exactly
'complete' (not just in ['complete','error']) and assert the LLM output from the
sidecar_read call contains the deterministic marker `SESSION_${i}_OK`; locate
the loop using SESSION_COUNT that checks pollResults and the subsequent loop
calling client.request('tools/call', { name: 'sidecar_read', arguments: {
taskId: taskIds[i], project: tmpDir } }) and replace the expect calls so they
use expect(pollResults[i].status).toBe('complete') and
expect(readText).toContain(`SESSION_${i}_OK`) (use taskIds[i] where appropriate
for logging).

---

Outside diff comments:
In `@src/sidecar/continue.js`:
- Around line 159-160: The heartbeat is created without the required sessionDir
so progress tracking is skipped; update the call to createHeartbeat to pass the
interval and session dir (use HEARTBEAT_INTERVAL and the current sessionDir
variable) and ensure HEARTBEAT_INTERVAL is imported at top of the file;
specifically change the call from createHeartbeat() to
createHeartbeat(HEARTBEAT_INTERVAL, sessionDir) and add the HEARTBEAT_INTERVAL
import so createHeartbeat(sessionDir)-dependent progress reads run.

---

Nitpick comments:
In `@docs/superpowers/plans/2026-03-14-mcp-polling-handoff.md`:
- Around line 4-6: The file contains a developer-specific local path on the
"Worktree" line (/Users/john_renaldi/claude-code-projects/sidecar-memory-leak);
remove or replace that path with a generic value (e.g., "./worktree" or omit the
Worktree line entirely) so the docs don't expose a personal machine path. Edit
the document entry that contains the "Worktree:" label to either remove the
absolute path or substitute a neutral placeholder and commit the change.

In `@docs/superpowers/plans/2026-03-14-mcp-polling-plan.md`:
- Around line 36-66: The source-check tests in
tests/headless-external-server.test.js are brittle because they assert string
occurrences (e.g., 'externalServer', '!externalServer', 'options.sessionId',
'options.watchdog') in the raw src read from src/headless.js; replace or
supplement these with behavioral/unit tests that import or require the headless
module and assert runtime behavior instead (for example, call the exported
function(s) with an options object containing externalServer/sessionId/watchdog
and verify server.close() is skipped when externalServer is true and that
sessionId/watchdog are accepted), keeping the existing source-checks only as a
temporary guard if needed.

In `@docs/superpowers/specs/2026-03-14-mcp-input-validation-design.md`:
- Around line 83-97: Update the spec's agent validation section to match
implementation: only call validateHeadlessAgent when noUi is true AND
input.agent is present and not the default/auto-converted Chat; specifically
document that the handler auto-converts "Chat"→"Build" for headless mode so the
code uses the check if (input.noUi && input.agent) { const lower =
input.agent.toLowerCase(); if (lower !== 'chat') { const agentResult =
validateHeadlessAgent(input.agent); ... } } and therefore the spec should
include this conditional and the rationale.

In `@docs/superpowers/specs/2026-03-14-mcp-polling-integration-design.md`:
- Around line 113-125: The two docs are inconsistent about buildContext: confirm
the real function signature and whether it returns a Promise (check the
implementation of buildContext), then update both documents to match that
signature and usage; if buildContext is async, change the plan file to use
"await buildContext(...)" (or if synchronous, remove await in the spec), and
standardize the parameter shape so both show the same call form (either
buildContext({project: cwd, parentSession, coworkProcess, contextTurns,
contextSince, contextMaxTokens}) or buildContext(cwd, input.parentSession,
{coworkProcess, contextTurns, contextSince, contextMaxTokens})) so examples and
parameter names (buildContext) are identical across files.

In `@src/opencode-client.js`:
- Around line 1-574: The file is too large (574 lines) and should be split; move
client-related functions (createClient, createSession, createChildSession,
sendPrompt, getMessages, getChildren, listSessions, getSessionStatus,
abortSession, checkHealth, parseModelString) into a new opencode-client.js
module, move server management functions (buildServerOptions, startServer,
getCreateOpencodeServer, getCreateOpencodeClient, getSDK) into
opencode-server.js, and move MCP utilities (loadMcpConfig, parseMcpSpec and any
MCP-normalization logic) into mcp-config.js; update each module to export its
symbols and change the original module.exports to re-export or require the new
modules, and adjust any require/imports (e.g., references to
buildProviderModels, buildCoworkAgentPrompt, logger) so they point to the new
files to keep behavior unchanged.

In `@src/sidecar/crash-handler.js`:
- Around line 42-48: Replace the hardcoded unlink logic that builds lockPath and
swallows errors with the shared session-lock utility: import the utility used
elsewhere (the session-lock module) and call its exported cleanup/release
function (instead of manually creating lockPath from sessionDir and calling
fs.unlinkSync in the try/catch) to remove the session lock; reference the
existing variables sessionDir and the current unlink block so you replace that
exact code with a call to the session-lock cleanup API (e.g., cleanupSessionLock
or removeLock) to centralize lock-file logic.

In `@src/sidecar/interactive.js`:
- Around line 153-160: The IdleWatchdog is created with an onTimeout handler
that is immediately replaced later when electronProcess exists; instead,
instantiate/start the watchdog after spawning electron so you can set the final
onTimeout once electronProcess is available. Locate the IdleWatchdog usage
(symbols: IdleWatchdog, watchdog, start(), onTimeout, electronProcess) and move
its creation/start to the block that runs after electronProcess is spawned (or
construct it without onTimeout and call watchdog.setOnTimeout(...) once
electronProcess is assigned) so the handler is only defined once and not
overwritten.

In `@src/sidecar/resume.js`:
- Around line 114-206: The resumeSidecar function is too long; extract the
headless branch and the interactive branch into two helpers (e.g.,
runHeadlessResume and runInteractiveResume) so resumeSidecar only orchestrates
flow: create sessionDir/metadata, acquireLock, compute resumePrompt, start
heartbeat, call the appropriate helper, then stop heartbeat, releaseLock,
outputSummary, and finalizeSession. The helpers should accept the same context
used in the current branches (at minimum: metadata.model, resumePrompt, taskId,
project, timeout/effectiveAgent, existingConversation, and mcpServers) and
return a result object with summary, timedOut, and error fields so existing
post-call logging and summary handling in resumeSidecar remains unchanged;
ensure updatedMetadata is still used for finalizeSession and keep
heartbeat.stop() and releaseLock(sessionDir) inside the existing finally block.

In `@src/utils/shared-server.js`:
- Line 27: The assignment to this.maxSessions uses a truthy fallback that treats
"0" as absent; update the parsing so an explicit "0" is respected by using
nullish coalescing or an explicit check: parse process.env.SIDECAR_MAX_SESSIONS
with Number(...) (or parseInt) into a temp value and set this.maxSessions =
tempValue ?? 20 (or set to 20 only when env is undefined/null), referencing the
existing this.maxSessions assignment in shared-server.js to locate the change.

In `@tests/headless-watchdog.test.js`:
- Around line 7-13: Replace brittle source-text checks against src/headless.js
with runtime behavior assertions: mock the watchdog module at the module level
(use jest.mock for the module that exports the watchdog used by headless.js),
require/import the headless module under test so it uses the mock, and then
assert that the watchdog factory/creation was called and that the mock's touch()
and cancel() methods were invoked as expected; target symbols to change are the
test file tests/headless-watchdog.test.js and the runtime interactions
watchdog.touch(), watchdog.cancel(), and the module that constructs the watchdog
in headless.js so the test verifies calls rather than string presence.

In `@tests/idle-watchdog.test.js`:
- Around line 236-246: Replace the brittle source-text grep in the existing test
with a behavioral test that stubs the async prompt path and asserts watchdog
transitions; require the module under test (opencode-client.js), spyOn or mock
the internal async prompt function that sendPrompt wraps (the function
sendPrompt calls), then call sendPrompt and assert watchdog.markBusy was called
before the async promise resolves and watchdog.markIdle was called after it
resolves (also add a test for the rejection path to ensure markIdle is still
called on error). Use jest.fn()/jest.spyOn on the watchdog object and the
internal prompt function to control timing and verify call order rather than
checking file contents.

In `@tests/interactive-watchdog.test.js`:
- Around line 7-12: Replace fragile source-text assertions in
tests/interactive-watchdog.test.js with runtime checks against the module
behavior: require or import the module exported by src/sidecar/interactive.js
(e.g., the initializer like initInteractiveMode or exported watchdog instance),
invoke the interactive startup function to create the watchdog, and assert the
watchdog object exists and has the expected properties/methods (e.g., a running
flag and a cancel method) and that calling the cleanup path calls
watchdog.cancel or removes listeners; remove assertions that search for
'idle-watchdog', "mode: 'interactive'", or string 'watchdog.cancel()' and
instead assert on the actual runtime effects (watchdog presence, cancel invoked,
and any listener counts) using spies/mocks where needed.

In `@tests/process-lifecycle.test.js`:
- Around line 51-60: The test inspects the crash-handler.js source for the
literal strings 'session.lock' and 'unlinkSync', which is brittle; update the
test to exercise the crash handler behavior instead: create a temporary
session.lock file, invoke the crash handler entry point (the module in
crash-handler.js or the exported function that performs cleanup), trigger the
cleanup flow (or call the cleanup function directly), then assert the file no
longer exists; alternatively, if keeping a source-level smoke test, relax it to
check for either 'unlinkSync' or 'fs.promises.unlink' and for any reference to
'lock' rather than the exact 'session.lock'.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 515b2e2f-d89e-4a9c-b4cc-e6c06026b698

📥 Commits

Reviewing files that changed from the base of the PR and between 251358e and 7ccf6fd.

📒 Files selected for processing (37)

CLAUDE.md
docs/architecture.md
docs/configuration.md
docs/superpowers/plans/2026-03-14-mcp-polling-handoff.md
docs/superpowers/plans/2026-03-14-mcp-polling-plan.md
docs/superpowers/specs/2026-03-14-mcp-input-validation-design.md
docs/superpowers/specs/2026-03-14-mcp-polling-integration-design.md
docs/troubleshooting.md
docs/usage.md
evals/run_eval.js
scripts/eval-with-monitoring.sh
src/headless.js
src/mcp-server.js
src/opencode-client.js
src/sidecar/continue.js
src/sidecar/crash-handler.js
src/sidecar/interactive.js
src/sidecar/resume.js
src/sidecar/session-utils.js
src/sidecar/start.js
src/utils/idle-watchdog.js
src/utils/input-validators.js
src/utils/session-lock.js
src/utils/shared-server.js
src/utils/validators.js
tests/headless-external-server.test.js
tests/headless-watchdog.test.js
tests/idle-watchdog.test.js
tests/input-validation.test.js
tests/interactive-watchdog.test.js
tests/mcp-shared-server.test.js
tests/process-lifecycle.test.js
tests/resume-lock.test.js
tests/session-lock.test.js
tests/shared-server-e2e.integration.test.js
tests/shared-server.test.js
tests/sidecar/start.test.js

docs/troubleshooting.md

scripts/eval-with-monitoring.sh

src/headless.js

coderabbitai · 2026-03-14T18:39:04Z

src/mcp-server.js

+        const sessionId = await createSession(client);
+
+        // Write initial metadata (MCP handler owns this, runHeadless skips it)
+        fs.mkdirSync(sessionDir, { recursive: true, mode: 0o700 });
+        const metaPath = path.join(sessionDir, 'metadata.json');
+        const serverPort = server.url ? new URL(server.url).port : null;
+        fs.writeFileSync(metaPath, JSON.stringify({
+          taskId, status: 'running', pid: process.pid,
+          opencodeSessionId: sessionId,
+          opencodePort: serverPort,
+          goPid: server.goPid || null,
+          createdAt: new Date().toISOString(),
+          headless: true, model: resolvedModel,
+        }, null, 2), { mode: 0o600 });


⚠️ Potential issue | 🔴 Critical

Don't fall back after partially creating a shared session.

createSession() and metadata.json are created before addSession() / runHeadless() complete. If addSession() throws (for example when the shared server hits its max-session guard) or anything later in this block fails, Lines 206-208 drop into the spawn path without aborting the shared session or clearing that metadata. That leaks the shared-server session and leaves stale ownership data behind for the spawned fallback task.

Also applies to: 156-208

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/mcp-server.js` around lines 121 - 134, You create a shared session (createSession -> sessionId) and write metadata.json before calling addSession()/runHeadless, so if addSession() or later fails you must clean up the partially-created shared session and metadata instead of falling back to spawn. Wrap the addSession()/runHeadless sequence in a try/catch; on error call the corresponding session teardown on the client (e.g. deleteSession(sessionId) or closeSession(sessionId) / client.removeSession(sessionId) — use the existing session-management API in your codebase — and remove the metadata file/sessionDir (fs.unlinkSync or fs.rmSync) before rethrowing the error so the code does not continue to the spawn path and leaves stale metadata.json or a leaked shared session. Ensure this same cleanup is applied to the other affected block (lines ~156-208) where createSession is used.

src/mcp-server.js

src/utils/input-validators.js

coderabbitai · 2026-03-14T18:39:05Z

src/utils/session-lock.js

+function isLockStale(lockData) {
+  if (!isPidAlive(lockData.pid)) {
+    return true;
+  }
+  const modeTimeout = MODE_TIMEOUTS[lockData.mode] || MODE_TIMEOUTS.headless;
+  const ageMs = Date.now() - new Date(lockData.timestamp).getTime();
+  if (ageMs > modeTimeout * 2) {
+    return true;
+  }
+  return false;
+}


⚠️ Potential issue | 🔴 Critical

A live lock becomes reclaimable after a fixed age.

The timestamp is only written once, so any legitimate run that exceeds modeTimeout * 2 can be stolen even while its PID is alive. Because releaseLock() doesn't verify ownership, the original process can then delete the new owner's lock on exit. Only treat dead PIDs as stale unless you also add a heartbeat/owner token.

Also applies to: 86-93

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/utils/session-lock.js` around lines 29 - 39, isLockStale currently treats any lock as reclaimable after a fixed age even if lockData.pid is alive, which lets a long-running legitimate process have its lock stolen; change isLockStale to only consider a lock stale when the PID is not alive (i.e., if !isPidAlive(lockData.pid) return true, otherwise ignore the timestamp/age check), and do not use MODE_TIMEOUTS/lockData.timestamp to reclaim live PIDs unless you implement a heartbeat or owner token; additionally update releaseLock() to verify ownership (e.g., require matching lockData.ownerToken or pid before deleting) so a new owner cannot be removed by the original process on exit.

coderabbitai · 2026-03-14T18:39:05Z

tests/shared-server-e2e.integration.test.js

+class MemoryMonitor {
+  constructor(pid) {
+    this.pid = pid;
+    this.snapshots = [];
+    this._interval = null;
+  }
+
+  start(intervalMs = 2000) {
+    this._record(); // initial snapshot
+    this._interval = setInterval(() => this._record(), intervalMs);
+    return this;
+  }


⚠️ Potential issue | 🟠 Major

This RSS monitor misses the actual shared OpenCode process.

MemoryMonitor samples client.child.pid, which is just the Node MCP wrapper. The shared-server path also starts the real OpenCode child (goPid is written into session metadata), so this can report a healthy peak while the actual server leaks.

Also applies to: 235-236

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/shared-server-e2e.integration.test.js` around lines 38 - 49, MemoryMonitor currently samples the wrong process (client.child.pid) and misses the real OpenCode process; update MemoryMonitor so its start/_record logic resolves and samples the actual Go child PID stored in the session metadata (key "goPid") when present, falling back to client.child.pid if not available, and ensure any other sampling locations referenced around lines 235-236 use the same goPid-aware logic; modify MemoryMonitor constructor/start/_record to accept or look up the session metadata and use that PID for RSS measurements.

tests/shared-server-e2e.integration.test.js

coderabbitai · 2026-03-14T18:39:05Z

tests/shared-server-e2e.integration.test.js

+    // Verify all completed
+    for (let i = 0; i < SESSION_COUNT; i++) {
+      process.stderr.write(`  [e2e] Session ${taskIds[i]} final: ${pollResults[i].status}\n`);
+      expect(['complete', 'error']).toContain(pollResults[i].status);
+    }
+
+    // Step 5: Read results to verify LLM actually responded
+    for (let i = 0; i < SESSION_COUNT; i++) {
+      const readResult = await client.request('tools/call', {
+        name: 'sidecar_read',
+        arguments: { taskId: taskIds[i], project: tmpDir },
+      });
+      const readText = readResult.result.content[0].text;
+      process.stderr.write(`  [e2e] Session ${i} output length: ${readText.length} chars\n`);
+      expect(readText.length).toBeGreaterThan(0);
+    }


⚠️ Potential issue | 🟠 Major

Assert success and expected content, not just "non-empty".

This currently passes when sessions finish in error, and sidecar_read() returning No summary available... still satisfies the length check. Since the prompt is deterministic, assert status === 'complete' and that each result contains SESSION_${i}_OK.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/shared-server-e2e.integration.test.js` around lines 318 - 333, Replace the weak assertions that only check non-empty output with two strict checks: assert each pollResults[i].status is exactly 'complete' (not just in ['complete','error']) and assert the LLM output from the sidecar_read call contains the deterministic marker `SESSION_${i}_OK`; locate the loop using SESSION_COUNT that checks pollResults and the subsequent loop calling client.request('tools/call', { name: 'sidecar_read', arguments: { taskId: taskIds[i], project: tmpDir } }) and replace the expect calls so they use expect(pollResults[i].status).toBe('complete') and expect(readText).toContain(`SESSION_${i}_OK`) (use taskIds[i] where appropriate for logging).

…se in abort - Hoist sessionId declaration before the try block so catch can call sharedServer.removeSession(sessionId) to clean up any partially-created shared server session before falling through to the spawn path - Store pid: null (not process.pid) in metadata for shared server sessions; storing the MCP server's own PID caused sidecar_abort to SIGTERM the entire MCP process, killing all active sessions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- docs/troubleshooting.md: correct session.lock path from ~/.config/sidecar/sessions/ to <project>/.claude/sidecar_sessions/ - scripts/eval-with-monitoring.sh: wrap eval node call with set +e/set -e so post-eval snapshot and process delta reporting always run even when the eval exits non-zero - tests/shared-server-e2e.integration.test.js: force SIDECAR_SHARED_SERVER=1 in MCP server spawn env; strengthen sidecar_read assertions (length > 10, regex match on session marker) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…or message - headless.js: cancel watchdog on createSession error return path - resume.js: move outputSummary/finalizeSession inside try block, guard heartbeat.stop() for null - session-utils.js: guard checkSessionLiveness against null metadata - input-validators.js: show "No model specified" instead of "Model 'undefined'" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/sidecar/resume.js (1)
162-175: ⚠️ Potential issue | 🟠 Major

Force Build when resuming in headless mode.

A resumed session can carry forward chat from metadata and pass it into runHeadless(), which can stall headless execution. Normalize to Build for headless resumes.
🔧 Proposed fix
-    const effectiveAgent = metadata.agent || 'Build';
+    const previousAgent = metadata.agent || 'Build';
+    const effectiveAgent =
+      (headless && typeof previousAgent === 'string' && previousAgent.toLowerCase() === 'chat')
+        ? 'Build'
+        : previousAgent;
As per coding guidelines: "For headless mode in interactive scenarios, use Build agent instead of chat agent to avoid stalling".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sidecar/resume.js` around lines 162 - 175, The resumed headless flow may
carry a metadata.agent of "chat" causing runHeadless() to stall; update the
agent selection so that when headless is true you force the Build agent. Change
the effectiveAgent logic used before calling runHeadless (referencing
effectiveAgent, metadata.agent, headless, runHeadless, and
buildResumeUserMessage) so that effectiveAgent is set to 'Build' for headless
resumes (otherwise keep metadata.agent || 'Build'), and pass that forced value
into runHeadless.
src/headless.js (1)
248-457: 🛠️ Refactor suggestion | 🟠 Major

Migrate to session.status() API for completion detection instead of custom polling loop.

The code still polls messages in a while loop with custom completion logic. Use the getSessionStatus() wrapper (already exported from opencode-client.js) which calls session.status() to check for the status: 'completed' field. This removes the need for the manual polling loop, message tracking, stable poll counting, and other heuristics currently in place.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/headless.js` around lines 248 - 457, The poll loop manually inspects
messages via getMessages and complex heuristics (seenTextParts, stablePolls,
lastAssistantMsgId, output growth, etc. in the while loop) — replace this entire
custom polling logic with the existing getSessionStatus() wrapper from
opencode-client.js (which calls session.status() and returns status:'completed')
to detect completion; keep the external abort check and tool/message logging
that you still need, but remove the
stablePolls/seenTextParts/lastOutputLength/pollCount logic and the custom
completion checks, and instead call getSessionStatus(sessionId) on a short
interval, treat status==='completed' as completion, propagate sessionError if
status indicates failure, and continue to use existing logMessage/writeProgress
calls for received parts when still streaming messages. Ensure you update
references to getMessages, lastAssistantMsgId, stablePolls, seenTextParts, and
any related variables so they are removed or reconciled with getSessionStatus()
usage.

♻️ Duplicate comments (1)

src/mcp-server.js (1)

122-136: ⚠️ Potential issue | 🔴 Critical

Clean up partially-created shared sessions before spawn fallback.

If failure happens after createSession()/metadata write, current fallback can leave stale metadata and leaked shared-session state, then start a spawned task with inconsistent ownership metadata.

🧹 Proposed cleanup before falling through to spawn

-      let sessionId;
+      let sessionId;
+      let sharedClient;
       try {
-        const { server, client } = await sharedServer.ensureServer();
+        const { server, client } = await sharedServer.ensureServer();
+        sharedClient = client;
         const { createSession } = require('./opencode-client');
@@
       } catch (err) {
         logger.warn('Shared server path failed, falling back to spawn', { error: err.message });
         // Clean up partial shared server state before falling through
         if (sessionId) {
           sharedServer.removeSession(sessionId);
+          try {
+            const { abortSession } = require('./opencode-client');
+            await abortSession(sharedClient, sessionId);
+          } catch (cleanupErr) {
+            logger.warn('Failed to abort leaked shared session', { sessionId, error: cleanupErr.message });
+          }
+          try {
+            fs.rmSync(sessionDir, { recursive: true, force: true });
+          } catch (cleanupErr) {
+            logger.warn('Failed to remove partial shared session directory', {
+              sessionDir,
+              error: cleanupErr.message,
+            });
+          }
         }
         // Fall through to spawn path below
       }

Also applies to: 208-215

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/mcp-server.js` around lines 122 - 136, If an error occurs after
createSession(client) and after writing metadata.json (metaPath) we must clean
up the partially-created shared session to avoid stale metadata and leaked
state: update the error/fallback path to delete the metadata file (metaPath) and
remove the session directory (sessionDir) and/or call the session teardown
helper (e.g. deleteSession(sessionId) or
destroySession/sessionRegistry.deregister(sessionId) if one exists) before
falling through to spawn; ensure cleanup is best-effort (catch and log errors
but proceed with spawn) and apply the same cleanup logic to the other duplicate
block around the 208-215 region so both shared-session failure paths remove
metadata and deregister the sessionId.

🧹 Nitpick comments (3)

src/mcp-server.js (1)
74-248: Split sidecar_start() into smaller helpers.

This function now exceeds the repository limits for function length and try/catch density, which makes lifecycle/error paths harder to reason about and maintain.

As per coding guidelines: "Any function must not exceed 50 lines" and "Avoid more than 3 try/catch blocks in one function - split error handling".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/mcp-server.js` around lines 74 - 248, The sidecar_start function is too
long and has too many try/catch blocks; split it into small helpers to reduce
length and error-handling density. Extract the shared-server path into a helper
like runSharedHeadlessFlow(resolvedModel, input, cwd, taskId, sessionDir) that
handles ensureServer(), createSession(), buildContext(), buildPrompts(),
sharedServer.addSession/removeSession, runHeadless(). Extract the per-process
fallback into spawnSidecarAndWriteMeta(args, sessionDir, taskId, input) which
calls spawnSidecarProcess(), writes metadata.json, and returns the child/meta;
move the metadata finalize/remove logic into finalizeSessionHandler(sessionDir,
metaPath, result) (used by the runHeadless promise handlers). Keep sidecar_start
as a thin coordinator that validates inputs (validateStartInputs), computes
args/taskId/cwd, then calls either runSharedHeadlessFlow or
spawnSidecarAndWriteMeta; ensure each new helper has at most one try/catch and
reduce total try/catch in sidecar_start to <=3.
src/utils/input-validators.js (1)
37-125: Break validateStartInputs() into stage helpers.

This new function is over the project limit and would be easier to maintain/test if split by stage (prompt/model/timeout/agent).

As per coding guidelines: "Any function must not exceed 50 lines".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/input-validators.js` around lines 37 - 125, validateStartInputs is
too large—split its four validation stages into small helpers: create
validatePromptStage(input) (use validatePromptContent),
validateModelStage(input) (use tryResolveModel, getEffectiveAliases,
findSimilar), validateTimeoutStage(input), and validateAgentStage(input) (use
validateHeadlessAgent); have each helper return the same { valid, error,
resolvedModel? } shape and make validateStartInputs simply call them in order,
returning early on errors and finally returning { valid: true, resolvedModel }
so resolved model from tryResolveModel is preserved; update imports/usages of
validatePromptContent, tryResolveModel, getEffectiveAliases, findSimilar, and
validateHeadlessAgent as needed and add unit tests for each new helper.
src/sidecar/resume.js (1)
145-153: Use shared drift/conflict utilities instead of local drift checks.

The resume drift path still relies on local checkFileDrift()/buildDriftWarning() logic. Please switch this segment to conflict.js/drift.js utilities so resume behavior stays consistent across sidecar flows.

As per coding guidelines: "Use conflict.js for file conflict detection by comparing mtimes against session start time" and "Use drift.js calculateDrift() for detecting stale context based on age and turn count".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sidecar/resume.js` around lines 145 - 153, Replace the local
checkFileDrift/buildDriftWarning flow with the shared utilities: import the
file-conflict checker from conflict.js (e.g., checkConflicts or similar) and the
calculateDrift function from drift.js, call checkConflicts(metadata,
sessionStart) to detect mtime-based conflicts and calculateDrift(metadata,
project) to compute stale-context info
(hasChanges/changedFiles/lastActivityTime), then use the drift.js-provided
warning builder (instead of buildDriftWarning) to compose the driftWarning and
set resumePrompt = systemPrompt + '\n' + driftWarning; keep the same logger.warn
call but include taskId and changedFileCount from the new conflict/drift
results.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/troubleshooting.md`:
- Line 7: Update the table cell that currently instructs to run rm on
session.lock to first recommend retrying/resuming once or waiting the staleness
window (e.g., a few minutes) because the lock cleanup logic already treats dead
PIDs/old locks as stale; if the retry/wait fails, then instruct manual deletion
of session.lock as a fallback. Locate the table row referencing "session.lock"
and replace the direct `rm` instruction with a short sequence: retry/wait
guidance, how long to wait, then manual deletion as last resort, keeping the
mention of "session.lock" so readers know which file to remove if needed.
- Line 5: Update the troubleshooting entry about the sidecar idle timeout to
mention the per-mode environment variables that override SIDECAR_IDLE_TIMEOUT:
document SIDECAR_IDLE_TIMEOUT_HEADLESS, SIDECAR_IDLE_TIMEOUT_INTERACTIVE, and
SIDECAR_IDLE_TIMEOUT_SERVER and note that any of these set to 0 will disable the
watchdog for that mode; also advise to check those per-mode vars in addition to
SIDECAR_IDLE_TIMEOUT and to use LOG_LEVEL=debug to trace watchdog transitions.

In `@scripts/eval-with-monitoring.sh`:
- Around line 32-33: The SUMMARY line written to MONITOR_LOG lacks the phase
label so later greps for SUMMARY.*PRE-EVAL/POST-EVAL fail and leave
PRE_OC/POST_OC/PRE_RSS/POST_RSS unset; update the echo that writes to
"$MONITOR_LOG" to include the same label (use "$label" like the console echo) so
the file contains "SUMMARY" plus the phase tag, and also make the variable
extraction robust (the code that sets PRE_OC, POST_OC, PRE_RSS, POST_RSS should
guard against grep failure by providing defaults or using "grep ... || true" or
parameter expansion defaults) to avoid exiting under set -euo pipefail.
- Line 60: The trap that kills the monitor process uses kill $MONITOR_PID which
can return non-zero and abort the script under set -e; modify the trap to make
the kill non-fatal by appending || true (so: kill $MONITOR_PID || true) and keep
the existing wait $MONITOR_PID || true; update the trap command that references
MONITOR_PID to include the || true after kill to ensure the post-eval analysis
runs even if the monitor already exited.

---

Outside diff comments:
In `@src/headless.js`:
- Around line 248-457: The poll loop manually inspects messages via getMessages
and complex heuristics (seenTextParts, stablePolls, lastAssistantMsgId, output
growth, etc. in the while loop) — replace this entire custom polling logic with
the existing getSessionStatus() wrapper from opencode-client.js (which calls
session.status() and returns status:'completed') to detect completion; keep the
external abort check and tool/message logging that you still need, but remove
the stablePolls/seenTextParts/lastOutputLength/pollCount logic and the custom
completion checks, and instead call getSessionStatus(sessionId) on a short
interval, treat status==='completed' as completion, propagate sessionError if
status indicates failure, and continue to use existing logMessage/writeProgress
calls for received parts when still streaming messages. Ensure you update
references to getMessages, lastAssistantMsgId, stablePolls, seenTextParts, and
any related variables so they are removed or reconciled with getSessionStatus()
usage.

In `@src/sidecar/resume.js`:
- Around line 162-175: The resumed headless flow may carry a metadata.agent of
"chat" causing runHeadless() to stall; update the agent selection so that when
headless is true you force the Build agent. Change the effectiveAgent logic used
before calling runHeadless (referencing effectiveAgent, metadata.agent,
headless, runHeadless, and buildResumeUserMessage) so that effectiveAgent is set
to 'Build' for headless resumes (otherwise keep metadata.agent || 'Build'), and
pass that forced value into runHeadless.

---

Duplicate comments:
In `@src/mcp-server.js`:
- Around line 122-136: If an error occurs after createSession(client) and after
writing metadata.json (metaPath) we must clean up the partially-created shared
session to avoid stale metadata and leaked state: update the error/fallback path
to delete the metadata file (metaPath) and remove the session directory
(sessionDir) and/or call the session teardown helper (e.g.
deleteSession(sessionId) or destroySession/sessionRegistry.deregister(sessionId)
if one exists) before falling through to spawn; ensure cleanup is best-effort
(catch and log errors but proceed with spawn) and apply the same cleanup logic
to the other duplicate block around the 208-215 region so both shared-session
failure paths remove metadata and deregister the sessionId.

---

Nitpick comments:
In `@src/mcp-server.js`:
- Around line 74-248: The sidecar_start function is too long and has too many
try/catch blocks; split it into small helpers to reduce length and
error-handling density. Extract the shared-server path into a helper like
runSharedHeadlessFlow(resolvedModel, input, cwd, taskId, sessionDir) that
handles ensureServer(), createSession(), buildContext(), buildPrompts(),
sharedServer.addSession/removeSession, runHeadless(). Extract the per-process
fallback into spawnSidecarAndWriteMeta(args, sessionDir, taskId, input) which
calls spawnSidecarProcess(), writes metadata.json, and returns the child/meta;
move the metadata finalize/remove logic into finalizeSessionHandler(sessionDir,
metaPath, result) (used by the runHeadless promise handlers). Keep sidecar_start
as a thin coordinator that validates inputs (validateStartInputs), computes
args/taskId/cwd, then calls either runSharedHeadlessFlow or
spawnSidecarAndWriteMeta; ensure each new helper has at most one try/catch and
reduce total try/catch in sidecar_start to <=3.

In `@src/sidecar/resume.js`:
- Around line 145-153: Replace the local checkFileDrift/buildDriftWarning flow
with the shared utilities: import the file-conflict checker from conflict.js
(e.g., checkConflicts or similar) and the calculateDrift function from drift.js,
call checkConflicts(metadata, sessionStart) to detect mtime-based conflicts and
calculateDrift(metadata, project) to compute stale-context info
(hasChanges/changedFiles/lastActivityTime), then use the drift.js-provided
warning builder (instead of buildDriftWarning) to compose the driftWarning and
set resumePrompt = systemPrompt + '\n' + driftWarning; keep the same logger.warn
call but include taskId and changedFileCount from the new conflict/drift
results.

In `@src/utils/input-validators.js`:
- Around line 37-125: validateStartInputs is too large—split its four validation
stages into small helpers: create validatePromptStage(input) (use
validatePromptContent), validateModelStage(input) (use tryResolveModel,
getEffectiveAliases, findSimilar), validateTimeoutStage(input), and
validateAgentStage(input) (use validateHeadlessAgent); have each helper return
the same { valid, error, resolvedModel? } shape and make validateStartInputs
simply call them in order, returning early on errors and finally returning {
valid: true, resolvedModel } so resolved model from tryResolveModel is
preserved; update imports/usages of validatePromptContent, tryResolveModel,
getEffectiveAliases, findSimilar, and validateHeadlessAgent as needed and add
unit tests for each new helper.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 96bbd82b-0200-46cc-a55e-2a4a305fbea4

📥 Commits

Reviewing files that changed from the base of the PR and between 7ccf6fd and 4fd1417.

📒 Files selected for processing (8)

docs/troubleshooting.md
scripts/eval-with-monitoring.sh
src/headless.js
src/mcp-server.js
src/sidecar/resume.js
src/sidecar/session-utils.js
src/utils/input-validators.js
tests/shared-server-e2e.integration.test.js

🚧 Files skipped from review as they are similar to previous changes (2)

src/sidecar/session-utils.js
tests/shared-server-e2e.integration.test.js

docs/troubleshooting.md

coderabbitai · 2026-03-14T19:54:04Z

scripts/eval-with-monitoring.sh

+  echo "SUMMARY: opencode=$oc_count sidecar=$sc_count total_rss_mb=${total_rss}" >> "$MONITOR_LOG"
+  echo "[$label] opencode=$oc_count sidecar=$sc_count total_rss=${total_rss}MB"


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify mismatch between written SUMMARY format and parser expectations. sed -n '32p;87,90p' scripts/eval-with-monitoring.sh

Repository: jrenaldi79/sidecar

Length of output: 584

🏁 Script executed:

#!/bin/bash # Get more context around the problematic lines # Check script start for set -euo pipefail head -5 scripts/eval-with-monitoring.sh # Check context around line 32 (within a function) sed -n '20,40p' scripts/eval-with-monitoring.sh # Check context around lines 87-90 sed -n '80,95p' scripts/eval-with-monitoring.sh # Check if there are other SUMMARY lines grep -n "SUMMARY" scripts/eval-with-monitoring.sh

Repository: jrenaldi79/sidecar

Length of output: 3060

🏁 Script executed:

#!/bin/bash # Check for set -euo pipefail and error handling grep -n "set -[a-z]*" scripts/eval-with-monitoring.sh # Get line count to understand script size wc -l scripts/eval-with-monitoring.sh # Check what happens after lines 87-90 - are there error handlers? sed -n '87,105p' scripts/eval-with-monitoring.sh

Repository: jrenaldi79/sidecar

Length of output: 1016

SUMMARY format mismatch breaks leak detection—script will abort under set -euo pipefail.

Line 32 writes SUMMARY: without the phase label, but lines 87–90 search for SUMMARY.*PRE-EVAL / POST-EVAL. When grep finds no match, the pipe fails and variable assignment leaves PRE_OC, POST_OC, PRE_RSS, POST_RSS unset, causing the script to exit before leak reporting runs.

Suggested fix

- echo "SUMMARY: opencode=$oc_count sidecar=$sc_count total_rss_mb=${total_rss}" >> "$MONITOR_LOG" + echo "SUMMARY [$label]: opencode=$oc_count sidecar=$sc_count total_rss_mb=${total_rss}" >> "$MONITOR_LOG" echo "[$label] opencode=$oc_count sidecar=$sc_count total_rss=${total_rss}MB" @@ -PRE_OC=$(grep "SUMMARY.*PRE-EVAL" "$MONITOR_LOG" | head -1 | grep -o "opencode=[0-9]*" | cut -d= -f2) -POST_OC=$(grep "SUMMARY.*POST-EVAL" "$MONITOR_LOG" | tail -1 | grep -o "opencode=[0-9]*" | cut -d= -f2) -PRE_RSS=$(grep "SUMMARY.*PRE-EVAL" "$MONITOR_LOG" | head -1 | grep -o "total_rss_mb=[0-9.]*" | cut -d= -f2) -POST_RSS=$(grep "SUMMARY.*POST-EVAL" "$MONITOR_LOG" | tail -1 | grep -o "total_rss_mb=[0-9.]*" | cut -d= -f2) +PRE_OC=$(grep "SUMMARY \[PRE-EVAL\]:" "$MONITOR_LOG" | head -1 | grep -o "opencode=[0-9]*" | cut -d= -f2 || true) +POST_OC=$(grep "SUMMARY \[POST-EVAL\]:" "$MONITOR_LOG" | tail -1 | grep -o "opencode=[0-9]*" | cut -d= -f2 || true) +PRE_RSS=$(grep "SUMMARY \[PRE-EVAL\]:" "$MONITOR_LOG" | head -1 | grep -o "total_rss_mb=[0-9.]*" | cut -d= -f2 || true) +POST_RSS=$(grep "SUMMARY \[POST-EVAL\]:" "$MONITOR_LOG" | tail -1 | grep -o "total_rss_mb=[0-9.]*" | cut -d= -f2 || true) +PRE_OC=${PRE_OC:-0} +POST_OC=${POST_OC:-0} +PRE_RSS=${PRE_RSS:-0} +POST_RSS=${POST_RSS:-0}

Also applies to: 87-90

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scripts/eval-with-monitoring.sh` around lines 32 - 33, The SUMMARY line written to MONITOR_LOG lacks the phase label so later greps for SUMMARY.*PRE-EVAL/POST-EVAL fail and leave PRE_OC/POST_OC/PRE_RSS/POST_RSS unset; update the echo that writes to "$MONITOR_LOG" to include the same label (use "$label" like the console echo) so the file contains "SUMMARY" plus the phase tag, and also make the variable extraction robust (the code that sets PRE_OC, POST_OC, PRE_RSS, POST_RSS should guard against grep failure by providing defaults or using "grep ... || true" or parameter expansion defaults) to avoid exiting under set -euo pipefail.

scripts/eval-with-monitoring.sh

- troubleshooting.md: document per-mode timeout overrides, clarify lock auto-recovery behavior - eval-with-monitoring.sh: default empty grep results to 0, guard kill of already-exited monitor process Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

scripts/eval-with-monitoring.sh (1)

32-33: ⚠️ Potential issue | 🔴 Critical

Fix SUMMARY phase tagging; leak parsing is currently blind.

Line 32 writes SUMMARY: without phase info, but Lines 87–90 and Line 114 grep SUMMARY lines for PRE-EVAL/POST-EVAL/DURING. This forces fallback zeros and can hide real leaks/peaks.

Proposed fix

-  echo "SUMMARY: opencode=$oc_count sidecar=$sc_count total_rss_mb=${total_rss}" >> "$MONITOR_LOG"
+  echo "SUMMARY [$label]: opencode=$oc_count sidecar=$sc_count total_rss_mb=${total_rss}" >> "$MONITOR_LOG"
@@
-PRE_OC=$(grep "SUMMARY.*PRE-EVAL" "$MONITOR_LOG" | head -1 | grep -o "opencode=[0-9]*" | cut -d= -f2 || echo "0")
-POST_OC=$(grep "SUMMARY.*POST-EVAL" "$MONITOR_LOG" | tail -1 | grep -o "opencode=[0-9]*" | cut -d= -f2 || echo "0")
-PRE_RSS=$(grep "SUMMARY.*PRE-EVAL" "$MONITOR_LOG" | head -1 | grep -o "total_rss_mb=[0-9.]*" | cut -d= -f2 || echo "0")
-POST_RSS=$(grep "SUMMARY.*POST-EVAL" "$MONITOR_LOG" | tail -1 | grep -o "total_rss_mb=[0-9.]*" | cut -d= -f2 || echo "0")
+PRE_OC=$(grep "SUMMARY \[PRE-EVAL\]:" "$MONITOR_LOG" | head -1 | grep -o "opencode=[0-9]*" | cut -d= -f2 || echo "0")
+POST_OC=$(grep "SUMMARY \[POST-EVAL\]:" "$MONITOR_LOG" | tail -1 | grep -o "opencode=[0-9]*" | cut -d= -f2 || echo "0")
+PRE_RSS=$(grep "SUMMARY \[PRE-EVAL\]:" "$MONITOR_LOG" | head -1 | grep -o "total_rss_mb=[0-9.]*" | cut -d= -f2 || echo "0")
+POST_RSS=$(grep "SUMMARY \[POST-EVAL\]:" "$MONITOR_LOG" | tail -1 | grep -o "total_rss_mb=[0-9.]*" | cut -d= -f2 || echo "0")
@@
-grep "SUMMARY.*DURING" "$MONITOR_LOG" | sort -t= -k4 -n -r | head -3
+grep "SUMMARY \[DURING" "$MONITOR_LOG" | sort -t= -k4 -n -r | head -3

Also applies to: 87-90, 114-114

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@scripts/eval-with-monitoring.sh` around lines 32 - 33, The SUMMARY log
written to "$MONITOR_LOG" is missing the phase tag so downstream greps for
PRE-EVAL/POST-EVAL/DURING miss entries; update the SUMMARY write that uses
MONITOR_LOG to include the phase label (the same $label used in the console
echo) so lines in the file contain the phase (e.g. include [$label] or prepend
the phase token to the "SUMMARY:" entry), ensuring the parsers that grep for
PRE-EVAL/POST-EVAL/DURING will match; adjust the SUMMARY echo that references
oc_count, sc_count and total_rss to include $label.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/troubleshooting.md`:
- Line 5: Update the sentence to correctly state that per-mode environment
variables (SIDECAR_IDLE_TIMEOUT_HEADLESS, SIDECAR_IDLE_TIMEOUT_INTERACTIVE,
SIDECAR_IDLE_TIMEOUT_SERVER) take precedence over the blanket
SIDECAR_IDLE_TIMEOUT, and remove or reword the assertion that setting
LOG_LEVEL=debug specifically emits watchdog state-transition traces—instead say
that enabling debug increases log verbosity which may help when investigating
watchdog behavior. Keep the checklist format and retain the guidance to check
per-mode overrides first, mention the blanket SIDECAR_IDLE_TIMEOUT as a
fallback, and adjust the LOG_LEVEL claim to a general note about increased
verbosity rather than guaranteed state-transition traces.

---

Duplicate comments:
In `@scripts/eval-with-monitoring.sh`:
- Around line 32-33: The SUMMARY log written to "$MONITOR_LOG" is missing the
phase tag so downstream greps for PRE-EVAL/POST-EVAL/DURING miss entries; update
the SUMMARY write that uses MONITOR_LOG to include the phase label (the same
$label used in the console echo) so lines in the file contain the phase (e.g.
include [$label] or prepend the phase token to the "SUMMARY:" entry), ensuring
the parsers that grep for PRE-EVAL/POST-EVAL/DURING will match; adjust the
SUMMARY echo that references oc_count, sc_count and total_rss to include $label.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5f0b2ba1-fd48-4e38-9550-ce139f8953e0

📥 Commits

Reviewing files that changed from the base of the PR and between 4fd1417 and 10ddca8.

📒 Files selected for processing (2)

docs/troubleshooting.md
scripts/eval-with-monitoring.sh

coderabbitai · 2026-03-14T20:04:51Z

docs/troubleshooting.md


 | Issue | Cause | Solution |
 |-------|-------|----------|
+| Sidecar process not self-terminating | Idle timeout misconfigured or disabled | Check per-mode overrides first: `SIDECAR_IDLE_TIMEOUT_HEADLESS`, `SIDECAR_IDLE_TIMEOUT_INTERACTIVE`, `SIDECAR_IDLE_TIMEOUT_SERVER`. The blanket `SIDECAR_IDLE_TIMEOUT` overrides all modes. Set `LOG_LEVEL=debug` to trace watchdog transitions |


⚠️ Potential issue | 🟡 Minor

Fix timeout precedence wording and watchdog tracing claim (Line 5).

The precedence is reversed: per-mode vars take priority over SIDECAR_IDLE_TIMEOUT, not the other way around. Also, LOG_LEVEL=debug does not specifically emit watchdog state-transition traces as written.

Proposed doc correction

-| Sidecar process not self-terminating | Idle timeout misconfigured or disabled | Check per-mode overrides first: `SIDECAR_IDLE_TIMEOUT_HEADLESS`, `SIDECAR_IDLE_TIMEOUT_INTERACTIVE`, `SIDECAR_IDLE_TIMEOUT_SERVER`. The blanket `SIDECAR_IDLE_TIMEOUT` overrides all modes. Set `LOG_LEVEL=debug` to trace watchdog transitions | +| Sidecar process not self-terminating | Idle timeout misconfigured or disabled | Check per-mode overrides first: `SIDECAR_IDLE_TIMEOUT_HEADLESS`, `SIDECAR_IDLE_TIMEOUT_INTERACTIVE`, `SIDECAR_IDLE_TIMEOUT_SERVER` (these take precedence). `SIDECAR_IDLE_TIMEOUT` is the fallback blanket setting. Set `LOG_LEVEL=debug` to increase diagnostics, and check idle-timeout/stuck-stream watchdog logs |

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

| Sidecar process not self-terminating | Idle timeout misconfigured or disabled | Check per-mode overrides first: `SIDECAR_IDLE_TIMEOUT_HEADLESS`, `SIDECAR_IDLE_TIMEOUT_INTERACTIVE`, `SIDECAR_IDLE_TIMEOUT_SERVER`. The blanket `SIDECAR_IDLE_TIMEOUT` overrides all modes. Set `LOG_LEVEL=debug` to trace watchdog transitions |

| Sidecar process not self-terminating | Idle timeout misconfigured or disabled | Check per-mode overrides first: `SIDECAR_IDLE_TIMEOUT_HEADLESS`, `SIDECAR_IDLE_TIMEOUT_INTERACTIVE`, `SIDECAR_IDLE_TIMEOUT_SERVER` (these take precedence). `SIDECAR_IDLE_TIMEOUT` is the fallback blanket setting. Set `LOG_LEVEL=debug` to increase diagnostics, and check idle-timeout/stuck-stream watchdog logs |

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/troubleshooting.md` at line 5, Update the sentence to correctly state that per-mode environment variables (SIDECAR_IDLE_TIMEOUT_HEADLESS, SIDECAR_IDLE_TIMEOUT_INTERACTIVE, SIDECAR_IDLE_TIMEOUT_SERVER) take precedence over the blanket SIDECAR_IDLE_TIMEOUT, and remove or reword the assertion that setting LOG_LEVEL=debug specifically emits watchdog state-transition traces—instead say that enabling debug increases log verbosity which may help when investigating watchdog behavior. Keep the checklist format and retain the guidance to check per-mode overrides first, mention the blanket SIDECAR_IDLE_TIMEOUT as a fallback, and adjust the LOG_LEVEL claim to a general note about increased verbosity rather than guaranteed state-transition traces.

jrenaldi79 and others added 28 commits March 14, 2026 10:25

feat: crash handler deletes session lock file on uncaught exception

706ab80

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: update start.test.js to expect pid in metadata

d83b974

Task 5 added pid: process.pid to createSessionMetadata(). The existing test expected pid to be undefined; now it verifies the correct value. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: integrate IdleWatchdog into interactive mode (60-min timeout)

e84a093

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: preserve existing pid from MCP handler in createSessionMetadata

7fbd629

When MCP handler pre-writes pid to metadata, createSessionMetadata should preserve it instead of overwriting with process.pid. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: integrate IdleWatchdog with opencode-client sendPrompt

80afc25

Wraps sendPrompt() with watchdog.markBusy()/markIdle() in try/finally. Watchdog is optional - existing callers unaffected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: integrate IdleWatchdog into headless mode (15-min timeout)

ca9ae1c

Creates watchdog on server start, touch() on poll loop, cancel on cleanup. Hoisted watchdog declaration to function scope for catch block access. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add atomic session lock file mechanism

edae761

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: update CLAUDE.md with process lifecycle modules and env vars

e2f94e5

docs: add process lifecycle documentation to user-facing docs

b111b0a

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test: add shared server e2e integration test with memory monitoring

25fc8b3

Spawns real MCP server, fires 3 concurrent Gemini sessions, monitors RSS, verifies single shared server process, checks cleanup on exit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: add MCP polling integration design spec

44da49f

Reuse runHeadless() from shared server MCP path by adding guard points for external client/server/watchdog. Fire-and-forget async pattern. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: address reviewer feedback on MCP polling spec

4699bb6

Fix 5 issues: waitForServer skip, watchdog process.exit guard, third server.close location, session ownership, metadata ownership. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: add MCP polling integration implementation plan

22e2da3

4 tasks: guard points in runHeadless, MCP handler rewrite, test suite, e2e verification. All 5 server.close() locations guarded. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: MCP shared server path delegates to runHeadless with context an…

6f80753

…d finalization Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add eval monitoring script for process/memory tracking

26098b3

Wraps eval runner with PID/RSS snapshots before, during (every 10s), and after eval execution. Detects orphaned processes and reports delta. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: increase eval runner timeout from 5 min to 10 min

efc3159

Sidecar evals need ~7 min for full round-trip: Claude initialization, sidecar_start, LLM processing, polling, read, and applying fixes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

test: add MCP error response format and descriptive message tests

b4ea478

Verify error objects are JSON-serializable, include available aliases, and contain the invalid input in the message for LLM debugging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot reviewed Mar 14, 2026

View reviewed changes

jrenaldi79 and others added 3 commits March 14, 2026 14:14

coderabbitai bot reviewed Mar 14, 2026

View reviewed changes

jrenaldi79 merged commit 19790f3 into main Mar 14, 2026
3 checks passed

jrenaldi79 deleted the feature/memory-leak branch March 14, 2026 20:41

		echo "SUMMARY: opencode=$oc_count sidecar=$sc_count total_rss_mb=${total_rss}" >> "$MONITOR_LOG"
		echo "[$label] opencode=$oc_count sidecar=$sc_count total_rss=${total_rss}MB"

	\| Sidecar process not self-terminating \| Idle timeout misconfigured or disabled \| Check per-mode overrides first: `SIDECAR_IDLE_TIMEOUT_HEADLESS`, `SIDECAR_IDLE_TIMEOUT_INTERACTIVE`, `SIDECAR_IDLE_TIMEOUT_SERVER`. The blanket `SIDECAR_IDLE_TIMEOUT` overrides all modes. Set `LOG_LEVEL=debug` to trace watchdog transitions \|
	\| Sidecar process not self-terminating \| Idle timeout misconfigured or disabled \| Check per-mode overrides first: `SIDECAR_IDLE_TIMEOUT_HEADLESS`, `SIDECAR_IDLE_TIMEOUT_INTERACTIVE`, `SIDECAR_IDLE_TIMEOUT_SERVER` (these take precedence). `SIDECAR_IDLE_TIMEOUT` is the fallback blanket setting. Set `LOG_LEVEL=debug` to increase diagnostics, and check idle-timeout/stuck-stream watchdog logs \|

Conversation

jrenaldi79 commented Mar 14, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Mar 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Mar 14, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Mar 14, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Mar 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jrenaldi79 commented Mar 14, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 14, 2026 •

edited

Loading