fix(pm): resolve PM interview timeouts and crashes across all backends#231
fix(pm): resolve PM interview timeouts and crashes across all backends#231
Conversation
- Guard litellm ImportError in pm.py and factory.py with helpful install message - Convert unbounded ask_next_question() recursion to bounded loop (max 8 auto-advances) - Set max_turns=5 and timeout=90s for PM interview LLM adapter (was max_turns=1) - Add application-level timeout to ClaudeCodeAdapter - Fix Codex adapter: allowed_tools=[] now emits "no tools" constraint (was silent) - Reject empty initial_context with specific error message - Restore PM meta before load_state to prevent silent deferred tracking loss - Remove dead meta.ask_user_question branch from SKILL.md - Add cross-platform clipboard fallback (pbcopy/wl-copy/xclip) E2E verified: claude_code backend ~67s, codex backend ~68s (both under 90s timeout) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit
c34681d| Triggered by: automatic review
Branch: fix/pm-interview-timeouts | 12 files, +405/-112 | CI: MyPy Type Check pass 42s https://github.com/Q00/ouroboros/actions/runs/23636206279/job/68845903845 | Ruff Lint pass 15s https://github.com/Q00/ouroboros/actions/runs/23636206279/job/68845903857 | Test Python 3.12 pass 1m2s https://github.com/Q00/ouroboros/actions/runs/23636206271/job/68845903830 | Test Python 3.13 pass 1m5s https://github.com/Q00/ouroboros/actions/runs/23636206271/job/68845903814 | Test Python 3.14 pass 1m16s https://github.com/Q00/ouroboros/actions/runs/23636206271/job/6884
Issue Requirements
No linked issue detected in PR body.
Previous Review Follow-up
First review — no previous findings.
Blocking Findings
| # | File:Line | Severity | Confidence | Finding |
|---|---|---|---|---|
| 1 | src/ouroboros/mcp/tools/pm_handler.py:323 | High | High | The new PM-interview adapter settings are still bypassed in the normal MCP server path. _get_engine() prefers self.llm_adapter over the freshly configured adapter, and the server wires PMInterviewHandler with a shared llm_adapter, so the advertised max_turns=5, timeout=90, allowed_tools=[], and per-request cwd never apply in production. That means the timeout/empty-response fixes this PR is trying to land are not actually used for the default handler construction. |
| 2 | src/ouroboros/providers/codex_cli_adapter.py:84 | High | High | The new “no tools” prompt is applied far more broadly than intended because allowed_tools=None is collapsed to [] at construction time. After that, _build_prompt() cannot distinguish “caller omitted tool constraints” from “caller explicitly forbade all tools”, so default Codex adapters now emit “Do NOT use any tools or MCP calls” even for existing call sites that intentionally pass allowed_tools=None. This is a regression for non-PM interview flows and any default Codex-backed completion path. |
| 3 | src/ouroboros/providers/factory.py:94 | Medium | High | The factory now accepts cwd from callers like PMInterviewHandler, but the claude_code branch still drops it entirely. Even if finding #1 is fixed, Claude-backed PM interviews will continue to run in the process working directory instead of the request’s cwd, so repo-relative behavior remains inconsistent across backends. |
| 4 | src/ouroboros/mcp/tools/pm_handler.py:780 | Medium | Medium | _handle_answer() now restores PM metadata before verifying that the interview state can be loaded. On a bad or stale session_id, that mutates the long-lived engine with deferred items / pending reframe state and then returns an error, leaving contaminated in-memory state for the next request if there is no replacement pm_meta. Previously the engine stayed untouched on load failure. |
| 5 | skills/pm/SKILL.md:90 | Low | High | Step 5 now correctly allows “no clipboard tool found; skipping clipboard copy”, but Step 6 still instructs the agent to tell the user “(Clipboard에 복사되었습니다)” unconditionally. That leaves the skill internally contradictory and will cause false success messaging on Linux/CI environments without a clipboard utility. |
Test Coverage
Missing coverage around the production PM handler wiring: tests/unit/mcp/tools/test_pm_handler.py only exercises _get_engine() when llm_adapter is not injected, so it never catches that src/ouroboros/mcp/tools/pm_handler.py:323 ignores the new adapter kwargs under the actual server composition path.
Missing Codex regression coverage for the default allowed_tools=None case: tests/unit/providers/test_codex_cli_adapter.py only asserts the explicit-tools prompt path and does not verify the prompt built by a default adapter, which is why the new blanket “Do NOT use any tools” behavior slipped through.
Missing Claude cwd propagation coverage: tests/unit/providers/test_factory.py and tests/unit/providers/test_claude_code_adapter.py do not assert that a cwd passed into create_llm_adapter(..., backend="claude_code") reaches the SDK options, so the backend inconsistency remains untested.
Design Notes
The direction is good: the PR is trying to make PM interviews backend-aware and bounded instead of relying on unbounded recursion and transport timeouts. The main architectural problem is that adapter policy now lives in three places at once (PMInterviewHandler, create_llm_adapter(), and each concrete adapter), so the effective runtime behavior depends on which composition path instantiated the handler. That drift is what caused the production-path bypass and the backend inconsistency here.
Follow-up Recommendations
- Re-run a maintainer spot check on the exact HEAD commit if additional pushes land after this review.
- Confirm any non-blocking cleanup separately so blocker scope stays tied to merge risk.
Files Reviewed
- skills/pm/SKILL.md
- src/ouroboros/bigbang/pm_interview.py
- src/ouroboros/cli/commands/pm.py
- src/ouroboros/mcp/tools/pm_handler.py
- src/ouroboros/providers/claude_code_adapter.py
- src/ouroboros/providers/codex_cli_adapter.py
- src/ouroboros/providers/factory.py
- tests/unit/bigbang/test_decide_later_items.py
- tests/unit/cli/test_pm_overwrite.py
- tests/unit/mcp/tools/test_pm_handler.py
- tests/unit/providers/test_claude_code_adapter.py
- tests/unit/providers/test_factory.py
Reviewed by ouroboros-agent[bot] via Codex deep analysis
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Reviewing commit
't5c15b98' | Triggered by: new push
Branch:
'tfix/pm-interview-timeouts' | 13 files, +543/-115 | CI: no checks reported
Issue Requirements
No linked issue detected in PR body.
Previous Review Follow-up
| Previous Finding | Status |
|---|---|
| PM handler ignored the PM-specific adapter settings when the server injected a shared adapter. | RESOLVED — now rebuilds a dedicated adapter whenever is configured, and the new regression test covers that server-composed path (, ). |
| Codex default adapters incorrectly emitted a blanket "Do NOT use any tools" constraint when was omitted. | RESOLVED — the adapter now preserves vs and only emits the text-only constraint for an explicit empty list, with coverage for both cases (, ). |
| Claude-backed adapters dropped caller-provided . | RESOLVED — the factory now forwards , the adapter stores it, and SDK options use that configured directory (, , ). |
| Resume-on-load-failure restored PM metadata too early and could contaminate a long-lived engine. | RESOLVED — now loads interview state before restoring PM metadata, and the regression test asserts is not called on load failure (, ). |
| The PM skill promised clipboard copy even when no clipboard tool exists. | RESOLVED — Step 6 now makes the clipboard message conditional on Step 5 succeeding (). |
Blocking Findings
No blocking findings verified on the current PR HEAD.
Test Coverage
Coverage is materially better on this revision:
- The PM handler tests now exercise the dedicated-adapter rebuild path and cwd propagation ().
- The adapter regressions that drove the prior review are covered for both Claude and Codex defaults (, ).
- The bounded auto-advance guard and persistence-error propagation are covered in the PM interview tests ().
Design Notes
This revision closes the main correctness gaps from the prior review without broadening the surface area much. The strongest part of the fix is that the runtime behavior and the tests now line up: PM interview requests get a dedicated backend adapter with explicit timeout/tool policy, the unbounded auto-advance recursion is replaced with a bounded loop that preserves state updates, and the resume path no longer mutates long-lived engine state before load succeeds.
Follow-up Recommendations
- GitHub currently reports the PR as , so the author should rebase or resolve branch conflicts before merge even though the reviewed diff itself looks ready.
- A targeted end-to-end check on the exact merge commit would still be worthwhile once the branch is conflict-free, since there are no GitHub checks reported on the current head.
Files Reviewed
- skills/pm/SKILL.md
- src/ouroboros/bigbang/pm_interview.py
- src/ouroboros/cli/commands/pm.py
- src/ouroboros/mcp/tools/pm_handler.py
- src/ouroboros/providers/claude_code_adapter.py
- src/ouroboros/providers/codex_cli_adapter.py
- src/ouroboros/providers/factory.py
- tests/unit/bigbang/test_decide_later_items.py
- tests/unit/cli/test_pm_overwrite.py
- tests/unit/mcp/tools/test_pm_handler.py
- tests/unit/providers/test_claude_code_adapter.py
- tests/unit/providers/test_codex_cli_adapter.py
- tests/unit/providers/test_factory.py
Reviewed by ouroboros-agent[bot] via manual deep analysis fallback after Codex timeout
Summary
pm.pyandfactory.py— graceful error message instead of crashask_next_question()— max 8 auto-advances instead of unbounded recursion (C1 fix)max_turns=1causing empty responses (C2 fix)ClaudeCodeAdapter(C3 fix)allowed_tools=[]fix — empty list now emits "Do NOT use any tools" constraint, preventing Codex from making recursive MCP tool callsinitial_contextrejection with specific error message (M4 fix)load_stateto prevent silent deferred tracking loss (M3 fix)meta.ask_user_questionbranch (M1), added cross-platform clipboard fallback (M2)E2E Verification
Checks passed
Test plan
claude_codebackend → interview start + first questioncodexbackend → interview start + first question🤖 Generated with Claude Code