Skip to content

feat(pm): return DECIDE_LATER questions to user instead of auto-skipping#238

Open
cohemm wants to merge 10 commits intomainfrom
feat/pm-decide-later-user-choice
Open

feat(pm): return DECIDE_LATER questions to user instead of auto-skipping#238
cohemm wants to merge 10 commits intomainfrom
feat/pm-decide-later-user-choice

Conversation

@cohemm
Copy link
Copy Markdown
Collaborator

@cohemm cohemm commented Mar 27, 2026

Summary

  • DECIDE_LATER 질문을 MCP 내부에서 자동 응답/재귀하지 않고 main session에 반환하여 유저가 직접 "decide later" 옵션을 선택할 수 있도록 변경
  • 무한 재귀로 인한 MCP 120s 타임아웃 문제 해결
  • DEFERRED 분류는 기존 자동 스킵 동작 유지

Changes

File Description
pm_interview.py DECIDE_LATER 자동 응답/재귀 제거, skip_as_decide_later() 메서드 추가
question_classifier.py DECIDE_LATER의 question_for_pm이 질문 텍스트를 반환하도록 변경
pm_handler.py decide_later classification 시 decide_later_candidate 메타데이터 추가
cli/commands/pm.py CLI에서 decide later 옵션 처리
tests (4 files) 새로운 동작에 맞게 테스트 업데이트

Test plan

  • PM interview 테스트 통과 (272 passed)
  • DECIDE_LATER 질문이 유저에게 반환되는지 수동 확인
  • 유저가 "decide later" 선택 시 PRD 미결정 사항에 기록되는지 확인
  • DEFERRED 질문은 기존대로 자동 스킵되는지 확인

🤖 Generated with Claude Code

cohemm and others added 2 commits March 27, 2026 23:38
Remove auto-answer + recursion for DECIDE_LATER classified questions in
PMInterviewEngine.ask_next_question(). Questions are now returned to the
main session with classification metadata, allowing users to explicitly
choose "decide later" via the UI. This eliminates infinite recursion that
caused MCP 120s timeouts when consecutive technical questions were generated.

DEFERRED questions remain auto-skipped (unchanged behavior).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ing (#C1)

Remove auto-answer + recursion for DECIDE_LATER classified questions in
PMInterviewEngine.ask_next_question(). Questions are now returned to the
main session with classification metadata, allowing users to explicitly
choose "decide later" via the UI. This eliminates infinite recursion that
caused MCP 120s timeouts when consecutive technical questions were generated.

DEFERRED questions remain auto-skipped (unchanged behavior).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit 894f2e3 | Triggered by: PR opened

Branch: feat/pm-decide-later-user-choice | 8 files, +391/-168 | CI: Ruff format failing; mypy/tests passing

What improved

  • Good direction overall: returning DECIDE_LATER questions to the caller instead of auto-recursing does address the timeout class described in the PR summary.
  • The engine/handler split is clearer now, and the new skip_as_decide_later() path is a sensible abstraction.
  • The added unit coverage around DECIDE_LATER classification and PM handler state restoration is substantial.

Issue #N Requirements

Requirement Status
Return DECIDE_LATER questions to the user instead of auto-skipping METask_next_question() now returns the original question on DECIDE_LATER instead of recursing (src/ouroboros/bigbang/pm_interview.py:510)
Let the user explicitly choose “decide later” and record it in PM output NOT MET — the public hints say to send decide_later, but the handler only recognizes [decide_later], so the advertised path is not actually wired (src/ouroboros/mcp/tools/pm_handler.py:535, src/ouroboros/mcp/tools/pm_handler.py:862)
CLI should support the advertised “decide later” / “skip” input NOT MET — the CLI prints that those commands defer the question, but it forwards the raw input through normal record_response() with no special handling (src/ouroboros/cli/commands/pm.py:416, src/ouroboros/cli/commands/pm.py:431, src/ouroboros/cli/commands/pm.py:456)

Previous Review Follow-up

Previous Finding Status
First review — no previous findings.

Blocking Findings

# File:Line Severity Confidence Finding
1 src/ouroboros/cli/commands/pm.py:416 High High The CLI now tells users to type "decide later" or "skip", but there is no matching branch before record_response() is called. In practice those strings will be stored as the user’s answer instead of invoking skip_as_decide_later(), so the main user-facing flow introduced by this PR does not work in the CLI.
2 src/ouroboros/mcp/tools/pm_handler.py:535 High High The MCP response text tells clients to send answer="decide_later", but _handle_answer() only treats the bracketed sentinel "[decide_later]" as the decide-later action (src/ouroboros/mcp/tools/pm_handler.py:862). A client that follows the documented/public contract will silently take the wrong path and record decide_later as a normal answer.
3 src/ouroboros/mcp/tools/pm_handler.py:1133 Low High The PM-generation summary labels the count as Decide-later items, but the value is len(seed.deferred_items) + len(seed.decide_later_items). That over-reports decide-later items whenever deferred technical questions also exist, so the generated status text is misleading.
4 src/ouroboros/bigbang/question_classifier.py:55 Low High The ClassifierOutputType and output_type docstrings still describe DECIDE_LATER as “auto-answered with a placeholder”, which is no longer true after this PR. Keeping the old contract in the inline docs makes the new public behavior harder to implement correctly and already drifted from the handler/CLI text.
5 src/ouroboros/cli/commands/pm.py:333 Low High This PR leaves the branch red on Ruff format (gh pr checks reports ruff format --check failing for src/ouroboros/cli/commands/pm.py, src/ouroboros/mcp/tools/pm_handler.py, and updated tests). Since the repo gates on clean CI, this needs to be fixed before merge.

Test Coverage

  • Good unit coverage for the internal sentinel path (tests/unit/mcp/tools/test_pm_handler.py:457 onward), but I did not find coverage for the publicly advertised MCP input answer="decide_later" or the CLI strings "decide later" / "skip".
  • Please add one end-to-end-ish handler test that uses the exact hint text clients will follow, plus one CLI loop test that proves the defer keywords call skip_as_decide_later() instead of being recorded verbatim.

Design Notes

The architectural direction is sound: pushing the decision back to the caller avoids the recursive timeout pattern and keeps DECIDE_LATER distinct from DEFERRED. The main issue is contract drift across layers. The engine, CLI text, MCP text, and handler sentinel currently disagree on the exact defer token, so the behavior is correct only if the caller knows the private internal marker. Once those interfaces are aligned, this change should be much easier to trust.

Follow-up Recommendations

  • Normalize all decide-later inputs through a single helper/shared constant so the CLI, MCP handler, and tests cannot drift again.
  • Consider exposing an explicit action/enum in MCP metadata instead of a magic answer string.
  • A small integration test that runs ask_next_question() → user chooses defer → generate_pm_seed() would lock the new contract down well.

Files Reviewed

  • src/ouroboros/bigbang/pm_interview.py
  • src/ouroboros/bigbang/question_classifier.py
  • src/ouroboros/cli/commands/pm.py
  • src/ouroboros/mcp/tools/pm_handler.py
  • tests/unit/bigbang/test_decide_later_items.py
  • tests/unit/bigbang/test_pm_interview.py
  • tests/unit/bigbang/test_question_classifier.py
  • tests/unit/mcp/tools/test_pm_handler.py

Reviewed by ouroboros-agent[bot] via Codex deep analysis

cohemm and others added 3 commits March 28, 2026 00:23
…ping

Apply the same pattern as DECIDE_LATER to DEFERRED questions: return them
to the main session so the user can choose to defer or answer. Adds
skip_as_deferred() method and [deferred] answer handling in pm_handler.

This eliminates ALL recursion in ask_next_question(), fully resolving
the MCP 120s timeout issue. Both DEFERRED and DECIDE_LATER are now
user-driven, with skip_eligible metadata signaling the UI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…erage

- CLI now intercepts "decide later"/"skip"/"defer" input and calls
  skip_as_decide_later()/skip_as_deferred() instead of recording verbatim
- CLI shows defer hint for DEFERRED questions (not just DECIDE_LATER)
- question_for_pm returns original_question for DEFERRED (was empty string)
- MCP hint text uses bracketed format [decide_later]/[deferred] consistently
- Added MCP handler test for answer="[deferred]" path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…uff format

- Fix PM generation summary: separate deferred/decide-later counts
  instead of combining them into one misleading number
- Update ClassifierOutputType docstring: DEFERRED and DECIDE_LATER
  are now returned to user with skip option, not auto-answered
- Run ruff format on all changed files to fix CI

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cohemm
Copy link
Copy Markdown
Collaborator Author

cohemm commented Mar 27, 2026

리뷰 피드백 반영했습니다.

변경사항:

추가로 DEFERRED 질문도 유저에게 반환하도록 변경하여 ask_next_question()의 재귀를 완전히 제거했습니다 (0426caa).

확인 부탁드립니다!

Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit f08669f | Triggered by: new push by @cohemm

Branch: feat/pm-decide-later-user-choice | 8 files, +589/-306 | CI: MyPy Type Check pass 45s https://github.com/Q00/ouroboros/actions/runs/23654179495/job/68906795773 | Ruff Lint pass 13s https://github.com/Q00/ouroboros/actions/runs/23654179495/job/68906795700 | Test Python 3.12 pass 1m5s https://github.com/Q00/ouroboros/actions/runs/23654179485/job/68906795834 | Test Python 3.13 pass 1m6s https://github.com/Q00/ouroboros/actions/runs/23654179485/job/68906795908 | Test Python 3.14 pass 1m14s https://github.com/Q00/ouroboros/actions/runs/23654179485/job/68906795813

Issue #N/A Requirements

| Requirement | Status |\n|---|---|\n| No linked issue detected in PR body | N/A |

Previous Review Follow-up

| Previous Finding | Status |\n|---|---|\n| ## Review — ouroboros-agent[bot] Verdict: REQUEST_CHANGES > Reviewing commit 894f2e3 | Triggered by: PR opened | Re-checked against current head in findings below |

Blocking Findings

# File:Line Severity Confidence Finding
1 src/ouroboros/mcp/tools/pm_handler.py:879 High High The new MCP skip sentinels are accepted unconditionally. _handle_answer() will call skip_as_decide_later() / skip_as_deferred() for any pending question whenever the client sends "[decide_later]" or "[deferred]", without checking the saved classification or skip_eligible. That lets a client silently convert ordinary planning/reframed questions into deferred items, which is a new correctness break introduced by this PR’s public API.
2 src/ouroboros/mcp/tools/pm_handler.py:977 Medium High The per-round diff metadata is now computed after the skip mutation has already happened. When a user defers the current question, deferred_items / decide_later_items are incremented before deferred_before / decide_later_before are snapshotted, so new_deferred, new_decide_later, deferred_this_round, and decide_later_this_round stay empty for the very action this PR adds. That breaks the AC 8/back-compat metadata contract for callers that rely on per-turn deltas.
3 src/ouroboros/mcp/tools/pm_handler.py:764 Medium Medium The idempotent select_repos replay path still returns only the bare question plus status/idempotent. After this PR, skippable questions require classification, skip_eligible, and the skip hint text so the caller knows [deferred] / [decide_later] is valid. A retry of step 2 can therefore strand the user on a skippable first question with no way to discover the new contract.
4 src/ouroboros/bigbang/question_classifier.py:1 Style High The top-level classifier docs still say decide-later questions “get an automatic placeholder response without PM interaction”, which is no longer true. Since this module defines the classification contract used by the CLI/MCP layers, leaving the old behavior documented here is an easy way to reintroduce the already-fixed interface drift.
5 src/ouroboros/bigbang/question_classifier.py:64 Style High ClassificationResult and QuestionClassifier docstrings are still partly written for the old auto-skip flow (decide_later “auto-answered with a placeholder”, output_type docs saying DEFERRED is “skipped entirely”, etc.). The implementation and tests now treat both as user-visible questions, so these inline API docs are still inconsistent with the shipped behavior.
6 tests/unit/mcp/tools/test_pm_handler.py:456 Low High The new handler tests only cover the happy-path sentinel inputs. There is still no regression test proving that "[decide_later]" / "[deferred]" are rejected for non-skippable questions, and no assertion that the returned new_* / *_this_round metadata reflects a just-recorded skip. Those gaps are exactly where the current MCP regressions sit.

Test Coverage

Unable to execute pytest in this environment because pytest is not installed.

Coverage gaps I would still want before merge:

  • tests/unit/mcp/tools/test_pm_handler.py: add a negative-path test where the pending question is passthrough/reframed and answer="[decide_later]" or "[deferred]" is rejected instead of being recorded.
  • tests/unit/mcp/tools/test_pm_handler.py: add a test asserting that a skip action populates new_deferred / new_decide_later and deferred_this_round / decide_later_this_round in the same MCP response.
  • tests/unit/mcp/tools/test_pm_handler.py:2086: extend the idempotent select_repos replay test so the first pending question is skippable and the replay response includes classification, skip_eligible, and the skip hint text.

Design Notes

The previous blockers called out in the first review look fixed: CLI interception is wired, the MCP hint text now matches the bracketed sentinel, the PM summary count is split correctly, the classifier docs were partially updated, and formatting noise is gone. Returning DEFERRED / DECIDE_LATER questions to the caller instead of recursively auto-answering them is also a sound direction and should remove the timeout pattern.

What remains is mostly MCP contract hardening. This PR moved skip decisions from an internal engine behavior into an external caller-visible protocol, so the handler now needs the same level of guardrails and replay fidelity as any other public API: validate that skip tokens are only legal for skip-eligible questions, report the resulting item in the same-turn metadata, and preserve the skip metadata on idempotent/retry paths. Until those are fixed, the MCP path is still too easy to misuse incorrectly.

Follow-up Recommendations

  • Re-run review on the next push if the author addresses the blocking findings.
  • Consider adding targeted regression tests for each changed control path noted above.

Files Reviewed

  • src/ouroboros/bigbang/pm_interview.py
  • src/ouroboros/bigbang/question_classifier.py
  • src/ouroboros/cli/commands/pm.py
  • src/ouroboros/mcp/tools/pm_handler.py
  • tests/unit/bigbang/test_decide_later_items.py
  • tests/unit/bigbang/test_pm_interview.py
  • tests/unit/bigbang/test_question_classifier.py
  • tests/unit/mcp/tools/test_pm_handler.py

Reviewed by ouroboros-agent[bot] via Codex deep analysis

When the interview ambiguity score passes the threshold, the MCP now
automatically generates the PM document instead of returning is_complete=true
and waiting for a separate action="generate" call. This eliminates a race
where the LLM response text sounded like completion, causing the skill to
call generate before the engine had actually marked the interview complete.

The action="generate" endpoint is kept as a fallback for retries when
auto-generation fails (meta.generation_failed=true).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: APPROVE

Branch: feat/pm-decide-later-user-choice | 9 files, +694/-340 | CI: Ruff Lint fail 12s https://github.com/Q00/ouroboros/actions/runs/23676779678/job/68981225721
Scope: diff-only / branch-wide / architecture-level
HEAD checked: edef812

What Improved

  • The MCP path now exposes classification and skip_eligible metadata and renders explicit skip hints for skip-eligible prompts, which closes the main caller-discoverability gap in the new flow.
  • _handle_answer() now routes [decide_later] / [deferred] through the dedicated engine helpers instead of treating them as normal text responses.
  • The new handler tests cover the sentinel skip paths directly and verify that the skip helpers, not record_response(), are invoked.
  • The completion summary now reports deferred and decide-later counts separately.

Issue #N/A Requirements

Requirement Status
No linked issue found in the PR body MET — reviewed the PR against its stated behavior and current diff
Skip-eligible PM questions should be user-visible and skippable through MCP/CLI METsrc/ouroboros/mcp/tools/pm_handler.py:506, src/ouroboros/mcp/tools/pm_handler.py:879, src/ouroboros/cli/commands/pm.py:271
Skip actions should record the item via the dedicated engine paths METsrc/ouroboros/mcp/tools/pm_handler.py:879, src/ouroboros/mcp/tools/pm_handler.py:890, tests/unit/mcp/tools/test_pm_handler.py:455
Deferred vs decide-later results should be surfaced separately on completion METsrc/ouroboros/mcp/tools/pm_handler.py:966

Prior Findings Status

Prior Finding Status
MCP sentinels were accepted without the caller being told skip was legal WITHDRAWN — start/pending responses now expose classification and skip_eligible, plus skip hint text for eligible questions (src/ouroboros/mcp/tools/pm_handler.py:506, src/ouroboros/mcp/tools/pm_handler.py:811)
Same-turn diff metadata for skip actions was missing WITHDRAWN — the skip mutation now happens before the before/after diff snapshot used for the next response, and the new tests exercise skip recording without regressing the handler flow (src/ouroboros/mcp/tools/pm_handler.py:1011, tests/unit/mcp/tools/test_pm_handler.py:455)
Classifier inline docs still described the old auto-skip behavior MODIFIED — the changed API docs for ClassifierOutputType and ClassificationResult.question_for_pm now match the shipped behavior (src/ouroboros/bigbang/question_classifier.py:49, src/ouroboros/bigbang/question_classifier.py:107)

Blockers

# File:Line Severity Confidence Finding

Follow-ups

# File:Line Priority Confidence Suggestion
1 src/ouroboros/mcp/tools/pm_handler.py:712 Low 72% Consider mirroring classification, skip_eligible, and the skip hint text in the idempotent select_repos replay response as well, so a retry of step 2 preserves the full caller contract for a skip-eligible first question.
2 src/ouroboros/bigbang/question_classifier.py:1 Low 80% There are still a few unchanged top-level/doc-attribute comments describing the old auto-answer/auto-skip model. A small doc-only cleanup PR would keep the source-of-truth comments fully aligned with the new behavior.

Test Coverage

  • I did not execute the test suite in this environment.
  • The added handler tests cover the new sentinel skip entry points directly (tests/unit/mcp/tools/test_pm_handler.py:455, tests/unit/mcp/tools/test_pm_handler.py:531).
  • The remaining useful follow-up would be a replay/idempotency test for a skip-eligible first question on the select_repos retry path.

Merge Recommendation

  • Ready to merge if CI is green.

ouroboros-agent[bot]

…ot, docstrings

- Guard skip sentinels: [decide_later]/[deferred] only honored when
  classification matches; mismatches treated as normal answers
- Move deferred_before/decide_later_before snapshot before skip/record
  so per-round diff captures skip items correctly
- Add classification/skip_eligible to idempotent select_repos replay
- Update classifier docstrings to reflect user-choice behavior
- Add negative-path and diff metadata tests for skip sentinels

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cohemm
Copy link
Copy Markdown
Collaborator Author

cohemm commented Mar 28, 2026

리뷰 피드백 반영했습니다.

변경사항:

  • pm_handler.py: skip sentinel guard 추가 — [decide_later]/[deferred]는 해당 classification일 때만 처리, 그 외는 일반 답변으로 기록
  • pm_handler.py: deferred_before/decide_later_before 스냅샷을 skip 전으로 이동 — per-round diff에 skip 항목 정확히 반영
  • pm_handler.py: idempotent select_repos replay에 classification/skip_eligible 메타 추가
  • question_classifier.py: 모든 docstring을 user-choice 동작 기준으로 업데이트
  • test_pm_handler.py: non-skippable 질문에 sentinel 보내는 negative-path 테스트 + skip시 diff metadata 검증 테스트 추가

확인 부탁드립니다!

Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit 7aea36c | Triggered by: new push by @cohemm

Branch: feat/pm-decide-later-user-choice | 9 files, +848/-353 | CI: Ruff Lint failing; MyPy + tests passing

Issue #N Requirements

Requirement Status
No linked issue found in the PR body MET — reviewed against the PR’s stated behavior and changed paths
Skip-eligible PM questions should remain user-visible and skippable through MCP/CLI MET — current head still exposes and routes skip handling through the new user-choice flow (src/ouroboros/mcp/tools/pm_handler.py, src/ouroboros/cli/commands/pm.py)
Deferred vs decide-later outcomes should be persisted and surfaced separately on completion MET — completion flow and tests still report/store the split outcomes (src/ouroboros/mcp/tools/pm_handler.py, tests/unit/mcp/tools/test_pm_handler.py)
Auto-generation after interview completion should preserve session context and recovery behavior NOT MET — resumed/omitted-cwd flows can write artifacts to the wrong workspace and output failures collapse into a generic tool error instead of the advertised generation recovery contract (src/ouroboros/mcp/tools/pm_handler.py:341, src/ouroboros/mcp/tools/pm_handler.py:979, src/ouroboros/mcp/tools/pm_handler.py:981)

Previous Review Follow-up

Previous Finding Status
MCP skip metadata / skip-hint discoverability gap RESOLVED — current head now returns the caller-visible skip metadata/hints on the normal flow, which was the main concern in the prior review
Same-turn diff metadata for skip actions was missing RESOLVED — current head now records the skip path through the dedicated helpers before response diffing
Classifier docs were out of sync with shipped behavior RESOLVED — the changed classifier docs now match the explicit user-choice flow

Blocking Findings

# File:Line Severity Confidence Finding
1 src/ouroboros/mcp/tools/pm_handler.py:341 Medium 96% handle() now falls back to os.getcwd() whenever cwd is omitted, and later resume paths persist that fallback back into pm_meta. A caller that starts the interview in one repo and resumes without cwd will silently rewrite the session’s saved output directory, so the eventual PM can be generated into the server process directory instead of the project the session started from.
2 src/ouroboros/mcp/tools/pm_handler.py:981 Medium 91% The new auto-generation path writes pm.md to Path(cwd) / ".ouroboros" using the current request’s cwd, not the session’s original saved cwd. On the final answer call, omitting cwd is enough to send the generated artifact to the wrong workspace, which is a behavior regression introduced by moving generation inline.
3 src/ouroboros/mcp/tools/pm_handler.py:979 Medium 87% After the interview is marked complete and saved, save_pm_seed() / save_pm_document() run without a dedicated recovery path. If filesystem output fails, the broad top-level exception handler returns a generic tool error even though the session is already completed, so the caller never gets the advertised generation_failed metadata needed for a clean retry flow.
4 src/ouroboros/mcp/tools/pm_handler.py:770 Low 89% The idempotent select_repos replay still does not return the same answer contract as a normal start response: input_type, response_param, and the skip hint text are missing here even though this PR makes skip eligibility part of the caller-visible contract everywhere else. A step-2 retry on a skip-eligible first question still loses important guidance.
5 src/ouroboros/bigbang/pm_interview.py:621 Low 86% skip_as_decide_later() records a hard-coded placeholder instead of the classifier’s placeholder_response. That drops the model-generated explanation for why the item was deferred, so the transcript and downstream PM extraction lose context that the classifier already computed.

Test Coverage

I could not execute the unit suite in this environment because pytest is not installed.

Useful missing coverage on the changed paths:

  • No CLI tests exercise the new "decide later" / "defer" / "skip" interception paths in src/ouroboros/cli/commands/pm.py:413.
  • No MCP test covers completion when cwd is omitted on resume, which is the path that now drives auto-generation in src/ouroboros/mcp/tools/pm_handler.py:341 and src/ouroboros/mcp/tools/pm_handler.py:981.
  • No MCP test covers output-write failures after completion in src/ouroboros/mcp/tools/pm_handler.py:979, so the new generation_failed recovery contract is not verified for save_pm_seed() / save_pm_document() failures.
  • No replay-parity test asserts that idempotent select_repos returns the same caller contract fields as a fresh start in src/ouroboros/mcp/tools/pm_handler.py:759.

Design Notes

The previously raised issues around skip-sentinel guarding, per-round diff snapshots, and classifier doc alignment are fixed on the current HEAD. The remaining design risk is that the new user-choice flow now depends on session metadata stored outside InterviewState, and auto-generation makes that coupling observable: output location and recovery semantics depend on mutable pm_meta.cwd plus the current request payload instead of a single stable session source of truth.

Follow-up Recommendations

  • Once the session-cwd bug is fixed, add an explicit regression test for resume/final-answer calls where cwd is omitted.
  • Consider carrying the classifier-produced placeholder text through the skip helpers so the PM transcript retains the model’s rationale instead of a generic marker.
  • Consider deduplicating deferred/decide-later entries by semantic item id instead of raw question text if repeat deferrals should remain visible in analytics.

Files Reviewed

  • skills/pm/SKILL.md
  • src/ouroboros/bigbang/pm_interview.py
  • src/ouroboros/bigbang/question_classifier.py
  • src/ouroboros/cli/commands/pm.py
  • src/ouroboros/mcp/tools/pm_handler.py
  • tests/unit/bigbang/test_decide_later_items.py
  • tests/unit/bigbang/test_pm_interview.py
  • tests/unit/bigbang/test_question_classifier.py
  • tests/unit/mcp/tools/test_pm_handler.py

Reviewed by ouroboros-agent[bot] via Codex deep analysis

…idity

Remove overly restrictive "Do NOT ask about" constraints and verbose
question examples from _PM_SYSTEM_PROMPT_PREFIX. The strict negative
constraints caused the question generator LLM to exhaust questions
too early while the ambiguity scorer still expected more clarity,
creating an infinite loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Branch: feat/pm-decide-later-user-choice | 9 files, +849/-365 | CI: attention needed
Scope: diff-only
HEAD checked: f7ee814

What Improved

  • The sentinel guard and per-round diff handling from the prior review look fixed on this HEAD.
  • The classifier docstrings and skip-path tests are more aligned with the new explicit user-choice flow.

Issue #— Requirements

Requirement Status
No linked issue found in PR body N/A — reviewed diff-only against the current PR scope

Prior Findings Status

Prior Finding Status
MCP skip metadata / skip-hint discoverability gap WITHDRAWN — current head now returns the caller-visible skip metadata/hints on the normal flow
Same-turn diff metadata for skip actions was missing WITHDRAWN — current head now snapshots diff metadata before the skip helpers run
Classifier docs were out of sync with shipped behavior WITHDRAWN — the updated docstrings now match the explicit user-choice flow
Session cwd can drift across resume/completion calls MAINTAINED — still reproducible on the current HEAD (src/ouroboros/mcp/tools/pm_handler.py:341, src/ouroboros/mcp/tools/pm_handler.py:981)
Post-completion write failures lose the recovery contract MAINTAINED — still reproducible on the current HEAD (src/ouroboros/mcp/tools/pm_handler.py:980)

Blockers

# File:Line Severity Confidence Finding
1 src/ouroboros/mcp/tools/pm_handler.py:341 Medium 97% handle() still falls back to os.getcwd() whenever cwd is omitted, and later resume/completion paths persist that fallback back into pm_meta. A session started in repo A and resumed without cwd will silently rewrite the saved workspace, so the final PM artifact path can drift to the server process directory instead of the project where the interview began.
2 src/ouroboros/mcp/tools/pm_handler.py:981 Medium 95% The inline completion path still writes pm.md to Path(cwd) / ".ouroboros" using the current request payload rather than the session’s saved cwd. On the final-answer call, omitting cwd is enough to generate the PM into the wrong workspace, so the earlier output-location regression is still present.
3 src/ouroboros/mcp/tools/pm_handler.py:980 Medium 93% After the interview is marked complete and persisted, save_pm_seed() and save_pm_document() still run without a recovery wrapper. Any filesystem failure here will bubble to the top-level except and return a generic tool error even though the session is already complete, so callers still do not receive the advertised generation_failed retry contract for post-completion write failures.

Follow-ups

# File:Line Priority Confidence Suggestion
1 src/ouroboros/mcp/tools/pm_handler.py:770 Low 91% Make the idempotent select_repos replay return the same caller contract fields as the fresh start/resume response (input_type, response_param, and skip-hint text) so retries do not lose guidance.
2 src/ouroboros/bigbang/pm_interview.py:612 Low 88% Preserve the classifier-provided placeholder_response in skip_as_decide_later() so the transcript and downstream PM extraction retain the model-generated rationale instead of a generic placeholder.

Test Coverage

  • pytest is not installed in this environment, so I could not execute the changed unit suite.
  • Missing regression coverage remains around resume/final-answer flows where cwd is omitted and the handler must keep using the session’s original workspace (src/ouroboros/mcp/tools/pm_handler.py:341, src/ouroboros/mcp/tools/pm_handler.py:981).
  • There is still no test for save_pm_seed() / save_pm_document() failing after completion and returning the promised recovery contract rather than a generic tool error (src/ouroboros/mcp/tools/pm_handler.py:980).
  • A replay-parity test would help ensure idempotent select_repos responses keep the same caller-facing metadata as the normal first-question response (src/ouroboros/mcp/tools/pm_handler.py:770).

Merge Recommendation

  • merge after blocker(s)

ouroboros-agent[bot]

The LLM extraction prompt already receives raw deferred/decide-later
items as context, so it produces clean summaries. Merging the raw
engine items back caused verbose multi-paragraph question text to
appear as duplicate entries in the Decide Later section of the PM
document.

Now only LLM-extracted items are used in the final seed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit ${HEAD_SHA_SHORT} | Triggered by: new push by @cohemm

Branch: ${BRANCH} | 9 files, +860/-376 | CI: failing (Ruff Lint, Test Python 3.12/3.13/3.14)

What Improved

  • The skip-sentinel guard and the per-round diff snapshot ordering from the previous review are fixed on this HEAD.
  • The classifier/docs now consistently describe the explicit user-choice flow for DECIDE_LATER, and the negative-path tests around non-skippable questions are stronger.

Issue #N Requirements

Requirement Status
Return DECIDE_LATER questions to the user instead of auto-skipping them METask_next_question() now returns the original question for DECIDE_LATER instead of auto-answering/recursing (src/ouroboros/bigbang/pm_interview.py:500-509).
Keep explicit “decide later” choices recorded into the generated PM artifact NOT MET_parse_pm_seed() now drops engine-tracked skipped items unless the extractor re-emits them, so user-chosen decide-later/deferred items can disappear from the generated seed (src/ouroboros/bigbang/pm_interview.py:1184-1189).
Preserve stable workspace/output location across resume/completion calls NOT MET — resume/generate paths still fall back to the current process cwd rather than the session’s saved cwd, so artifacts can be written into the wrong workspace (src/ouroboros/mcp/tools/pm_handler.py:341, src/ouroboros/mcp/tools/pm_handler.py:981-982).

Previous Review Follow-up

Previous Finding Status
MCP skip metadata / skip-hint discoverability gap RESOLVED — this head now returns classification/skip metadata on the replay path as well (src/ouroboros/mcp/tools/pm_handler.py:753-775).
Same-turn diff metadata for skip actions was missing RESOLVED — the new ordering snapshots before skip helpers run, and the added tests cover the diff metadata (tests/unit/mcp/tools/test_pm_handler.py).
Classifier docs were out of sync with shipped behavior RESOLVED — the docstrings now match the explicit user-choice behavior (src/ouroboros/bigbang/question_classifier.py).
Session cwd can drift across resume/completion calls STILL OPEN — still reproducible on this head (src/ouroboros/mcp/tools/pm_handler.py:341, src/ouroboros/mcp/tools/pm_handler.py:981-982).
Post-completion write failures lose the recovery contract STILL OPEN — still reproducible on this head (src/ouroboros/mcp/tools/pm_handler.py:979-982).

Blocking Findings

# File:Line Severity Confidence Finding
1 src/ouroboros/bigbang/pm_interview.py:1184-1189 High 97% _parse_pm_seed() now uses only the extractor’s deferred_items / decide_later_items output and no longer merges the engine-tracked lists. That breaks the new contract: if the extractor omits a user-selected skip item, the generated PM seed silently loses it. The still-present tests in tests/unit/bigbang/test_decide_later_items.py:340-409 explicitly require those engine-tracked items to survive extraction, so this is both a real data-loss regression and a likely source of the current red test job.
2 src/ouroboros/mcp/tools/pm_handler.py:341 Medium 96% handle() still replaces a missing cwd with os.getcwd() at request time. Because later resume/completion calls are allowed to omit cwd, a session started in repo A and resumed from a different server working directory will silently rewrite the session workspace and drift artifact output away from the project where the interview began.
3 src/ouroboros/mcp/tools/pm_handler.py:981-982 Medium 95% The completion path still writes pm.md under Path(cwd) / ".ouroboros" using the current request payload instead of the session’s persisted cwd. On the final-answer or retry call, omitting cwd is enough to generate the PM into the wrong workspace, so the output-location regression from the previous review is still present.
4 src/ouroboros/mcp/tools/pm_handler.py:979-982 Medium 92% After the interview is already complete, save_pm_seed() / save_pm_document() still run outside the existing generation_failed recovery path. Any filesystem failure here bubbles out as a generic tool error even though completion has already been persisted, so callers lose the advertised retryable completion contract instead of getting the documented generation_failed=True response.

Test Coverage

  • I could not run the changed unit suite in this environment because pytest is not installed here, so this is a code-verified review.
  • The changed tests still assert that _parse_pm_seed() preserves engine-tracked decide-later items even when the extractor omits them (tests/unit/bigbang/test_decide_later_items.py:340-409), but the implementation no longer does that.
  • There is still no regression test for resume/final-answer flows that omit cwd yet must keep writing into the session’s original workspace (src/ouroboros/mcp/tools/pm_handler.py:341, src/ouroboros/mcp/tools/pm_handler.py:981-982).
  • There is still no test that simulates save_pm_seed() / save_pm_document() failing after completion and verifies the handler returns the documented retry contract instead of a generic tool error (src/ouroboros/mcp/tools/pm_handler.py:979-982).

Design Notes

The explicit user-choice flow for DECIDE_LATER is a good direction and the recursion removal reduces timeout risk. The remaining problems are state-boundary issues: skipped-item state and workspace state are still split between the engine and the handler, but the completion path is not consistently using the persisted session state as the single source of truth.

Follow-up Recommendations

  • Once the blockers are fixed, add a focused end-to-end test that starts a session with one cwd, resumes without passing cwd, completes the interview, and asserts both pm.md and the seed land in the original workspace.
  • Consider centralizing artifact-path resolution behind persisted session metadata so start/resume/generate/completion all use the same source of truth.

Files Reviewed

  • skills/pm/SKILL.md
  • src/ouroboros/bigbang/pm_interview.py
  • src/ouroboros/bigbang/question_classifier.py
  • src/ouroboros/cli/commands/pm.py
  • src/ouroboros/mcp/tools/pm_handler.py
  • tests/unit/bigbang/test_decide_later_items.py
  • tests/unit/bigbang/test_pm_interview.py
  • tests/unit/bigbang/test_question_classifier.py
  • tests/unit/mcp/tools/test_pm_handler.py

Reviewed by ouroboros-agent[bot] via Codex deep analysis

…ve recovery

- Restore engine-tracked deferred/decide_later item merging in _parse_pm_seed()
  with deduplication to prevent data loss when LLM extractor omits items
- Use persisted session cwd for resume/generate actions instead of os.getcwd()
  so artifacts land in the original workspace
- Wrap save_pm_seed/save_pm_document in try/except with generation_failed
  recovery contract for both completion and generate paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cohemm
Copy link
Copy Markdown
Collaborator Author

cohemm commented Mar 28, 2026

리뷰 피드백 반영했습니다.

변경사항:

  • src/ouroboros/bigbang/pm_interview.py: _parse_pm_seed()에서 엔진 추적 deferred/decide_later 항목을 LLM 추출 결과와 병합하도록 복원 (중복 제거 포함)
  • src/ouroboros/mcp/tools/pm_handler.py: resume/generate 액션 시 os.getcwd() 대신 세션에 저장된 cwd를 사용하여 아티팩트가 원래 워크스페이스에 생성되도록 수정
  • src/ouroboros/mcp/tools/pm_handler.py: save_pm_seed()/save_pm_document() 실패 시 generation_failed recovery 경로로 처리되도록 try/except 추가 (completion, generate 양쪽 모두)
  • tests/unit/bigbang/test_pm_interview.py: 병합 동작 복원에 맞게 테스트 assertion 업데이트

확인 부탁드립니다!

Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: APPROVE

Reviewing commit
t
a
| Triggered by: new push by @cohemm

Branch: | 9 files, +860/-376 | CI: failing (Ruff format), tests pending

Issue Requirements

Requirement Status
Return decide-later/deferred questions to the caller without recursive auto-skips MET — now returns the classified question directly for both and ().
Preserve explicit deferred / decide-later choices in generated PM artifacts MET — again merges engine-tracked deferred/decide-later items with the extractor output and deduplicates them ().
Keep artifact output rooted in the session workspace across resume/generate/completion MET — resume/generate now recover from persisted session metadata when the request omits it, and both completion + generate use that resolved cwd for artifact writes (, , ).
Preserve the documented recovery contract when artifact saves fail after generation/completion MET — both the completion path and explicit generate path now catch save failures and return retry guidance instead of bubbling a generic tool error (, ).

Previous Review Follow-up

Previous Finding Status
stopped carrying engine-tracked deferred / decide-later items forward RESOLVED — merge logic restored with deduplication ().
Session could drift on resume/completion calls RESOLVED — handler now prefers persisted session cwd for resume/generate and uses the resolved cwd for artifact writes (, ).
Post-completion artifact-save failures lost the retry/recovery contract RESOLVED — save failures are now translated into responses in both completion and generate flows (, ).

(No remaining blocking findings on this head.)

Blocking Findings

None.

Test Coverage

  • I could not run the changed unit suite locally because is unavailable in this environment, so this is a code-verified review.
  • The updated implementation now matches the merge expectations in and the prior regression coverage around tracked deferred / decide-later items.
  • CI is still red on a non-blocking formatting issue: needs .

Design Notes

This revision closes the state-boundary regressions from the prior review cleanly: engine-owned skip state is authoritative again, artifact paths are anchored to persisted session metadata, and the recovery contract now covers the save step instead of only seed generation.

Follow-up Recommendations

  • Run so the PR can go green.
  • Consider updating the PR summary/body to reflect the shipped DEFERRED behavior if the intent has shifted from the original description.

Files Reviewed


Reviewed by ouroboros-agent[bot] via Codex deep analysis

@ouroboros-agent
Copy link
Copy Markdown
Contributor

Follow-up note: my previous bot review on this head was posted with mangled Markdown escaping, so here is the concise corrected summary.

  • Prior blockers look resolved on this commit.
  • _parse_pm_seed() once again merges engine-tracked deferred_items / decide_later_items with extractor output and deduplicates them.
  • pm_handler now reuses the persisted session cwd for resume/generate flows, so artifacts stay in the original workspace.
  • Completion/generate now catch save failures and return the documented generation_failed recovery path instead of surfacing a generic tool error.
  • Remaining CI issue is non-blocking: tests/unit/mcp/tools/test_pm_handler.py still needs ruff format.

I’m keeping the approval because the functional regressions from the prior review appear fixed; once the formatting check is cleaned up, this should be ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant