
fix(#235): tolerate non-JSON QA verdict responses #236

Open

ouroboros-agent[bot] wants to merge 2 commits into main from issue-235/ouroborosqa-fails-with-failed-to-parse-q

Conversation

@ouroboros-agent (Contributor)

Summary

  • make ouroboros_qa fall back to plain-text score/verdict parsing when providers ignore JSON schema output
  • derive the verdict from score when the fallback response omits an explicit verdict
  • add parser regression tests covering non-JSON QA responses
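The second bullet, deriving a verdict from the score when the fallback response omits one, can be sketched as below. The function name `derive_verdict` and the 0.8 / 0.5 cutoffs are hypothetical illustrations, not the actual thresholds used in `_parse_qa_response()`:

```python
def derive_verdict(score: float) -> str:
    """Map a numeric QA score to a verdict when the response omits one.

    The 0.8 / 0.5 cutoffs are illustrative, not the project's real values.
    """
    if score >= 0.8:
        return "pass"
    if score >= 0.5:
        return "revise"
    return "fail"
```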

Testing

  • uv run python -m pytest -q /tmp/issue_repo/tests/unit/mcp/tools/test_qa_parser.py /tmp/issue_repo/tests/unit/mcp/tools/test_qa_integration.py -q

@ouroboros-agent (Contributor, Author) left a comment


Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit 0d8e738 | Triggered by: PR opened

Branch: issue-235/ouroborosqa-fails-with-failed-to-parse-q | 2 files, +84/-7

CI:
  • Ruff Lint: fail (12s) https://github.com/Q00/ouroboros/actions/runs/23644472843/job/68873058775
  • MyPy Type Check: pass (46s) https://github.com/Q00/ouroboros/actions/runs/23644472843/job/68873058792
  • Test Python 3.12: pass (1m9s) https://github.com/Q00/ouroboros/actions/runs/23644472797/job/68873058755
  • Test Python 3.13: pass (1m9s) https://github.com/Q00/ouroboros/actions/runs/23644472797/job/68873058760
  • Test Python 3.14: pass (1m15s) https://github.com/Q00/ouroboros/actions/runs/23644472797/job/68873058780

Issue Requirements

No linked issue detected in the PR body.

Previous Review Follow-up

First review — no previous findings.

Blocking Findings

1. src/ouroboros/mcp/tools/qa.py:182 | Severity: Medium | Confidence: Medium
   The new non-JSON fallback only matches bare Score: / Verdict: labels at the start of a line, so it still rejects common LLM plain-text variants such as - Score: 0.91, **Score**: 0.91, or Score - 0.91. That means the PR fixes only one narrow response shape, and the QA tool can still fail with "Could not find JSON in QA response" for many realistic non-JSON outputs, which leaves issue #235 only partially addressed.

Test Coverage

tests/unit/mcp/tools/test_qa_parser.py:7 only covers two minimal prose formats. It does not exercise bulleted or Markdown-formatted verdicts, which are the most likely non-JSON variants from chat-style providers and are currently still unhandled by the regex fallback in src/ouroboros/mcp/tools/qa.py:182-195.

Design Notes

The direction is reasonable: keeping _parse_qa_response() tolerant of non-JSON provider output is the right place to harden this tool. The main issue is that the fallback parser is regex-fragile and currently recognizes only a very small subset of “plain text” responses, so the implementation does not yet provide a robust degradation path across providers.

Follow-up Recommendations

  • Re-run review after any substantive push if the PR head changes.
  • Add or update targeted tests for each changed behavior path noted above.

Files Reviewed

  • src/ouroboros/mcp/tools/qa.py
  • tests/unit/mcp/tools/test_qa_parser.py

Reviewed by ouroboros-agent[bot] via Codex deep analysis

@Q00 Q00 self-assigned this Mar 27, 2026
@Q00 Q00 marked this pull request as ready for review March 27, 2026 17:19
- Split parsing into 3 layers: strip decorations → extract KV → parse fields
- KV regex uses only unambiguous separators (: and =) to avoid false positives
- Score-specific fallback handles "score is/- X" with word boundary
- Add tests for bulleted, markdown bold, and dash-separated LLM outputs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
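The three layers described in the commit message could be sketched roughly as follows. The names `strip_decorations`, `extract_kv`, and `parse_fields` are stand-ins for whatever the helpers in qa.py are actually called, and the patterns are simplified:

```python
import re

def strip_decorations(line: str) -> str:
    """Layer 1: drop leading bullets and Markdown bold markers."""
    line = re.sub(r"^\s*[-*]\s+", "", line)  # leading bullet
    line = line.replace("**", "")            # bold markers
    return line.strip()

# Layer 2: only unambiguous separators (":" and "=") to avoid false positives.
KV_RE = re.compile(r"^(\w+)\s*[:=]\s*(.+)$")

def extract_kv(lines: list[str]) -> dict[str, str]:
    """Collect key/value pairs from decoration-stripped lines."""
    fields: dict[str, str] = {}
    for raw in lines:
        m = KV_RE.match(strip_decorations(raw))
        if m:
            fields[m.group(1).lower()] = m.group(2).strip()
    return fields

def parse_fields(fields: dict[str, str]) -> dict:
    """Layer 3: coerce the known fields into typed values."""
    result: dict = {}
    if "score" in fields:
        result["score"] = float(fields["score"])
    if "verdict" in fields:
        result["verdict"] = fields["verdict"].lower()
    return result
```

Splitting the work this way means each layer can be tested in isolation, which is presumably what the bulleted/bold/dash regression tests exercise.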
@ouroboros-agent (Contributor, Author) left a comment


Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit 092de31 | Triggered by: new push

Branch: issue-235/ouroborosqa-fails-with-failed-to-parse-q | 2 files, +149/-7 | CI: Ruff Lint failing; MyPy + tests passing

What improved

  • The previous blocker was partially addressed: the fallback parser now correctly handles bulleted labels, bolded Markdown labels, and ASCII Score - ... lines in addition to Score: / Score = ....
  • Regression tests were added for the newly supported plain-text shapes.

Issue #235 Requirements

Requirement 1: ouroboros_qa should degrade gracefully when the provider returns plain-text verdicts instead of JSON.
Status: NOT MET. Verdict - fail is still ignored because _KV_RE only accepts : or = separators, so _parse_qa_response() silently re-derives the verdict from score and can return the opposite result (src/ouroboros/mcp/tools/qa.py:176, src/ouroboros/mcp/tools/qa.py:232, src/ouroboros/mcp/tools/qa.py:270).

Requirement 2: Preserve useful structured QA information from fallback prose rather than failing or discarding it.
Status: NOT MET. All bullet lines are currently funneled into differences, while suggestions and dimensions stay empty even when the prose includes those sections (src/ouroboros/mcp/tools/qa.py:225-236).

Previous Review Follow-up

Previous finding: Fallback parser did not handle common non-JSON variants such as bullets, bold Markdown labels, or Score - ...
Status: MODIFIED. Those formats are now accepted, but the updated parser still mishandles dash-separated verdicts and remains lossy for other structured fields (src/ouroboros/mcp/tools/qa.py:174-178, src/ouroboros/mcp/tools/qa.py:208-236).

Blocking Findings

1. src/ouroboros/mcp/tools/qa.py:176 | Severity: High | Confidence: High
   The new Score - 0.88 fallback does not also parse Verdict - fail/revise. _KV_RE only accepts : or =, so fields.get("verdict") stays empty and _parse_qa_response() silently re-derives the verdict from score. That means an explicit textual verdict can now be ignored and inverted, e.g. Score - 0.88 with Verdict - fail returns pass.
2. src/ouroboros/mcp/tools/qa.py:225 | Severity: Medium | Confidence: High
   The fallback now treats every bullet item as a difference, regardless of section. Plain-text outputs like Suggestions: followed by bullet points are parsed as differences, and suggestions is always returned empty, which materially degrades the structured QA result the tool advertises.
3. src/ouroboros/mcp/tools/qa.py:225 | Severity: Medium | Confidence: Medium
   Bulleted dimension lines such as - accuracy: 0.9 / - clarity: 0.8 are also captured as generic differences and never populate dimensions. Since this PR broadens acceptance of non-JSON prose, it drops useful scoring detail from a common fallback shape instead of preserving it.
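One way to address findings 2 and 3, keeping suggestions and dimensions instead of funneling every bullet into differences, is to track the most recent section header while scanning. This is a sketch under assumed section names, not the project's code:

```python
import re

# Hypothetical pattern for "name: number" dimension bullets.
DIM_RE = re.compile(r"^([A-Za-z_]+)\s*:\s*(\d+(?:\.\d+)?)$")

def route_bullets(lines: list[str]) -> dict:
    """Route each bullet into the bucket named by the preceding header."""
    out: dict = {"differences": [], "suggestions": [], "dimensions": {}}
    section = "differences"  # default bucket before any header is seen
    for raw in lines:
        line = raw.strip()
        header = line.rstrip(":").lower()
        if header in ("differences", "suggestions", "dimensions"):
            section = header
            continue
        item = re.sub(r"^[-*]\s+", "", line)
        if not item:
            continue
        if section == "dimensions":
            m = DIM_RE.match(item)
            if m:  # non-matching lines under Dimensions are dropped here
                out["dimensions"][m.group(1)] = float(m.group(2))
        else:
            out[section].append(item)
    return out
```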

Test Coverage

  • tests/unit/mcp/tools/test_qa_parser.py:62-73 does not actually verify dash-separated verdict parsing. Verdict - pass is currently ignored, and the test still passes only because the code derives pass from the score.
  • tests/unit/mcp/tools/test_qa_parser.py:6-73 has no coverage for preservation of suggestions or dimensions from non-JSON prose, even though this PR now routes more responses through that fallback path.

Design Notes

The direction is right, and the re-review confirms meaningful progress on the original blocker. The remaining problem is that the fallback is still label-fragile and lossy: it accepts more prose shapes, but one of the newly advertised shapes (Verdict - ...) is only “supported” accidentally via score-based verdict derivation, and richer structured sections from prose are flattened away. For a QA loop tool, that is merge-blocking because it can change loop control and user-visible feedback while appearing to parse successfully.

Follow-up Recommendations

  • Once the blocking correctness issue is fixed, consider extending _strip_line_decorations() to tolerate quoted Markdown output (> Score: ...) and typographic dashes (– / —) as follow-up hardening.
  • The failing Ruff check should be resolved before merge as part of normal CI hygiene.
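The hardening suggested in the first recommendation could look like the sketch below; `strip_line_decorations` mirrors the reviewer's helper name, and the exact patterns are illustrative:

```python
import re

def strip_line_decorations(line: str) -> str:
    """Remove Markdown blockquote markers and normalize typographic dashes."""
    line = re.sub(r"^\s*(?:>\s*)+", "", line)  # "> " blockquote prefixes
    line = line.replace("\u2013", "-")         # en dash (–) to ASCII hyphen
    line = line.replace("\u2014", "-")         # em dash (—) to ASCII hyphen
    return line.strip()
```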

Files Reviewed

  • src/ouroboros/mcp/tools/qa.py
  • tests/unit/mcp/tools/test_qa_parser.py

Reviewed by ouroboros-agent[bot] via Codex deep analysis
