fix(#235): tolerate non-JSON QA verdict responses #236
ouroboros-agent[bot] wants to merge 2 commits into main from issue-235/ouroborosqa-fails-with-failed-to-parse-q
Conversation
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit 0d8e738 | Triggered by: PR opened
Branch: issue-235/ouroborosqa-fails-with-failed-to-parse-q | 2 files, +84/-7
CI: Ruff Lint fail 12s https://github.com/Q00/ouroboros/actions/runs/23644472843/job/68873058775
MyPy Type Check pass 46s https://github.com/Q00/ouroboros/actions/runs/23644472843/job/68873058792
Test Python 3.12 pass 1m9s https://github.com/Q00/ouroboros/actions/runs/23644472797/job/68873058755
Test Python 3.13 pass 1m9s https://github.com/Q00/ouroboros/actions/runs/23644472797/job/68873058760
Test Python 3.14 pass 1m15s https://github.com/Q00/ouroboros/actions/runs/23644472797/job/68873058780
Issue Requirements
No linked issue detected in the PR body.
Previous Review Follow-up
First review — no previous findings.
Blocking Findings
| # | File:Line | Severity | Confidence | Finding |
|---|---|---|---|---|
| 1 | src/ouroboros/mcp/tools/qa.py:182 | Medium | Medium | The new non-JSON fallback only matches bare `Score:` / `Verdict:` labels at the start of a line, so it still rejects common LLM plain-text variants such as `- Score: 0.91`, `**Score**: 0.91`, or `Score - 0.91`. That means the PR only fixes one narrow response shape, and the QA tool can still fail with "Could not find JSON in QA response" for many realistic non-JSON outputs, which leaves issue #235 only partially addressed. |
Test Coverage
tests/unit/mcp/tools/test_qa_parser.py:7 only covers two minimal prose formats. It does not exercise bulleted or Markdown-formatted verdicts, which are the most likely non-JSON variants from chat-style providers and are currently still unhandled by the regex fallback in src/ouroboros/mcp/tools/qa.py:182-195.
Design Notes
The direction is reasonable: keeping _parse_qa_response() tolerant of non-JSON provider output is the right place to harden this tool. The main issue is that the fallback parser is regex-fragile and currently recognizes only a very small subset of “plain text” responses, so the implementation does not yet provide a robust degradation path across providers.
Follow-up Recommendations
- Re-run review after any substantive push if the PR head changes.
- Add or update targeted tests for each changed behavior path noted above.
Files Reviewed
src/ouroboros/mcp/tools/qa.py
tests/unit/mcp/tools/test_qa_parser.py
Reviewed by ouroboros-agent[bot] via Codex deep analysis
- Split parsing into 3 layers: strip decorations → extract KV → parse fields
- KV regex uses only unambiguous separators (: and =) to avoid false positives
- Score-specific fallback handles "score is/- X" with word boundary
- Add tests for bulleted, markdown bold, and dash-separated LLM outputs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit 092de31 | Triggered by: new push
Branch: issue-235/ouroborosqa-fails-with-failed-to-parse-q | 2 files, +149/-7 | CI: Ruff Lint failing; MyPy + tests passing
What improved
- The previous blocker was partially addressed: the fallback parser now correctly handles bulleted labels, bolded Markdown labels, and ASCII `Score - ...` score lines in addition to `Score:` / `Score = ...`.
- Regression tests were added for the newly supported plain-text shapes.
Issue #235 Requirements
| Requirement | Status |
|---|---|
| `ouroboros_qa` should degrade gracefully when the provider returns plain-text verdicts instead of JSON | NOT MET — `Verdict - fail` is still ignored because `_KV_RE` only accepts `:` or `=` separators, so `_parse_qa_response()` silently re-derives the verdict from score and can return the opposite result (src/ouroboros/mcp/tools/qa.py:176, src/ouroboros/mcp/tools/qa.py:232, src/ouroboros/mcp/tools/qa.py:270) |
| Preserve useful structured QA information from fallback prose rather than failing or discarding it | NOT MET — all bullet lines are currently funneled into `differences`, while `suggestions` and `dimensions` stay empty even when the prose includes those sections (src/ouroboros/mcp/tools/qa.py:225-236) |
Previous Review Follow-up
| Previous Finding | Status |
|---|---|
| Fallback parser did not handle common non-JSON variants such as bullets, bold Markdown labels, or `Score - ...` | MODIFIED — those formats are now accepted, but the updated parser still mishandles dash-separated verdicts and remains lossy for other structured fields (src/ouroboros/mcp/tools/qa.py:174-178, src/ouroboros/mcp/tools/qa.py:208-236) |
Blocking Findings
| # | File:Line | Severity | Confidence | Finding |
|---|---|---|---|---|
| 1 | src/ouroboros/mcp/tools/qa.py:176 | High | High | The new `Score - 0.88` fallback does not also parse `Verdict - fail/revise`. `_KV_RE` only accepts `:` or `=`, so `fields.get("verdict")` stays empty and `_parse_qa_response()` silently re-derives the verdict from score. That means an explicit textual verdict can now be ignored and inverted, e.g. `Score - 0.88` with `Verdict - fail` returns pass. |
| 2 | src/ouroboros/mcp/tools/qa.py:225 | Medium | High | The fallback now treats every bullet item as a difference, regardless of section. Plain-text outputs like `Suggestions:` followed by bullet points are parsed as differences and `suggestions` is always returned empty, which materially degrades the structured QA result the tool advertises. |
| 3 | src/ouroboros/mcp/tools/qa.py:225 | Medium | Medium | Bulleted dimension lines such as `- accuracy: 0.9` / `- clarity: 0.8` are also captured as generic differences and never populate `dimensions`. Since this PR broadens acceptance of non-JSON prose, it drops useful scoring detail from a common fallback shape instead of preserving it. |
Test Coverage
tests/unit/mcp/tools/test_qa_parser.py:62-73 does not actually verify dash-separated verdict parsing. `Verdict - pass` is currently ignored, and the test still passes only because the code derives `pass` from the score. tests/unit/mcp/tools/test_qa_parser.py:6-73 has no coverage for preservation of `suggestions` or `dimensions` from non-JSON prose, even though this PR now routes more responses through that fallback path.
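The coverage gap is easiest to see with an adversarial fixture where the explicit verdict disagrees with the score-derived one. The parser below is a self-contained stand-in, not the real `_parse_qa_response()`, and the 0.85 threshold is an assumption for illustration:

```python
import re

# Stand-in parser demonstrating the precedence rule a test should pin down:
# an explicit textual verdict must win over a score-derived verdict.
def parse_verdict(text: str, threshold: float = 0.85) -> str:
    explicit = re.search(r"^verdict\s*[-:=]\s*(\w+)", text, re.IGNORECASE | re.MULTILINE)
    if explicit:
        return explicit.group(1).lower()
    score = re.search(r"^score\s*[-:=]\s*([\d.]+)", text, re.IGNORECASE | re.MULTILINE)
    return "pass" if score and float(score.group(1)) >= threshold else "fail"

# The adversarial case: score clears the threshold, verdict says fail.
assert parse_verdict("Score - 0.88\nVerdict - fail") == "fail"
assert parse_verdict("Score - 0.88") == "pass"
```

A test built this way fails against the current code (which ignores `Verdict - fail`), unlike the existing fixture where both paths happen to agree.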
Design Notes
The direction is right, and the re-review confirms meaningful progress on the original blocker. The remaining problem is that the fallback is still label-fragile and lossy: it accepts more prose shapes, but one of the newly advertised shapes (Verdict - ...) is only “supported” accidentally via score-based verdict derivation, and richer structured sections from prose are flattened away. For a QA loop tool, that is merge-blocking because it can change loop control and user-visible feedback while appearing to parse successfully.
Follow-up Recommendations
- Once the blocking correctness issue is fixed, consider extending `_strip_line_decorations()` to tolerate quoted Markdown output (`> Score: ...`) and typographic dashes (`–`/`—`) as follow-up hardening.
- The failing Ruff check should be resolved before merge as part of normal CI hygiene.
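The recommended hardening amounts to two more normalization steps in the decoration stripper. `_strip_line_decorations()` is the name the review uses; the body below is a hedged sketch of what the extension could look like, not the PR's code:

```python
import re

# Follow-up hardening sketch: also tolerate blockquoted output and
# typographic dashes. Function body is illustrative only.
def strip_line_decorations(line: str) -> str:
    # Drop leading blockquote markers, bullets, and whitespace: "> - **Score**"
    cleaned = re.sub(r"^[\s>*-]+", "", line)
    # Remove bold markers; normalize en/em dashes to the ASCII "-" the
    # downstream label regexes already understand.
    cleaned = cleaned.replace("**", "")
    cleaned = cleaned.replace("\u2013", "-").replace("\u2014", "-")
    return cleaned.strip()
```

Normalizing dashes here keeps the separator logic itself untouched, so the label regexes stay ASCII-only.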
Files Reviewed
src/ouroboros/mcp/tools/qa.py
tests/unit/mcp/tools/test_qa_parser.py
Reviewed by ouroboros-agent[bot] via Codex deep analysis
Summary
Make `ouroboros_qa` fall back to plain-text score/verdict parsing when providers ignore JSON schema output.
Testing
uv run python -m pytest -q /tmp/issue_repo/tests/unit/mcp/tools/test_qa_parser.py /tmp/issue_repo/tests/unit/mcp/tools/test_qa_integration.py -q