ouroboros_evaluate: trigger_consensus ignored when Stage 2 < 0.80 + ArtifactCollector fails without pyproject.toml #230

@jcfernandez-890825

Summary

Two bugs in ouroboros_evaluate prevent APPROVED verdicts for non-standard projects (e.g., Odoo CLI scripts that don't use pyproject.toml).

Bug 1: trigger_consensus silently ignored when Stage 2 < 0.80

Expected: When trigger_consensus=true is passed, Stage 3 (advocate + contrarian + judge) runs regardless of Stage 2 score.

Actual: Stage 3 is gated behind Stage 2's 0.80 threshold. If Stage 2 scores below 0.80, trigger_consensus=true has no effect — Stage 3 never runs.

Impact: CLI scripts (non-module code) consistently score 0.72 in Stage 2 semantic evaluation. The trigger_consensus parameter cannot override this, making 3-model consensus unreachable for these artifact types.

Suggested fix: When trigger_consensus=true, bypass the Stage 2 threshold and proceed directly to Stage 3 consensus. The whole point of forcing consensus is to get a second opinion when Stage 2 results are disputed.
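
A minimal sketch of the gating logic this fix describes. The function and class names are hypothetical and not taken from the Ouroboros codebase; the non-forced branch is only a stand-in for whatever the existing Stage 2 gate does today.

```python
from dataclasses import dataclass

STAGE2_THRESHOLD = 0.80  # threshold quoted in this report


@dataclass
class GateDecision:
    run_consensus: bool
    reason: str


def decide_consensus(
    stage2_score: float,
    trigger_consensus: bool,
    threshold: float = STAGE2_THRESHOLD,
) -> GateDecision:
    """Hypothetical gate: an explicit trigger_consensus wins over the score."""
    if trigger_consensus:
        # Proposed behavior: the caller asked for a second opinion, so run it.
        return GateDecision(True, "forced by trigger_consensus")
    if stage2_score >= threshold:
        # Placeholder for the existing, score-based gate.
        return GateDecision(True, "stage 2 score met threshold")
    return GateDecision(False, "stage 2 score below threshold")
```

With this ordering, `decide_consensus(0.72, True)` proceeds to consensus, while `decide_consensus(0.72, False)` keeps the current behavior.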

Bug 2: ArtifactCollector fails silently without pyproject.toml/setup.py/package.json

Location: adapter.py:99-110 (_looks_like_project_root), adapter.py:145-160 (_project_dir_from_artifact), artifact_collector.py:44-45

Expected: When artifact contains file paths (e.g., File: /path/to/code.py), the collector reads those files from disk and bundles them for the semantic evaluator.

Actual: The collector requires a project root directory to enforce path boundary checks. _extract_project_dir() walks up from file paths looking for pyproject.toml, setup.py, or package.json. If none exist (common in Odoo, Django, and many enterprise projects), project_dir = None, and the collector returns an empty bundle. The semantic evaluator then only sees the artifact text, not the actual source files.
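
The lookup described above can be approximated as follows. This is an illustrative reconstruction from the behavior observed in this report, not the actual adapter.py code; `find_project_root` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

# Marker files named in this report.
PROJECT_MARKERS = ("pyproject.toml", "setup.py", "package.json")


def find_project_root(file_path: str) -> Optional[Path]:
    """Walk up from file_path until a directory contains a marker file."""
    start = Path(file_path).resolve().parent
    for candidate in (start, *start.parents):
        if any((candidate / marker).is_file() for marker in PROJECT_MARKERS):
            return candidate
    # No marker anywhere above the file: the caller ends up with
    # project_dir = None, which leads to the empty bundle.
    return None
```

For an Odoo tree that only ships odoo.cfg, a walk like this reaches the filesystem root without a match and returns None.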

Additional issues:

  • _project_dir_from_artifact() (line 150) only matches the Write: and Edit: prefixes, not the File: prefix used in the expected behavior above
  • Adding a pyproject.toml as a workaround for file collection triggers auto-detection of the wrong language preset (languages.py:68-71: the Python preset runs ruff check . and python -m compileall -q src/, both of which fail for Odoo projects)
  • config.yaml project_dir setting requires MCP server restart to take effect (no hot reload)

Suggested fix:

  1. Accept working_dir from the evaluate tool call as an explicit project_dir override (highest priority in _extract_project_dir)
  2. Match File: prefix in addition to Write: and Edit: in _project_dir_from_artifact
  3. Allow disabling language presets when pyproject.toml exists (e.g., [tool.ouroboros] language = "none")
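
Fixes 1 and 2 could look roughly like the sketch below. All names here are hypothetical, and the regex is an assumption about how path lines appear in an artifact (one path per line, no spaces in the path); the real adapter may parse them differently.

```python
import re
from pathlib import Path
from typing import List, Optional

# Write: and Edit: are the prefixes matched today per this report;
# File: is the proposed addition.
PATH_LINE = re.compile(r"^[ \t]*(?:Write|Edit|File):[ \t]*(\S+)", re.MULTILINE)


def extract_paths(artifact: str) -> List[str]:
    """Return every path that follows a Write:, Edit:, or File: prefix."""
    return PATH_LINE.findall(artifact)


def pick_project_dir(
    working_dir: Optional[str], detected: Optional[Path]
) -> Optional[Path]:
    """An explicit working_dir has highest priority; otherwise fall back to
    whatever the marker-based detection returned (possibly None)."""
    if working_dir:
        return Path(working_dir)
    return detected


def paths_inside(artifact: str, project_dir: Path) -> List[Path]:
    """Keep only paths under project_dir, preserving the boundary check."""
    root = Path(project_dir).resolve()
    kept = []
    for raw in extract_paths(artifact):
        path = Path(raw).resolve()
        if path == root or root in path.parents:
            kept.append(path)
    return kept
```

The boundary check still applies with an explicit working_dir, so a path outside the project (for example Edit: /etc/passwd) would continue to be dropped.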

Environment

  • Ouroboros version: 0.25.1 (installed via uv)
  • Python: 3.14
  • Project type: Odoo 17 EE (uses odoo.cfg, no pyproject.toml)
  • Artifact: 818-line standalone CLI script with 83 passing unit tests

Reproduction

# This scores 0.72 consistently — trigger_consensus has no effect
ouroboros_evaluate(
    session_id="any",
    artifact="<full 818-line Python source code>",
    artifact_type="code",
    trigger_consensus=True,  # silently ignored
    acceptance_criterion="Create a 4-stage CLI pipeline...",
    seed_content="goal: ...\nacceptance_criteria: ...",
)
# Result: Stage 2 score=0.72, REJECTED, Stage 3 never runs

# This should read files from disk but returns empty bundle
ouroboros_evaluate(
    session_id="any",
    artifact="Write: /path/to/project/script.py",  # file exists on disk
    working_dir="/path/to/project",  # no pyproject.toml here
)
# Result: ArtifactCollector returns empty bundle, score=0.10

Labels: bug (Reproducible defect or broken behavior)
