feat(runtime): add Gemini CLI as third execution runtime #167
tgmerritt wants to merge 64 commits into Q00:release/0.26.0-beta
Introduce AgentRuntime Protocol and RuntimeHandle for backend-neutral runtime management. Add Codex CLI runtime implementation with session tracking, MCP tool definitions, parallel AC execution with retry/resume, and comprehensive test coverage (2800+ tests passing). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix QA structured output schema for Codex/OpenAI compatibility by adding `additionalProperties: false` and all fields to `required`
- Add seed_path support to StartExecuteSeedHandler (previously only ExecuteSeedHandler resolved seed_path to seed_content)
- Include Runtime/LLM Backend info in start_execute_seed response
- Add terminal status parametrized tests for session_status handler
- Clean up OpenCode runtime stubs with explicit NotImplementedError
- Add error handling for ValueError/NotImplementedError in CLI run
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
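The schema fix in the first bullet can be illustrated with a small sketch. The helper name `make_strict` and the sample schema are hypothetical and not taken from the PR; the sketch only shows the idea of marking every object node with `additionalProperties: false` and listing all of its properties under `required`.

```python
def make_strict(schema):
    """Return a copy of a JSON Schema in which every object node forbids
    extra keys and requires all declared properties (hypothetical helper)."""
    if isinstance(schema, dict):
        # Recurse first so nested object schemas are hardened as well.
        out = {key: make_strict(value) for key, value in schema.items()}
        if out.get("type") == "object" and isinstance(out.get("properties"), dict):
            out["additionalProperties"] = False
            out["required"] = list(out["properties"])
        return out
    if isinstance(schema, list):
        return [make_strict(item) for item in schema]
    return schema


# Illustrative schema only; the real QA schema is not shown in this thread.
qa_schema = {
    "type": "object",
    "properties": {
        "verdict": {"type": "string"},
        "details": {
            "type": "object",
            "properties": {"score": {"type": "number"}},
        },
    },
}
strict_schema = make_strict(qa_schema)
```

The input schema is left untouched, so callers that still need the permissive variant can keep using it.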
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CI renders help output with ANSI escape codes that split `--runtime` into separate escape sequences, causing exact string match to fail. Use case-insensitive keyword matching instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sertions Rich inserts ANSI escape sequences at hyphen boundaries in CLI help output (e.g. --llm-backend), causing plain-text assertions to fail. Setting NO_COLOR=1 in the root conftest.py disables color output for all tests, fixing the 4 failing CI checks and preventing future breakage for any hyphenated option names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In CI, GITHUB_ACTIONS env var causes Typer to set force_terminal=True on Rich Console, emitting ANSI escape codes into CliRunner's string buffer. This breaks plain-text assertions for hyphenated options like --llm-backend. Use Typer's built-in _TYPER_FORCE_DISABLE_TERMINAL escape hatch instead of NO_COLOR (which only disables colors but leaves bold/dim style codes intact). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
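A minimal sketch of what such a root `conftest.py` could look like, assuming the variable is set at import time so that it is already in place when the CLI module (and with it Typer's Rich integration) is first imported. The exact placement in the repository is not shown in this thread.

```python
# conftest.py (root of the test tree) -- sketch, not the PR's actual file.
import os

# Typer checks this variable when it configures its Rich console; a truthy
# value forces terminal rendering off, so CliRunner output stays plain text
# even when GITHUB_ACTIONS is set.
os.environ["_TYPER_FORCE_DISABLE_TERMINAL"] = "1"
```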
…setup
- Move claude-agent-sdk, anthropic, litellm from core deps to optional extras ([claude], [litellm], [all]) so Codex-only users can install ouroboros-ai without unnecessary SDK dependencies
- Convert eager imports to lazy: LiteLLMAdapter in providers/__init__.py and factory.py, litellm in core/context.py (with len//4 fallback)
- Add `ouroboros setup` CLI command with auto-detection of available runtimes (claude, codex) and interactive/non-interactive modes
- Add scripts/install.sh one-liner installer with runtime auto-detection
- Update README Quick Start to show 3 parallel install paths: Claude Code Plugin / Standalone pip / One-liner
- Update SKILL.md with standalone setup reference
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The dev group previously included ouroboros-ai[all] which pulled in the dashboard extra (streamlit → watchdog). watchdog is untyped, and mypy cannot resolve watchdog.observers.Observer as a valid type on Linux. Use ouroboros-ai[claude,litellm] instead — dev needs runtime deps for testing but not dashboard visualization deps. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Security:
- Add InputValidator.validate_llm_response() to CodexCliLLMAdapter (parity with other adapters)
- Pass prompt via stdin instead of CLI argument to avoid ARG_MAX limits
- Add await stdin.drain() before close to ensure flush on large prompts
- Remove _extract_text() recursive fallback to prevent data leakage via error messages
- Add asyncio.timeout to legacy process.communicate() fallback path
- Validate resume_session_id with regex pattern to prevent CLI argument injection
Reliability:
- Guard _cancellation_registry with asyncio.Lock for concurrent access safety
- Add terminal state check before mark_cancelled to prevent race condition
- Add _max_resume_retries=3 depth limit to prevent infinite execute_task recursion
- Add 50MB buffer limit to _iter_stream_lines() with incremental byte tracking
- Fix EventStore connection leak in ExecuteSeedHandler background task
- Guard None sentinels in parallel_executor level_results
Quality:
- Change interview permission mode from acceptEdits to default for codex/opencode
- Remove 28 lines of unreachable dead code in _build_runtime_handle
- Add warning log for silently discarded non-string session_id
- Cache derive_runtime_signal results to reduce redundant calls (3x → 2x)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
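The `resume_session_id` validation can be sketched as follows. The pattern and the function name are assumptions made for illustration; the regex actually used in the commit is not shown here. The key property is that the value must start with an alphanumeric character, so it can never be parsed as a CLI option.

```python
import re

# Hypothetical pattern: starts with a letter or digit, then up to 127
# letters, digits, underscores, or hyphens. A leading "-" is rejected.
_SESSION_ID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,127}")


def validate_resume_session_id(value: str) -> str:
    """Return the value unchanged if it is safe to place on a command line."""
    if not isinstance(value, str) or not _SESSION_ID_RE.fullmatch(value):
        raise ValueError(f"invalid resume session id: {value!r}")
    return value
```

For example, a UUID-style id passes, while `--dangerous-flag` or a value containing whitespace raises `ValueError`.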
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Restructure all documentation, README, and branding from Claude Code-centric plugin to runtime-agnostic workflow engine supporting both Claude Code and Codex CLI as equal first-class runtime backends.
Key changes:
- README restructured as conversion page (problem-solver positioning)
- Quick Start with runtime tabs: Claude Code | Codex CLI | Standalone
- New runtime guides: docs/runtime-guides/claude-code.md, codex.md
- Runtime capability matrix comparing backends side-by-side
- Architecture docs updated with runtime abstraction layer
- CLI reference updated for setup, --runtime, --non-interactive
- Platform support matrix (Windows experimental/WSL recommended)
- SECURITY.md with standard vulnerability reporting policy
- Python version corrected to >=3.12 everywhere (was 3.14+)
- All "Claude Code plugin" references replaced with agnostic language
- Legacy docs/running-with-claude-code.md preserved as redirect stub
- Codex ooo skill support documented (rules + skills install)
- Config value corrected: runtime_backend: claude (not claude-code)
- Stale "Claude Agent SDK" references updated in guides
- Install commands match pyproject.toml exactly
- Demo image placeholders added for interview/seed/evaluation
- Sub-tagline: "Specification-first workflow engine for AI coding agents"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- S1: path containment for absolute/relative paths in security.py
- Q1a: complete handler re-export in mcp/tools/__init__.py
- A1: remove getattr fallback from definitions.py
- Handler split: definitions.py → per-domain handler modules
- Fix non-deterministic updated_at timestamp flake in parallel executor test
- Add runtime_backend/working_directory/permission_mode to all test stubs
- Deep-clone consistency for RuntimeHandle metadata
- Ruff format cleanup
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both _setup_claude and _setup_codex now write timeout: 600 and OUROBOROS_AGENT_RUNTIME / OUROBOROS_LLM_BACKEND env vars into mcp.json. Existing entries are backfilled on re-run. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Every mcp.json example/template now includes timeout: 600 and OUROBOROS_AGENT_RUNTIME env var. Fixes docs/cli-reference.md, docs/guides/cli-usage.md, docs/guides/common-workflows.md, and .claude-plugin/.mcp.json. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
OUROBOROS_AGENT_RUNTIME in mcp.json env would override config.yaml (env > config priority), making runtime changes via config.yaml silently ineffective. Runtime selection belongs in config.yaml only, which setup already writes correctly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
mcp.json handles MCP server registration (timeout only). Runtime backend is configured in ~/.ouroboros/config.yaml, with optional OUROBOROS_AGENT_RUNTIME env var override for power users. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
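The precedence described in the two commits above (environment variable over config.yaml) can be sketched like this. The function name and the `orchestrator.runtime_backend` nesting of the parsed config are assumptions; only the `OUROBOROS_AGENT_RUNTIME` variable and the `runtime_backend` key come from the thread.

```python
def resolve_runtime_backend(config: dict, environ: dict) -> str:
    """Pick the runtime backend, letting the env var win over config.yaml."""
    override = environ.get("OUROBOROS_AGENT_RUNTIME", "").strip()
    if override:
        # Env wins. This is why a value baked into mcp.json would silently
        # mask later edits to config.yaml.
        return override
    # Assumed layout of the parsed ~/.ouroboros/config.yaml.
    return config["orchestrator"]["runtime_backend"]


config = {"orchestrator": {"runtime_backend": "codex"}}
from_config = resolve_runtime_backend(config, {})
from_env = resolve_runtime_backend(config, {"OUROBOROS_AGENT_RUNTIME": "claude"})
```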
extract_json_payload() now tries each { position via brace-counting
and validates with json.loads, instead of only trying the first {.
This fixes 75% QA verdict parse failures caused by Anthropic's
prefill workaround producing prose with stray braces before JSON.
Also adds llms-full.txt with deep model-facing reference content
and bolsters the Secondary Loop section with TODO registry and
batch scheduler details.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
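A rough sketch of the scan described in this commit: try every `{` as a candidate start, find its balancing `}` by brace counting, and accept the first candidate that `json.loads` can parse. This is an illustration written for this thread, not the project's actual `extract_json_payload()`; in particular, the string-literal tracking is an added assumption so that braces inside JSON strings are not counted.

```python
from __future__ import annotations

import json


def extract_json_payload(text: str) -> str | None:
    """Return the first balanced {...} substring that parses as JSON."""
    for start, char in enumerate(text):
        if char != "{":
            continue
        depth = 0
        in_string = False
        escaped = False
        for end in range(start, len(text)):
            current = text[end]
            if in_string:
                # Inside a string literal: only look for the closing quote.
                if escaped:
                    escaped = False
                elif current == "\\":
                    escaped = True
                elif current == '"':
                    in_string = False
                continue
            if current == '"':
                in_string = True
            elif current == "{":
                depth += 1
            elif current == "}":
                depth -= 1
                if depth == 0:
                    candidate = text[start : end + 1]
                    try:
                        json.loads(candidate)
                    except json.JSONDecodeError:
                        break  # stray braces in prose; try the next "{"
                    return candidate
    return None


reply = 'Using the {strict} rubric: {"verdict": "pass", "note": "a } b"}'
payload = extract_json_payload(reply)
```

Here the prose fragment `{strict}` is balanced but not valid JSON, so the scan moves on to the second `{` and returns the real verdict object.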
Consolidate getting-started.md as the single onboarding SSOT, remove duplicated guides (cli-usage, common-workflows, language-support, quick-start), delete stale API design docs and ontological-framework directory, and trim verbose sections across architecture, cli-reference, and runtime guides. README retains philosophy sections, TUI section moved to docs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add self-answering interview mode, improve codex CLI adapter error handling, update provider factory runtime detection, and expand MCP authoring handler coverage with corresponding tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update .gitignore, .mcp.json timeout config, expand CONTRIBUTING.md with dev workflow details, refresh skill definitions for interview/ evolve/setup, and sync socratic-interviewer agent spec. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Provide llms.txt as a concise index and llms-full.txt as a detailed reference, following the Context7 convention so AI coding agents can ingest project context efficiently. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…inel When ouroboros spawns a runtime (Codex/Claude/OpenCode), the child process may read its own MCP config and spawn another ouroboros server, causing exponential process tree growth (34+ processes observed). The sentinel env var is set on first serve() entry and inherited by all child processes, causing nested instances to exit(0) immediately. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
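The sentinel mechanism can be sketched in a few lines. The variable name `OUROBOROS_NESTED_GUARD` and the function name are hypothetical; the commit does not name the sentinel in this thread. The environment mapping is passed in explicitly so the logic can be exercised without touching the real process environment.

```python
import os
import sys

# Hypothetical sentinel name, used for illustration only.
_SENTINEL = "OUROBOROS_NESTED_GUARD"


def is_nested_instance(environ) -> bool:
    """Return True if an ancestor process already entered serve().

    On first entry the sentinel is written into the environment, so every
    child process spawned afterwards inherits it.
    """
    if environ.get(_SENTINEL):
        return True
    environ[_SENTINEL] = "1"
    return False


def serve() -> None:
    if is_nested_instance(os.environ):
        sys.exit(0)  # nested server: exit quietly instead of spawning more
    # ... start the MCP server here ...
```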
… docs - Replace "Fallback" with "Alternative" for non-Claude runtime paths - Change "CLI fallback" to "CLI equivalent" throughout getting-started - Trim verbose metadata blocks from cli-reference, codex, config-reference - Create docs/guides/evolution-loop.md: Ralph, Wonder/Reflect, convergence - Commit pending docs: config-reference, evaluation-pipeline, findings-registry - Update docs/README.md index: add evolution guide, remove broken links Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix ruff format in 17 files (session.py, mcp.py, test files, scripts) - Remove unused imports (pytest in test_json_utils) - Fix StrEnum inheritance (examples/task_manager) - Fix unused vars and args in scripts (doc_volatility, migrate_authority, semantic_link_rot_check) - Delete leftover playground/src/ files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three related bugs caused the AC execution tree to show only the root "Seed" node with no children:
1. _notify_ac_tree_updated() only updated the active screen; when events arrived while session selector was active, the dashboard tree was never refreshed. Now uses get_screen() to reach the installed dashboard regardless of which screen is active.
2. DashboardScreenV3 lacked an on_show() hook; when switching back to the dashboard, the tree was stuck with its initial empty state. Now refreshes from _state.ac_tree on every show.
3. parallel_executor used 0-based AC index while WorkflowStateTracker uses 1-based (as documented in AcceptanceCriterion.index). Fixed to i+1 for consistency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
A. Fix subtask events using 0-based ac_index that couldn't match 1-based tree node keys; subtasks now attach to parent nodes.
B. Replace all self.screen forwarding in app.py with _forward_to_dashboard() helper that reaches the installed dashboard via get_screen(), preventing message drops when session selector or other screens are active.
D. Wrap _execute_parallel() call in try/except to persist session.failed events on unhandled exceptions, preventing 0-event ghost sessions.
E. Expand on_show() to refresh phase_bar and activity_bar in addition to AC tree when dashboard becomes active.
F. Remove dead DashboardScreen import and dashboard_v2 references.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…hook Allows runtime subclasses to control how prompts are delivered: - _build_command() now accepts an optional prompt kwarg (ignored by Codex CLI which uses stdin) - _feeds_prompt_via_stdin() returns True by default; subclasses can override to False to skip stdin prompt delivery - _execute_task_impl() passes composed_prompt to _build_command() Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
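The hook described above might look roughly like this. The class names, `plan_invocation()`, and the `example-cli` command are invented for the sketch; only `_build_command()` with its optional `prompt` kwarg and `_feeds_prompt_via_stdin()` come from the commit message.

```python
from __future__ import annotations


class BaseCliRuntime:
    """Default behaviour: the prompt travels over stdin, not argv."""

    def _feeds_prompt_via_stdin(self) -> bool:
        return True

    def _build_command(self, *, prompt: str | None = None) -> list[str]:
        # The base runtime ignores `prompt` because it is written to stdin.
        return ["example-cli", "exec"]

    def plan_invocation(self, prompt: str) -> tuple[list[str], str | None]:
        command = self._build_command(prompt=prompt)
        stdin_payload = prompt if self._feeds_prompt_via_stdin() else None
        return command, stdin_payload


class ArgvPromptRuntime(BaseCliRuntime):
    """Hypothetical subclass for a CLI that expects the prompt as an argument."""

    def _feeds_prompt_via_stdin(self) -> bool:
        return False

    def _build_command(self, *, prompt: str | None = None) -> list[str]:
        command = ["example-cli", "exec"]
        if prompt is not None:
            command.append(prompt)
        return command
```

Keeping both decisions behind overridable methods means the shared execution path never needs to know which delivery style a given backend uses.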
- Add heartbeat-based alive check to orphan detection so sessions with active runtime processes are not cancelled on MCP restart - Enable SQLite WAL mode and busy_timeout=30s for concurrent access - Add retry logic (3 attempts) to event_store.append() for transient "database is locked" errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
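The SQLite settings and retry loop can be sketched with the standard-library `sqlite3` module. The project's event store may well use a different (for example async) driver, and the table layout, function names, and backoff values here are assumptions; only WAL mode, the 30-second busy timeout, and the three attempts come from the commit message.

```python
import os
import sqlite3
import tempfile
import time


def open_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # readers no longer block the writer
    conn.execute("PRAGMA busy_timeout=30000")  # wait up to 30 s for a held lock
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)"
    )
    conn.commit()
    return conn


def append_with_retry(conn, payload: str, attempts: int = 3, delay: float = 0.1) -> None:
    """Insert one event, retrying only on transient 'database is locked' errors."""
    for attempt in range(1, attempts + 1):
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO events (payload) VALUES (?)", (payload,))
            return
        except sqlite3.OperationalError as exc:
            if "database is locked" not in str(exc) or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # simple linear backoff


# Usage with a throwaway database file.
_tmp_dir = tempfile.TemporaryDirectory()
conn = open_store(os.path.join(_tmp_dir.name, "events.db"))
append_with_retry(conn, '{"type": "session.started"}')
```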
- Fix event_store.py: move logger after imports, prefix unused arg with underscore, remove unused last_err variable, fix import order - Fix heartbeat.py: ProcessNotFoundError → ProcessLookupError (correct Python built-in exception name) - Apply ruff format to both files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
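For context on the exception rename: `ProcessLookupError` is the built-in `OSError` subclass raised when a PID does not exist, whereas `ProcessNotFoundError` is not defined in Python. A common POSIX liveness probe looks like the sketch below. The function name is hypothetical, and the project's heartbeat module (which reportedly relies on lock files) may work differently.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """POSIX-only probe: signal 0 checks for existence without sending anything."""
    if os.name != "posix":
        # On Windows, os.kill() with an arbitrary signal terminates the target.
        raise NotImplementedError("signal-0 probing is POSIX-only")
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but belongs to another user
    return True
```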
The heartbeat integration in find_orphaned_sessions() checks real lock files, causing test pollution. Add autouse fixture to mock get_alive_sessions() with an empty set in both TestFindOrphanedSessions and TestCancelOrphanedSessions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- _setup_claude() now persists runtime_backend, llm.backend, and claude_path to config.yaml (matching _setup_codex() behavior) - start_execute_seed_handler() now accepts and propagates runtime_backend/llm_backend to the inner ExecuteSeedHandler - Add tests for setup config persistence and backend propagation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tate guard - _build_tool_arguments() now preserves original mcp_args (initial_context, cwd, etc.) and overlays session_id/answer, instead of rebuilding from scratch - StartExecuteSeedHandler now checks terminal session status (completed, cancelled, failed) before enqueueing, matching ExecuteSeedHandler behavior Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
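The overlay behaviour in the first bullet can be sketched as a plain dictionary merge. This standalone function is an illustration; the real `_build_tool_arguments()` is a method whose full signature is not shown in this thread.

```python
from __future__ import annotations


def build_tool_arguments(
    mcp_args: dict,
    session_id: str | None = None,
    answer: str | None = None,
) -> dict:
    """Keep the caller's original arguments and overlay only the resume fields."""
    merged = dict(mcp_args)  # shallow copy: initial_context, cwd, etc. survive
    if session_id is not None:
        merged["session_id"] = session_id
    if answer is not None:
        merged["answer"] = answer
    return merged


original = {"initial_context": "build a CLI", "cwd": "/work/project"}
resumed = build_tool_arguments(original, session_id="abc123", answer="yes")
```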
Apply the same fix from command_dispatcher.py to codex_cli_runtime.py's _build_tool_arguments() — preserve original mcp_args and overlay session_id/answer instead of rebuilding from scratch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- setup.py: write `cli_path` instead of `claude_path` so the config loader actually picks up the detected Claude binary path. - execution_handlers.py: when seed_path does not exist on disk, fall back to treating the value as inline YAML instead of returning an error, matching the documented tool contract for both ouroboros_execute_seed and ouroboros_start_execute_seed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- getting-started.md: remove nonexistent `ouroboros interview` command, clarify that interview is available via `ooo` or MCP tools only, add required seed_file arg to `ouroboros run` examples
- architecture.md: fix interview entrypoint references
- README.md: remove Codex from `ooo` usage note (not yet supported)
- cli-reference.md: replace opencode manual config suggestion with "not yet implemented" warning
- config-reference.md: add "not yet implemented" caveat to opencode settings
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ementedError Address ouroboros-agent review findings: - Remove OPENCODE enum values from CLI parsers (init, mcp, run) - Reject opencode at resolve_*_backend() with early ValueError - Replace opencode normalization tests with boundary rejection tests - Ensure legacy subprocess fallback restores schema transforms Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Convert opencode server creation test to assert ValueError rejection - Convert opencode execution handler test to assert MCPToolError on reject - Switch resume test from opencode to codex (tests resume path, not backend) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- README: fix `ouroboros run workflow` → `ouroboros run seed.yaml` - getting-started: add required seed path to --resume examples - CONTRIBUTING: replace dead `docs/guides/cli-usage.md` refs with `docs/getting-started.md` - codex/ouroboros.md: align setup/update descriptions with actual CLI behavior Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…oints - README: rewrite commands table to show skill vs CLI equivalents - seed-authoring: replace all `ouroboros interview *` with `ouroboros init start *` - Clarify that some skills (evaluate, evolve, etc.) are MCP/skill-only Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These commands are Claude Code skills only, not standalone CLI commands. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: fix 16 audit findings from agent team review
Critical fixes:
- getting-started.md: Correct interview command info (ouroboros init start exists)
- README.md: Fix ouroboros status requires subcommand, add cancel to table
- getting-started.md: Remove overstated Claude/Codex parity claim
High fixes:
- README.md: Add install.sh one-liner to Standalone quick-start
- cli-reference.md: Fix TUI backend option (python, not textual)
- CONTRIBUTING.md: Remove broken docs/api/parallel-execution.md reference
- findings-registry.md: Mark entity-registry migration as planned-not-created
- codex.md: Clarify status command syntax
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: fix remaining entity-registry broken references in findings-registry
Clean up frontmatter description, schema changelog, backward-compat rule, and record_type field description that still referenced non-existent entity-registry.yaml and migration guide files.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: mark FIND-044 resolved, fix open findings count
Update findings-registry to reflect codex.md status command fix:
- FIND-044 status: open → resolved (both YAML and summary table)
- Remove FIND-044 from open findings list
- Replace detail section with resolution note
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: resolve FIND-045 and FIND-050, update registry
- FIND-045: Add credentials.yaml cross-links to claude-code.md and codex.md
- FIND-050: Already fixed in codex.md:104 (parenthetical note); mark resolved
- Update open findings list: only FIND-018, FIND-019 remain (structural)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: sync registry stats and fix README claude-code link wording
- Update YAML stats: open 5→2, resolved 45→48
- Update summary table: medium open 3→0, total open 5→2
- README: change "full details" to "backend configuration and CLI options" to accurately describe what claude-code.md covers
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Google Gemini CLI as a third AgentRuntime alongside Claude Code and Codex CLI. Change the default execution runtime to Codex and hardcode Claude as the interview backend regardless of configured runtime.
New modules:
- gemini_permissions.py: permission mode -> CLI flag mapping
- providers/gemini_cli_adapter.py: LLM adapter (subclasses CodexCliLLMAdapter)
- orchestrator/gemini_cli_runtime.py: agent runtime (subclasses CodexCliRuntime)
Config changes:
- OrchestratorConfig.runtime_backend default: "claude" -> "codex"
- New fields: gemini_cli_path, gemini_permission_mode
- New Literal value: "gemini" in runtime_backend and llm.backend
- New env vars: OUROBOROS_GEMINI_CLI_PATH, OUROBOROS_GEMINI_PERMISSION_MODE
Factory changes:
- create_llm_adapter(use_case="interview") always returns ClaudeCodeAdapter
- Both factories resolve "gemini"/"gemini_cli" aliases
Includes 35 new unit tests and updated documentation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit 431e72b | Triggered by: PR opened
Branch: feat/gemini-runtime | 19 files, +1117/-59 | CI: unknown
Issue #N/A Requirements
| Requirement | Status |
|---|---|
| No linked issue detected in PR body | N/A — No issue requirements to map. |
Previous Review Follow-up
| Previous Finding | Status |
|---|---|
| First review — no previous findings. | N/A — First bot review on this PR. |
Code Findings
| # | File:Line | Severity | Confidence | Finding |
|---|---|---|---|---|
| 1 | src/ouroboros/providers/gemini_cli_adapter.py:41 | High | High | GeminiCliLLMAdapter._build_command() ignores output_schema_path, so every response_format={"type":"json_schema",...} request is silently downgraded to an unconstrained text completion. That breaks existing QA / semantic / consensus callers which parse the reply as JSON and currently depend on schema-constrained output. |
| 2 | src/ouroboros/orchestrator/gemini_cli_runtime.py:103 | High | Medium | The Gemini runtime only overrides session-id extraction, but inherits Codex-only event normalization. The parent runtime converts thread.started, item.completed, and turn.failed; Gemini -o stream-json events are unlikely to match that shape, so assistant/tool streaming and explicit failure events will be dropped instead of surfaced to the orchestrator. |
| 3 | src/ouroboros/config/loader.py:564 | Medium | High | _default_model_for_backend() treats Gemini as a “Codex-like” backend and rewrites default model names to "default". For consensus this collapses the default 3-model roster into ("default","default","default"), which defeats min_models/diversity_required semantics and turns stage-3 consensus into repeated votes from the same local default model. |
| 4 | src/ouroboros/providers/gemini_cli_adapter.py:68 | Medium | Medium | The LLM adapter only recognizes session_id, while the runtime already had to support both session_id and sessionId. If Gemini emits camelCase here as well, one-shot completions will lose session tracking in raw_response, making resume/debug metadata inconsistent across the two Gemini code paths. |
| 5 | docs/runtime-guides/gemini.md:73 | Medium | High | The Gemini guide documents gemini_permission_mode: sandbox / auto_edit and matching env/CLI examples, but the actual accepted values are default, acceptEdits, and bypassPermissions (OrchestratorConfig and resolve_gemini_permission_mode). Following the guide produces validation/runtime errors on first use. |
| 6 | docs/config-reference.md:79 | Low | High | The config reference says runtime_backend: opencode raises NotImplementedError, but the factory currently rejects it with ValueError during backend resolution. The docs are advertising a behavior and support level that the implementation does not provide. |
Test Coverage
Missing tests for env/config precedence and backend-specific branches in src/ouroboros/config/loader.py, especially Gemini-specific permission selection and the "default" model normalization path.
Missing Gemini structured-output tests in src/ouroboros/providers/gemini_cli_adapter.py that exercise CompletionConfig.response_format the same way src/ouroboros/evaluation/consensus.py and src/ouroboros/mcp/tools/qa.py use it.
Missing event-shape normalization tests in src/ouroboros/orchestrator/gemini_cli_runtime.py using representative Gemini stream-json payloads; current tests only validate command construction and session-id extraction.
Missing regression coverage for consensus model defaults under llm.backend=gemini across src/ouroboros/config/loader.py and src/ouroboros/evaluation/consensus.py.
I could not run the Python test suite here because pytest is not installed in this environment.
Design
The general direction is sound: adding Gemini through the existing runtime/provider factories keeps the public surface relatively clean. The problem is that the implementation reuses Codex base classes almost verbatim while only adapting flags and path lookup. That is not enough when the transport protocol differs. The current design assumes Gemini emits Codex-compatible runtime and completion events, plus equivalent schema/output features, but the code does not prove or enforce that.
The config/docs layer is also ahead of the implementation. New options are exposed broadly, including unsupported opencode values and Gemini permission examples that do not match the actual validators. No previous review findings were supplied, so there was nothing to verify as fixed or still open from an earlier round.
Files Reviewed
- docs/config-reference.md
- docs/runtime-capability-matrix.md
- docs/runtime-guides/gemini.md
- src/ouroboros/config/__init__.py
- src/ouroboros/config/loader.py
- src/ouroboros/config/models.py
- src/ouroboros/gemini_permissions.py
- src/ouroboros/orchestrator/__init__.py
- src/ouroboros/orchestrator/gemini_cli_runtime.py
- src/ouroboros/orchestrator/runtime_factory.py
- src/ouroboros/providers/__init__.py
- src/ouroboros/providers/factory.py
- src/ouroboros/providers/gemini_cli_adapter.py
- tests/unit/config/test_models.py
- tests/unit/orchestrator/test_gemini_cli_runtime.py
Reviewed by ouroboros-agent[bot] via Codex deep analysis
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit 431e72b | Triggered by: PR opened
Branch: feat/gemini-runtime | 19 files, +1117/-59 | CI: no checks reported on the 'feat/gemini-runtime' branch
Issue #N/A Requirements
| Requirement | Status |
|---|---|
| No linked issue detected in PR body | N/A — No issue requirements to map. |
Previous Review Follow-up
| Previous Finding | Status |
|---|---|
| First review — no previous findings. | N/A — First bot review on this PR. |
Code Findings
| # | File:Line | Severity | Confidence | Finding |
|---|---|---|---|---|
| 1 | src/ouroboros/providers/gemini_cli_adapter.py:41 | high | high | GeminiCliLLMAdapter._build_command() accepts output_schema_path but never uses it. The parent completion flow still builds a schema file and expects the backend command to enforce it, so any response_format={"type": "json_schema", ...} request sent through the Gemini adapter silently degrades to unconstrained free-form output. That is a behavioral regression relative to the existing Codex adapter contract. |
| 2 | docs/runtime-guides/gemini.md:73 | medium | high | The new guide tells users to set gemini_permission_mode: sandbox / auto_edit, but the implementation only accepts default, acceptEdits, and bypassPermissions (src/ouroboros/gemini_permissions.py:14). Copy-pasting the documented config will fail validation before the runtime even starts. |
| 3 | docs/runtime-guides/gemini.md:220 | medium | high | The CLI examples repeat the same invalid permission values (--permission-mode sandbox and --permission-mode auto_edit). The runtime validators and config model only recognize default, acceptEdits, and bypassPermissions, so the documented commands are not runnable as written. |
| 4 | src/ouroboros/providers/gemini_cli_adapter.py:78 | medium | medium | The provider-side session-id extraction only recognizes session_id, while the runtime-side Gemini parser was already extended to accept both session_id and sessionId (src/ouroboros/orchestrator/gemini_cli_runtime.py:103). If the Gemini CLI emits camelCase on the one-shot path too, completion metadata/session tracking will be lost there even though the runtime path handles it. |
| 5 | src/ouroboros/config/loader.py:531 | medium | medium | create_llm_adapter(backend="gemini") resolves its permission mode through get_llm_permission_mode() (src/ouroboros/providers/factory.py:98), but that loader only has backend-specific branches for OpenCode. The new Gemini backend therefore has no Gemini-specific permission override path on the LLM-only side, which makes the runtime and provider surfaces inconsistent and prevents a dedicated Gemini override from ever being honored there. |
Test Coverage
The current test suite covers constructor wiring and flag mapping, but it does not exercise the two places most likely to break in real use:
- No provider test requests a structured response format through `GeminiCliLLMAdapter`, so the missing `output_schema_path` handling is not caught (tests/unit/providers/test_gemini_cli_adapter.py).
- No test asserts that the documentation examples use validator-accepted permission strings, so the `sandbox`/`auto_edit` drift shipped unchecked (tests/unit/test_gemini_permissions.py, docs only).
- No provider test covers `sessionId` (camelCase) on the one-shot Gemini path, even though the runtime path explicitly anticipates that variant (tests/unit/providers/test_gemini_cli_adapter.py, tests/unit/orchestrator/test_gemini_cli_runtime.py).
Design
The overall shape is sensible: keep Gemini as a thin Codex-derived runtime/provider pair, centralize permission translation in gemini_permissions.py, and wire backend selection through the existing factories. That preserves the handler/runtime layering rather than forking new orchestration logic.
The problems are mostly contract mismatches at the edges:
- the provider adapter does not yet preserve the parent structured-output contract,
- the docs/examples are out of sync with the actual validator surface,
- and the runtime/provider config surfaces are not aligned on Gemini-specific permission handling.
I would fix those boundary mismatches before merging; otherwise the first user-facing Gemini paths are likely to fail in ways that look like product instability rather than normal beta roughness.
Files Reviewed
- docs/config-reference.md
- docs/runtime-capability-matrix.md
- docs/runtime-guides/gemini.md
- src/ouroboros/config/__init__.py
- src/ouroboros/config/loader.py
- src/ouroboros/config/models.py
- src/ouroboros/gemini_permissions.py
- src/ouroboros/orchestrator/__init__.py
- src/ouroboros/orchestrator/gemini_cli_runtime.py
- src/ouroboros/orchestrator/runtime_factory.py
- src/ouroboros/providers/__init__.py
- src/ouroboros/providers/factory.py
- src/ouroboros/providers/gemini_cli_adapter.py
- tests/unit/config/test_models.py
- tests/unit/orchestrator/test_gemini_cli_runtime.py
Reviewed by ouroboros-agent[bot] via Codex deep analysis
Thanks for the contribution, @tgmerritt! The overall shape is solid — thin Codex-derived subclasses with centralized permission translation is exactly the right pattern. A few things to address before merge, and some that can wait.
Must fix before merge
1. output_schema_path silently dropped (High)
GeminiCliLLMAdapter._build_command() accepts output_schema_path but never passes it to the CLI. The parent _complete_once() builds a schema temp file and expects the backend to enforce it, so any response_format={"type": "json_schema", ...} request (used by QA, consensus, semantic callers) silently degrades to unconstrained text output. This will cause JSON parse failures downstream.
Either wire --output-schema through (if Gemini CLI supports it) or explicitly raise/warn when structured output is requested but unsupported.
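A sketch of the second option (fail loudly or warn when structured output is requested but unsupported). Everything here is hypothetical: the exception class, the `strict` switch, and the standalone function are illustrations, and no real Gemini CLI flags are assumed, so the returned command is only a placeholder.

```python
from __future__ import annotations

import logging

logger = logging.getLogger("gemini_adapter_sketch")


class StructuredOutputUnsupportedError(RuntimeError):
    """Raised when a caller needs schema-constrained output the backend lacks."""


def build_command(
    cli_path: str,
    *,
    output_schema_path: str | None = None,
    strict: bool = True,
) -> list[str]:
    if output_schema_path is not None:
        message = (
            "structured output requested, but this backend cannot enforce "
            f"the schema at {output_schema_path}"
        )
        if strict:
            raise StructuredOutputUnsupportedError(message)
        # Non-strict mode: proceed, but make the downgrade visible in logs.
        logger.warning(message)
    return [cli_path]  # real flags omitted on purpose in this sketch
```

Either branch replaces the silent downgrade with something a QA or consensus caller can detect before it tries to parse free-form text as JSON.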
2. Docs use invalid permission values (Medium-High)
docs/runtime-guides/gemini.md documents sandbox and auto_edit as config values (lines 73, 220+), but the actual validator (resolve_gemini_permission_mode) only accepts default, acceptEdits, bypassPermissions. Users following the guide will hit a validation error on first use. The permission table and CLI examples both need updating.
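To make the mismatch concrete, here is a minimal validator sketch that accepts exactly the three values named above. It is not the project's `resolve_gemini_permission_mode`; the function name and error text are invented for illustration.

```python
# The three accepted values as reported in the review.
ACCEPTED_PERMISSION_MODES = ("default", "acceptEdits", "bypassPermissions")


def check_permission_mode(value: str) -> str:
    """Return the value if accepted, otherwise raise with the allowed options."""
    if value not in ACCEPTED_PERMISSION_MODES:
        allowed = ", ".join(ACCEPTED_PERMISSION_MODES)
        raise ValueError(f"unsupported permission mode {value!r}; expected one of: {allowed}")
    return value
```

With a check like this, the documented `sandbox` and `auto_edit` values fail immediately, which is the first-use error the guide currently leads users into.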
Can defer to next iteration
3. Event normalization assumes Codex event shape — The Gemini runtime inherits Codex event parsing (thread.started, item.completed, turn.failed). Gemini's stream-json output is unlikely to match. This is fine for beta since the actual Gemini CLI event format needs empirical validation first. Suggest opening a follow-up issue to test with real Gemini output and adapt the normalizer.
4. sessionId camelCase inconsistency — Runtime handles both session_id and sessionId, but the provider-side adapter only checks session_id. Low priority until we confirm what Gemini actually emits.
5. Consensus model defaults — Gemini is grouped with Codex in _CODEX_LLM_BACKENDS, so _default_model_for_backend() rewrites consensus models to ("default", "default", "default"), defeating diversity. Only matters when OpenRouter isn't configured. Low priority for beta.
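For item 4, one way to keep the runtime and the provider adapter consistent is a single shared helper that accepts both spellings. The sketch below assumes the session id is a top-level string field on a parsed event; the helper name `extract_session_id` and that payload shape are hypothetical until real Gemini output has been inspected.

```python
from typing import Any, Mapping, Optional

# Both spellings mentioned in the review, checked in this order.
SESSION_ID_KEYS = ("session_id", "sessionId")


def extract_session_id(event: Mapping[str, Any]) -> Optional[str]:
    """Return the first non-empty string session id found, else None."""
    for key in SESSION_ID_KEYS:
        value = event.get(key)
        if isinstance(value, str) and value:
            return value
    return None
```

Calling the same helper from both code paths would mean a change in what Gemini emits only needs to be handled once.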
Needs discussion
6. Breaking change: default runtime claude → codex
OrchestratorConfig.runtime_backend default changed from "claude" to "codex". This silently changes behavior for every user without an explicit config. This should either:
- Be reverted and handled in a separate PR with proper changelog/migration note, or
- At minimum be called out as a breaking change in the PR description and release notes
7. Interview hardcoded to Claude
create_llm_adapter(use_case="interview") now always returns ClaudeCodeAdapter. Reasonable for quality, but Gemini-only users (no Anthropic API key / no Max Plan) won't be able to run interviews at all. Worth documenting this requirement explicitly.
Bot review assessment
The ouroboros-agent bot flagged largely the same issues. Findings #1 and #2 are real blockers. The rest are valid observations but appropriate to defer for a beta release. The bot was thorough — not unfair, just not triaging by release phase.
Nice work on the 35 tests and thorough docs. Looking forward to the next iteration! 🚀
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit 431e72b | Triggered by: backlog review sweep
Branch: feat/gemini-runtime | 19 files, +1117/-59 | CI: No checks reported
Issue #N/A Requirements
| Requirement | Status |
|---|---|
| No linked issue detected in PR body. | See code findings below for file:line evidence. |
Previous Review Follow-up
| Previous Finding | Status |
|---|---|
| Prior bot review exists | Follow-up checked in current analysis. |
Code Findings
| # | File:Line | Severity | Confidence | Finding |
|---|---|---|---|---|
| 1 | src/ouroboros/providers/gemini_cli_adapter.py:41 | High | High | GeminiCliLLMAdapter._build_command() ignores output_schema_path, so every response_format={"type":"json_schema",...} request is silently downgraded to an unconstrained text completion. That breaks existing QA / semantic / consensus callers which parse the reply as JSON and currently depend on schema-constrained output. |
| 2 | src/ouroboros/orchestrator/gemini_cli_runtime.py:103 | High | Medium | The Gemini runtime only overrides session-id extraction, but inherits Codex-only event normalization. The parent runtime converts thread.started, item.completed, and turn.failed; Gemini -o stream-json events are unlikely to match that shape, so assistant/tool streaming and explicit failure events will be dropped instead of surfaced to the orchestrator. |
| 3 | src/ouroboros/config/loader.py:564 | Medium | High | _default_model_for_backend() treats Gemini as a “Codex-like” backend and rewrites default model names to "default". For consensus this collapses the default 3-model roster into ("default","default","default"), which defeats min_models/diversity_required semantics and turns stage-3 consensus into repeated votes from the same local default model. |
| 4 | src/ouroboros/providers/gemini_cli_adapter.py:68 | Medium | Medium | The LLM adapter only recognizes session_id, while the runtime already had to support both session_id and sessionId. If Gemini emits camelCase here as well, one-shot completions will lose session tracking in raw_response, making resume/debug metadata inconsistent across the two Gemini code paths. |
| 5 | docs/runtime-guides/gemini.md:73 | Medium | High | The Gemini guide documents gemini_permission_mode: sandbox / auto_edit and matching env/CLI examples, but the actual accepted values are default, acceptEdits, and bypassPermissions (OrchestratorConfig and resolve_gemini_permission_mode). Following the guide produces validation/runtime errors on first use. |
| 6 | docs/config-reference.md:79 | Low | High | The config reference says runtime_backend: opencode raises NotImplementedError, but the factory currently rejects it with ValueError during backend resolution. The docs are advertising a behavior and support level that the implementation does not provide. |
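Finding 3 can be illustrated with a toy model of the rewrite. This is a sketch under assumptions: `CLI_BACKENDS` stands in for `_CODEX_LLM_BACKENDS`, `default_roster` only imitates what the review says `_default_model_for_backend()` does, the diversity check is a simplified reading of `min_models`, and the model names are placeholders.

```python
from typing import Tuple

# Stand-in for _CODEX_LLM_BACKENDS with Gemini grouped alongside Codex.
CLI_BACKENDS = frozenset({"codex", "gemini"})


def default_roster(backend: str, configured: Tuple[str, ...]) -> Tuple[str, ...]:
    """Imitate the rewrite: CLI-style backends get "default" for every slot."""
    if backend in CLI_BACKENDS:
        return tuple("default" for _ in configured)
    return tuple(configured)


def is_diverse(roster: Tuple[str, ...], min_models: int) -> bool:
    """Simplified diversity check: count distinct model names."""
    return len(set(roster)) >= min_models


placeholder_models = ("model-a", "model-b", "model-c")  # hypothetical names
gemini_roster = default_roster("gemini", placeholder_models)
```

Here `gemini_roster` has three entries but only one distinct name, so any check that requires two or more distinct models fails, or passes vacuously if it only counts entries.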
Test Coverage
Missing tests for env/config precedence and backend-specific branches in src/ouroboros/config/loader.py, especially Gemini-specific permission selection and the "default" model normalization path.
Missing Gemini structured-output tests in src/ouroboros/providers/gemini_cli_adapter.py that exercise CompletionConfig.response_format the same way src/ouroboros/evaluation/consensus.py and src/ouroboros/mcp/tools/qa.py use it.
Missing event-shape normalization tests in src/ouroboros/orchestrator/gemini_cli_runtime.py using representative Gemini stream-json payloads; current tests only validate command construction and session-id extraction.
Missing regression coverage for consensus model defaults under llm.backend=gemini across src/ouroboros/config/loader.py and src/ouroboros/evaluation/consensus.py.
I could not run the Python test suite here because pytest is not installed in this environment.
Design
The general direction is sound: adding Gemini through the existing runtime/provider factories keeps the public surface relatively clean. The problem is that the implementation reuses Codex base classes almost verbatim while only adapting flags and path lookup. That is not enough when the transport protocol differs. The current design assumes Gemini emits Codex-compatible runtime and completion events, plus equivalent schema/output features, but the code does not prove or enforce that.
The config/docs layer is also ahead of the implementation. New options are exposed broadly, including unsupported opencode values and Gemini permission examples that do not match the actual validators. No previous review findings were supplied, so there was nothing to verify as fixed or still open from an earlier round.
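One way to stop assuming and start enforcing is a normalizer that surfaces events it does not recognize instead of dropping them. The sketch below uses only the three Codex event types named in this review; the JSON-lines framing, the top-level `type` field, and the normalized `kind` labels are assumptions, not the project's actual event model.

```python
import json
from typing import Any, Dict

# Codex event types named in the review, mapped to hypothetical internal kinds.
KNOWN_EVENT_TYPES = {
    "thread.started": "session_started",
    "item.completed": "item_completed",
    "turn.failed": "turn_failed",
}


def normalize_event(line: str) -> Dict[str, Any]:
    """Normalize one stream line; unknown shapes are reported, not discarded."""
    try:
        payload = json.loads(line)
    except json.JSONDecodeError:
        return {"kind": "unparsed", "raw": line}
    if not isinstance(payload, dict):
        return {"kind": "unparsed", "raw": line}
    event_type = payload.get("type")
    kind = KNOWN_EVENT_TYPES.get(event_type) if isinstance(event_type, str) else None
    if kind is None:
        # Keep the payload so the orchestrator or a log can show what arrived.
        return {"kind": "unknown", "type": event_type, "payload": payload}
    return {"kind": kind, "payload": payload}
```

Feeding captured Gemini `stream-json` lines through a function like this in a unit test would show quickly how many land in `unknown`, which is the evidence the follow-up issue needs.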
Files Reviewed
- docs/config-reference.md
- docs/runtime-capability-matrix.md
- docs/runtime-guides/gemini.md
- src/ouroboros/config/__init__.py
- src/ouroboros/config/loader.py
- src/ouroboros/config/models.py
- src/ouroboros/gemini_permissions.py
- src/ouroboros/orchestrator/__init__.py
- src/ouroboros/orchestrator/gemini_cli_runtime.py
- src/ouroboros/orchestrator/runtime_factory.py
- src/ouroboros/providers/__init__.py
- src/ouroboros/providers/factory.py
- src/ouroboros/providers/gemini_cli_adapter.py
- tests/unit/config/test_models.py
- tests/unit/orchestrator/test_gemini_cli_runtime.py
Reviewed by ouroboros-agent[bot] via Codex deep analysis
a4cefc6 to 73b6b27
Your branch will need a rebase:
git fetch origin
git rebase origin/release/0.26.0-beta
Key structural changes to be aware of:
Let me know if you need help with the rebase.
7ca9a80 to eb9fc80
Summary
- Adds Gemini CLI as a third AgentRuntime alongside Claude Code and Codex CLI
- Changes the default runtime from claude to codex
- Interviews always use Claude (create_llm_adapter(use_case="interview") always returns ClaudeCodeAdapter)
- Selectable via config.yaml, CLI flag (--runtime gemini), and env var (OUROBOROS_AGENT_RUNTIME=gemini)
New modules
- gemini_permissions.py: permission mode to CLI flag mapping (--sandbox, --approval-mode, --yolo)
- providers/gemini_cli_adapter.py: LLM adapter (thin subclass of CodexCliLLMAdapter)
- orchestrator/gemini_cli_runtime.py: agent runtime (thin subclass of CodexCliRuntime)
Config changes
- OrchestratorConfig.runtime_backend default: "claude" → "codex"
- New config fields: gemini_cli_path, gemini_permission_mode
- New env vars: OUROBOROS_GEMINI_CLI_PATH, OUROBOROS_GEMINI_PERMISSION_MODE
Factory changes
- create_llm_adapter(use_case="interview") always returns ClaudeCodeAdapter
- "gemini" / "gemini_cli" aliases
Documentation
- docs/runtime-guides/gemini.md
Test plan
- ruff check and ruff format pass on all changed files
- create_llm_adapter(backend="gemini", use_case="interview") returns ClaudeCodeAdapter
- OrchestratorConfig().runtime_backend == "codex"
- resolve_agent_runtime_backend("gemini_cli") == "gemini"
🤖 Generated with Claude Code