feat(runtime): add Gemini CLI as third execution runtime by tgmerritt · Pull Request #167 · Q00/ouroboros

tgmerritt · 2026-03-20T22:56:17Z

Summary

Add Google Gemini CLI as a third AgentRuntime alongside Claude Code and Codex CLI
Change the default execution runtime from claude to codex
Hardcode Claude as the interview backend regardless of configured runtime (create_llm_adapter(use_case="interview") always returns ClaudeCodeAdapter)
Support user selection via config.yaml, CLI flag (--runtime gemini), and env var (OUROBOROS_AGENT_RUNTIME=gemini)

New modules

gemini_permissions.py — permission mode to CLI flag mapping (--sandbox, --approval-mode, --yolo)
providers/gemini_cli_adapter.py — LLM adapter (thin subclass of CodexCliLLMAdapter)
orchestrator/gemini_cli_runtime.py — agent runtime (thin subclass of CodexCliRuntime)

Config changes

OrchestratorConfig.runtime_backend default: "claude" → "codex"
New fields: gemini_cli_path, gemini_permission_mode
New env vars: OUROBOROS_GEMINI_CLI_PATH, OUROBOROS_GEMINI_PERMISSION_MODE

Factory changes

create_llm_adapter(use_case="interview") always returns ClaudeCodeAdapter
Both runtime and provider factories resolve "gemini" / "gemini_cli" aliases

Documentation

New runtime guide: docs/runtime-guides/gemini.md
Updated capability matrix with Gemini column
Updated config reference with new keys and env vars

Test plan

35 new unit tests for Gemini permissions, adapter, runtime, and factory wiring
Updated existing factory and config model tests for new defaults
192 feature-related tests pass (0 failures)
ruff check and ruff format pass on all changed files
Interview hardcoding verified: create_llm_adapter(backend="gemini", use_case="interview") returns ClaudeCodeAdapter
Default change verified: OrchestratorConfig().runtime_backend == "codex"
Factory resolution verified: resolve_agent_runtime_backend("gemini_cli") == "gemini"
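
The alias resolution checked in the test plan can be pictured with a minimal sketch. Only the function name, the `gemini` / `gemini_cli` aliases, the `codex` default, and the `ValueError` rejection come from this PR; the lookup-table layout and the error message are assumptions made for illustration.

```python
from typing import Optional

# Canonical names come from the PR description; the table layout is assumed.
_RUNTIME_ALIASES = {
    "claude": "claude",
    "codex": "codex",
    "gemini": "gemini",
    "gemini_cli": "gemini",
}


def resolve_agent_runtime_backend(name: Optional[str] = None) -> str:
    """Map a user-supplied runtime name to its canonical backend name."""
    key = (name or "codex").strip().lower()  # "codex" is the new default
    if key not in _RUNTIME_ALIASES:
        # Unsupported names (for example "opencode") are rejected early.
        raise ValueError(f"Unsupported runtime backend: {name!r}")
    return _RUNTIME_ALIASES[key]
```

Keeping aliases in one table means the runtime factory and the provider factory could share the same lookup, which is one way to keep the two from drifting apart.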

🤖 Generated with Claude Code

Introduce AgentRuntime Protocol and RuntimeHandle for backend-neutral runtime management. Add Codex CLI runtime implementation with session tracking, MCP tool definitions, parallel AC execution with retry/resume, and comprehensive test coverage (2800+ tests passing). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Fix QA structured output schema for Codex/OpenAI compatibility by adding `additionalProperties: false` and all fields to `required`
- Add seed_path support to StartExecuteSeedHandler (previously only ExecuteSeedHandler resolved seed_path to seed_content)
- Include Runtime/LLM Backend info in start_execute_seed response
- Add terminal status parametrized tests for session_status handler
- Clean up OpenCode runtime stubs with explicit NotImplementedError
- Add error handling for ValueError/NotImplementedError in CLI run
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CI renders help output with ANSI escape codes that split `--runtime` into separate escape sequences, causing exact string match to fail. Use case-insensitive keyword matching instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…sertions Rich inserts ANSI escape sequences at hyphen boundaries in CLI help output (e.g. --llm-backend), causing plain-text assertions to fail. Setting NO_COLOR=1 in the root conftest.py disables color output for all tests, fixing the 4 failing CI checks and preventing future breakage for any hyphenated option names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

In CI, GITHUB_ACTIONS env var causes Typer to set force_terminal=True on Rich Console, emitting ANSI escape codes into CliRunner's string buffer. This breaks plain-text assertions for hyphenated options like --llm-backend. Use Typer's built-in _TYPER_FORCE_DISABLE_TERMINAL escape hatch instead of NO_COLOR (which only disables colors but leaves bold/dim style codes intact). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
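
A minimal sketch of how the escape hatch described above might be applied in a root `conftest.py`. The variable name comes from the commit message; placing it at module top level, before the CLI app is imported, rests on the assumption that Typer evaluates the variable when its Rich helpers are first loaded.

```python
import os

# Set before the Typer app (and therefore Typer's Rich helpers) is imported,
# so CliRunner output stays free of ANSI codes even when GITHUB_ACTIONS is set.
os.environ.setdefault("_TYPER_FORCE_DISABLE_TERMINAL", "1")
```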

…setup
- Move claude-agent-sdk, anthropic, litellm from core deps to optional extras ([claude], [litellm], [all]) so Codex-only users can install ouroboros-ai without unnecessary SDK dependencies
- Convert eager imports to lazy: LiteLLMAdapter in providers/__init__.py and factory.py, litellm in core/context.py (with len//4 fallback)
- Add `ouroboros setup` CLI command with auto-detection of available runtimes (claude, codex) and interactive/non-interactive modes
- Add scripts/install.sh one-liner installer with runtime auto-detection
- Update README Quick Start to show 3 parallel install paths: Claude Code Plugin / Standalone pip / One-liner
- Update SKILL.md with standalone setup reference
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The dev group previously included ouroboros-ai[all] which pulled in the dashboard extra (streamlit → watchdog). watchdog is untyped, and mypy cannot resolve watchdog.observers.Observer as a valid type on Linux. Use ouroboros-ai[claude,litellm] instead — dev needs runtime deps for testing but not dashboard visualization deps. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Security:
- Add InputValidator.validate_llm_response() to CodexCliLLMAdapter (parity with other adapters)
- Pass prompt via stdin instead of CLI argument to avoid ARG_MAX limits
- Add await stdin.drain() before close to ensure flush on large prompts
- Remove _extract_text() recursive fallback to prevent data leakage via error messages
- Add asyncio.timeout to legacy process.communicate() fallback path
- Validate resume_session_id with regex pattern to prevent CLI argument injection
Reliability:
- Guard _cancellation_registry with asyncio.Lock for concurrent access safety
- Add terminal state check before mark_cancelled to prevent race condition
- Add _max_resume_retries=3 depth limit to prevent infinite execute_task recursion
- Add 50MB buffer limit to _iter_stream_lines() with incremental byte tracking
- Fix EventStore connection leak in ExecuteSeedHandler background task
- Guard None sentinels in parallel_executor level_results
Quality:
- Change interview permission mode from acceptEdits to default for codex/opencode
- Remove 28 lines of unreachable dead code in _build_runtime_handle
- Add warning log for silently discarded non-string session_id
- Cache derive_runtime_signal results to reduce redundant calls (3x → 2x)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
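
The `resume_session_id` validation mentioned above could look roughly like the following. The commit does not show the actual pattern, so the regex and the function name here are hypothetical; the point is only that a value beginning with `-` or containing whitespace never reaches the CLI argument list.

```python
import re

# Hypothetical pattern: starts with an alphanumeric, then up to 127 more
# characters from [A-Za-z0-9_-]. The real pattern is not shown in the commit.
_SESSION_ID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,127}")


def validate_resume_session_id(value: object) -> str:
    """Return the id unchanged if it is safe to pass as a CLI argument."""
    if not isinstance(value, str) or not _SESSION_ID_RE.fullmatch(value):
        raise ValueError(f"Invalid resume session id: {value!r}")
    return value
```

Requiring an alphanumeric first character is what blocks option-style payloads such as `--yolo`.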

Restructure all documentation, README, and branding from Claude Code-centric plugin to runtime-agnostic workflow engine supporting both Claude Code and Codex CLI as equal first-class runtime backends.
Key changes:
- README restructured as conversion page (problem-solver positioning)
- Quick Start with runtime tabs: Claude Code | Codex CLI | Standalone
- New runtime guides: docs/runtime-guides/claude-code.md, codex.md
- Runtime capability matrix comparing backends side-by-side
- Architecture docs updated with runtime abstraction layer
- CLI reference updated for setup, --runtime, --non-interactive
- Platform support matrix (Windows experimental/WSL recommended)
- SECURITY.md with standard vulnerability reporting policy
- Python version corrected to >=3.12 everywhere (was 3.14+)
- All "Claude Code plugin" references replaced with agnostic language
- Legacy docs/running-with-claude-code.md preserved as redirect stub
- Codex ooo skill support documented (rules + skills install)
- Config value corrected: runtime_backend: claude (not claude-code)
- Stale "Claude Agent SDK" references updated in guides
- Install commands match pyproject.toml exactly
- Demo image placeholders added for interview/seed/evaluation
- Sub-tagline: "Specification-first workflow engine for AI coding agents"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- S1: path containment for absolute/relative paths in security.py
- Q1a: complete handler re-export in mcp/tools/__init__.py
- A1: remove getattr fallback from definitions.py
- Handler split: definitions.py → per-domain handler modules
- Fix non-deterministic updated_at timestamp flake in parallel executor test
- Add runtime_backend/working_directory/permission_mode to all test stubs
- Deep-clone consistency for RuntimeHandle metadata
- Ruff format cleanup
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Both _setup_claude and _setup_codex now write timeout: 600 and OUROBOROS_AGENT_RUNTIME / OUROBOROS_LLM_BACKEND env vars into mcp.json. Existing entries are backfilled on re-run. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Every mcp.json example/template now includes timeout: 600 and OUROBOROS_AGENT_RUNTIME env var. Fixes docs/cli-reference.md, docs/guides/cli-usage.md, docs/guides/common-workflows.md, and .claude-plugin/.mcp.json. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

OUROBOROS_AGENT_RUNTIME in mcp.json env would override config.yaml (env > config priority), making runtime changes via config.yaml silently ineffective. Runtime selection belongs in config.yaml only, which setup already writes correctly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mcp.json handles MCP server registration (timeout only). Runtime backend is configured in ~/.ouroboros/config.yaml, with optional OUROBOROS_AGENT_RUNTIME env var override for power users. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
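
The precedence described in the two commits above (environment variable over config.yaml) can be sketched as follows. The env var name and the `runtime_backend` key are from the PR; the function name and the `orchestrator:` nesting of the parsed config are assumptions for illustration.

```python
from typing import Any, Mapping


def resolve_runtime_backend(
    config: Mapping[str, Any],
    env: Mapping[str, str],
    default: str = "codex",
) -> str:
    """Pick the runtime backend: env override first, then config, then default."""
    override = env.get("OUROBOROS_AGENT_RUNTIME", "").strip()
    if override:
        # An env value always wins, which is why writing it into mcp.json
        # made later config.yaml edits silently ineffective.
        return override
    orchestrator = config.get("orchestrator") or {}  # assumed YAML layout
    return orchestrator.get("runtime_backend", default)
```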

extract_json_payload() now tries each { position via brace-counting and validates with json.loads, instead of only trying the first {. This fixes 75% QA verdict parse failures caused by Anthropic's prefill workaround producing prose with stray braces before JSON. Also adds llms-full.txt with deep model-facing reference content and bolsters the Secondary Loop section with TODO registry and batch scheduler details. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
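
A minimal sketch of the brace-counting approach described above, assuming the function returns the matched JSON text (the real return type is not shown in the commit). The helper `_balanced_span_end` is a hypothetical name. Each `{` is tried in turn, the balanced span is found while skipping braces inside string literals, and the candidate is accepted only if `json.loads` parses it.

```python
import json
from typing import Optional


def _balanced_span_end(text: str, start: int) -> Optional[int]:
    """Return the index just past the brace that closes text[start], if any."""
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            # Inside a JSON string: braces do not count, honor backslash escapes.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    return None


def extract_json_payload(text: str) -> Optional[str]:
    """Return the first brace-balanced substring that parses as JSON."""
    pos = text.find("{")
    while pos != -1:
        end = _balanced_span_end(text, pos)
        if end is not None:
            candidate = text[pos:end]
            try:
                json.loads(candidate)
            except json.JSONDecodeError:
                pass  # stray braces in prose: move on to the next "{"
            else:
                return candidate
        pos = text.find("{", pos + 1)
    return None
```

With input such as `Verdict {see notes}: {"score": 0.9}`, the first candidate `{see notes}` fails to parse and the scan continues to the real payload.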

Consolidate getting-started.md as the single onboarding SSOT, remove duplicated guides (cli-usage, common-workflows, language-support, quick-start), delete stale API design docs and ontological-framework directory, and trim verbose sections across architecture, cli-reference, and runtime guides. README retains philosophy sections, TUI section moved to docs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add self-answering interview mode, improve codex CLI adapter error handling, update provider factory runtime detection, and expand MCP authoring handler coverage with corresponding tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Update .gitignore, .mcp.json timeout config, expand CONTRIBUTING.md with dev workflow details, refresh skill definitions for interview/evolve/setup, and sync socratic-interviewer agent spec. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Provide llms.txt as a concise index and llms-full.txt as a detailed reference, following the Context7 convention so AI coding agents can ingest project context efficiently. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…inel When ouroboros spawns a runtime (Codex/Claude/OpenCode), the child process may read its own MCP config and spawn another ouroboros server, causing exponential process tree growth (34+ processes observed). The sentinel env var is set on first serve() entry and inherited by all child processes, causing nested instances to exit(0) immediately. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
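
The sentinel guard described above can be sketched like this. The commit does not name the environment variable, so `EXAMPLE_OUROBOROS_SERVE_SENTINEL` and both function names are hypothetical; the real `serve()` entry point is only imitated.

```python
import os
import sys
from typing import MutableMapping

# Hypothetical variable name; the actual sentinel name is not given in the commit.
SENTINEL = "EXAMPLE_OUROBOROS_SERVE_SENTINEL"


def is_nested_instance(env: MutableMapping[str, str]) -> bool:
    """Return True for a nested server; otherwise mark env so children inherit it."""
    if env.get(SENTINEL):
        return True
    env[SENTINEL] = "1"  # inherited by every child process spawned later
    return False


def serve() -> None:
    if is_nested_instance(os.environ):
        sys.exit(0)  # a parent ouroboros server already exists; stop the recursion
    # ... start the MCP server here ...
```

Because child processes inherit the parent's environment by default, the first server marks the whole process tree with a single assignment.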

… docs
- Replace "Fallback" with "Alternative" for non-Claude runtime paths
- Change "CLI fallback" to "CLI equivalent" throughout getting-started
- Trim verbose metadata blocks from cli-reference, codex, config-reference
- Create docs/guides/evolution-loop.md: Ralph, Wonder/Reflect, convergence
- Commit pending docs: config-reference, evaluation-pipeline, findings-registry
- Update docs/README.md index: add evolution guide, remove broken links
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Fix ruff format in 17 files (session.py, mcp.py, test files, scripts)
- Remove unused imports (pytest in test_json_utils)
- Fix StrEnum inheritance (examples/task_manager)
- Fix unused vars and args in scripts (doc_volatility, migrate_authority, semantic_link_rot_check)
- Delete leftover playground/src/ files
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Three related bugs caused the AC execution tree to show only the root "Seed" node with no children:
1. _notify_ac_tree_updated() only updated the active screen: when events arrived while session selector was active, the dashboard tree was never refreshed. Now uses get_screen() to reach the installed dashboard regardless of which screen is active.
2. DashboardScreenV3 lacked an on_show() hook: when switching back to the dashboard, the tree was stuck with its initial empty state. Now refreshes from _state.ac_tree on every show.
3. parallel_executor used 0-based AC index while WorkflowStateTracker uses 1-based (as documented in AcceptanceCriterion.index). Fixed to i+1 for consistency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

A. Fix subtask events using 0-based ac_index that couldn't match 1-based tree node keys; subtasks now attach to parent nodes.
B. Replace all self.screen forwarding in app.py with _forward_to_dashboard() helper that reaches the installed dashboard via get_screen(), preventing message drops when session selector or other screens are active.
D. Wrap _execute_parallel() call in try/except to persist session.failed events on unhandled exceptions, preventing 0-event ghost sessions.
E. Expand on_show() to refresh phase_bar and activity_bar in addition to AC tree when dashboard becomes active.
F. Remove dead DashboardScreen import and dashboard_v2 references.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…hook
Allows runtime subclasses to control how prompts are delivered:
- _build_command() now accepts an optional prompt kwarg (ignored by Codex CLI which uses stdin)
- _feeds_prompt_via_stdin() returns True by default; subclasses can override to False to skip stdin prompt delivery
- _execute_task_impl() passes composed_prompt to _build_command()
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add heartbeat-based alive check to orphan detection so sessions with active runtime processes are not cancelled on MCP restart
- Enable SQLite WAL mode and busy_timeout=30s for concurrent access
- Add retry logic (3 attempts) to event_store.append() for transient "database is locked" errors
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
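
The SQLite settings and retry behaviour listed above can be illustrated with the standard-library `sqlite3` module. This is a sketch only: the project's event store may use a different driver, and the table schema, function names, and backoff delay are assumptions. The WAL mode, the 30-second busy timeout, and the three attempts are from the commit.

```python
import sqlite3
import time


def open_event_db(path: str) -> sqlite3.Connection:
    """Open a connection configured for concurrent readers and writers."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # readers no longer block the writer
    conn.execute("PRAGMA busy_timeout=30000")  # wait up to 30 s for a lock (ms)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"  # illustrative schema
    )
    return conn


def append_event(
    conn: sqlite3.Connection, payload: str, attempts: int = 3, delay: float = 0.05
) -> None:
    """Insert one event, retrying transient 'database is locked' errors."""
    for attempt in range(1, attempts + 1):
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO events (payload) VALUES (?)", (payload,))
            return
        except sqlite3.OperationalError as exc:
            if "database is locked" not in str(exc) or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # simple linear backoff (assumed)
```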

- Fix event_store.py: move logger after imports, prefix unused arg with underscore, remove unused last_err variable, fix import order
- Fix heartbeat.py: ProcessNotFoundError → ProcessLookupError (correct Python built-in exception name)
- Apply ruff format to both files
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The heartbeat integration in find_orphaned_sessions() checks real lock files, causing test pollution. Add autouse fixture to mock get_alive_sessions() with an empty set in both TestFindOrphanedSessions and TestCancelOrphanedSessions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- _setup_claude() now persists runtime_backend, llm.backend, and claude_path to config.yaml (matching _setup_codex() behavior)
- start_execute_seed_handler() now accepts and propagates runtime_backend/llm_backend to the inner ExecuteSeedHandler
- Add tests for setup config persistence and backend propagation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…tate guard
- _build_tool_arguments() now preserves original mcp_args (initial_context, cwd, etc.) and overlays session_id/answer, instead of rebuilding from scratch
- StartExecuteSeedHandler now checks terminal session status (completed, cancelled, failed) before enqueueing, matching ExecuteSeedHandler behavior
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Apply the same fix from command_dispatcher.py to codex_cli_runtime.py's _build_tool_arguments() — preserve original mcp_args and overlay session_id/answer instead of rebuilding from scratch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
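
The preserve-and-overlay behaviour described in the two commits above amounts to a shallow merge. A minimal sketch, with a simplified signature that is assumed rather than taken from the codebase:

```python
from typing import Any, Dict, Mapping, Optional


def build_tool_arguments(
    mcp_args: Mapping[str, Any],
    session_id: Optional[str] = None,
    answer: Optional[str] = None,
) -> Dict[str, Any]:
    """Keep every original argument and overlay only the resume fields."""
    merged = dict(mcp_args)  # copy, so the caller's mapping is not mutated
    if session_id is not None:
        merged["session_id"] = session_id
    if answer is not None:
        merged["answer"] = answer
    return merged
```

Rebuilding the dictionary from scratch is what previously dropped fields such as `initial_context` and `cwd` on resume.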

- setup.py: write `cli_path` instead of `claude_path` so the config loader actually picks up the detected Claude binary path.
- execution_handlers.py: when seed_path does not exist on disk, fall back to treating the value as inline YAML instead of returning an error, matching the documented tool contract for both ouroboros_execute_seed and ouroboros_start_execute_seed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
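
The seed_path fallback described above might be sketched as follows. The function name is hypothetical and the real handler's error handling is not shown in the commit; this version simply returns the value itself whenever it does not point at a readable file.

```python
from pathlib import Path


def resolve_seed_content(seed_path: str) -> str:
    """Return file contents if seed_path is a file, else treat it as inline YAML."""
    try:
        candidate = Path(seed_path)
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8")
    except (OSError, ValueError):
        # Inline YAML can be too long or contain characters that are
        # invalid in a path; in that case it is clearly not a file.
        pass
    return seed_path
```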

- getting-started.md: remove nonexistent `ouroboros interview` command, clarify that interview is available via `ooo` or MCP tools only, add required seed_file arg to `ouroboros run` examples
- architecture.md: fix interview entrypoint references
- README.md: remove Codex from `ooo` usage note (not yet supported)
- cli-reference.md: replace opencode manual config suggestion with "not yet implemented" warning
- config-reference.md: add "not yet implemented" caveat to opencode settings
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ementedError
Address ouroboros-agent review findings:
- Remove OPENCODE enum values from CLI parsers (init, mcp, run)
- Reject opencode at resolve_*_backend() with early ValueError
- Replace opencode normalization tests with boundary rejection tests
- Ensure legacy subprocess fallback restores schema transforms
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Convert opencode server creation test to assert ValueError rejection
- Convert opencode execution handler test to assert MCPToolError on reject
- Switch resume test from opencode to codex (tests resume path, not backend)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- README: fix `ouroboros run workflow` → `ouroboros run seed.yaml`
- getting-started: add required seed path to --resume examples
- CONTRIBUTING: replace dead `docs/guides/cli-usage.md` refs with `docs/getting-started.md`
- codex/ouroboros.md: align setup/update descriptions with actual CLI behavior
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…oints
- README: rewrite commands table to show skill vs CLI equivalents
- seed-authoring: replace all `ouroboros interview *` with `ouroboros init start *`
- Clarify that some skills (evaluate, evolve, etc.) are MCP/skill-only
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

These commands are Claude Code skills only, not standalone CLI commands. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: fix 16 audit findings from agent team review
Critical fixes:
- getting-started.md: Correct interview command info (ouroboros init start exists)
- README.md: Fix ouroboros status requires subcommand, add cancel to table
- getting-started.md: Remove overstated Claude/Codex parity claim
High fixes:
- README.md: Add install.sh one-liner to Standalone quick-start
- cli-reference.md: Fix TUI backend option (python, not textual)
- CONTRIBUTING.md: Remove broken docs/api/parallel-execution.md reference
- findings-registry.md: Mark entity-registry migration as planned-not-created
- codex.md: Clarify status command syntax
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: fix remaining entity-registry broken references in findings-registry
Clean up frontmatter description, schema changelog, backward-compat rule, and record_type field description that still referenced non-existent entity-registry.yaml and migration guide files.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: mark FIND-044 resolved, fix open findings count
Update findings-registry to reflect codex.md status command fix:
- FIND-044 status: open → resolved (both YAML and summary table)
- Remove FIND-044 from open findings list
- Replace detail section with resolution note
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: resolve FIND-045 and FIND-050, update registry
- FIND-045: Add credentials.yaml cross-links to claude-code.md and codex.md
- FIND-050: Already fixed in codex.md:104 (parenthetical note); mark resolved
- Update open findings list: only FIND-018, FIND-019 remain (structural)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: sync registry stats and fix README claude-code link wording
- Update YAML stats: open 5→2, resolved 45→48
- Update summary table: medium open 3→0, total open 5→2
- README: change "full details" to "backend configuration and CLI options" to accurately describe what claude-code.md covers
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add Google Gemini CLI as a third AgentRuntime alongside Claude Code and Codex CLI. Change the default execution runtime to Codex and hardcode Claude as the interview backend regardless of configured runtime.
New modules:
- gemini_permissions.py: permission mode -> CLI flag mapping
- providers/gemini_cli_adapter.py: LLM adapter (subclasses CodexCliLLMAdapter)
- orchestrator/gemini_cli_runtime.py: agent runtime (subclasses CodexCliRuntime)
Config changes:
- OrchestratorConfig.runtime_backend default: "claude" -> "codex"
- New fields: gemini_cli_path, gemini_permission_mode
- New Literal value: "gemini" in runtime_backend and llm.backend
- New env vars: OUROBOROS_GEMINI_CLI_PATH, OUROBOROS_GEMINI_PERMISSION_MODE
Factory changes:
- create_llm_adapter(use_case="interview") always returns ClaudeCodeAdapter
- Both factories resolve "gemini"/"gemini_cli" aliases
Includes 35 new unit tests and updated documentation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ouroboros-agent

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit 431e72b | Triggered by: PR opened

Branch: feat/gemini-runtime | 19 files, +1117/-59 | CI: unknown

Issue #N/A Requirements

Requirement	Status
No linked issue detected in PR body	N/A — No issue requirements to map.

Previous Review Follow-up

Previous Finding	Status
First review — no previous findings.	N/A — First bot review on this PR.

Code Findings

#	File:Line	Severity	Confidence	Finding
1	src/ouroboros/providers/gemini_cli_adapter.py:41	High	High	`GeminiCliLLMAdapter._build_command()` ignores `output_schema_path`, so every `response_format={"type":"json_schema",...}` request is silently downgraded to an unconstrained text completion. That breaks existing QA / semantic / consensus callers which parse the reply as JSON and currently depend on schema-constrained output.
2	src/ouroboros/orchestrator/gemini_cli_runtime.py:103	High	Medium	The Gemini runtime only overrides session-id extraction, but inherits Codex-only event normalization. The parent runtime converts `thread.started`, `item.completed`, and `turn.failed`; Gemini `-o stream-json` events are unlikely to match that shape, so assistant/tool streaming and explicit failure events will be dropped instead of surfaced to the orchestrator.
3	src/ouroboros/config/loader.py:564	Medium	High	`_default_model_for_backend()` treats Gemini as a “Codex-like” backend and rewrites default model names to `"default"`. For consensus this collapses the default 3-model roster into `("default","default","default")`, which defeats `min_models`/`diversity_required` semantics and turns stage-3 consensus into repeated votes from the same local default model.
4	src/ouroboros/providers/gemini_cli_adapter.py:68	Medium	Medium	The LLM adapter only recognizes `session_id`, while the runtime already had to support both `session_id` and `sessionId`. If Gemini emits camelCase here as well, one-shot completions will lose session tracking in `raw_response`, making resume/debug metadata inconsistent across the two Gemini code paths.
5	docs/runtime-guides/gemini.md:73	Medium	High	The Gemini guide documents `gemini_permission_mode: sandbox` / `auto_edit` and matching env/CLI examples, but the actual accepted values are `default`, `acceptEdits`, and `bypassPermissions` (`OrchestratorConfig` and `resolve_gemini_permission_mode`). Following the guide produces validation/runtime errors on first use.
6	docs/config-reference.md:79	Low	High	The config reference says `runtime_backend: opencode` raises `NotImplementedError`, but the factory currently rejects it with `ValueError` during backend resolution. The docs are advertising a behavior and support level that the implementation does not provide.

Test Coverage

Missing tests for env/config precedence and backend-specific branches in src/ouroboros/config/loader.py, especially Gemini-specific permission selection and the "default" model normalization path.
Missing Gemini structured-output tests in src/ouroboros/providers/gemini_cli_adapter.py that exercise CompletionConfig.response_format the same way src/ouroboros/evaluation/consensus.py and src/ouroboros/mcp/tools/qa.py use it.
Missing event-shape normalization tests in src/ouroboros/orchestrator/gemini_cli_runtime.py using representative Gemini stream-json payloads; current tests only validate command construction and session-id extraction.
Missing regression coverage for consensus model defaults under llm.backend=gemini across src/ouroboros/config/loader.py and src/ouroboros/evaluation/consensus.py.
I could not run the Python test suite here because pytest is not installed in this environment.

Design

The general direction is sound: adding Gemini through the existing runtime/provider factories keeps the public surface relatively clean. The problem is that the implementation reuses Codex base classes almost verbatim while only adapting flags and path lookup. That is not enough when the transport protocol differs. The current design assumes Gemini emits Codex-compatible runtime and completion events, plus equivalent schema/output features, but the code does not prove or enforce that.

The config/docs layer is also ahead of the implementation. New options are exposed broadly, including unsupported opencode values and Gemini permission examples that do not match the actual validators. No previous review findings were supplied, so there was nothing to verify as fixed or still open from an earlier round.

Files Reviewed

docs/config-reference.md
docs/runtime-capability-matrix.md
docs/runtime-guides/gemini.md
src/ouroboros/config/__init__.py
src/ouroboros/config/loader.py
src/ouroboros/config/models.py
src/ouroboros/gemini_permissions.py
src/ouroboros/orchestrator/__init__.py
src/ouroboros/orchestrator/gemini_cli_runtime.py
src/ouroboros/orchestrator/runtime_factory.py
src/ouroboros/providers/__init__.py
src/ouroboros/providers/factory.py
src/ouroboros/providers/gemini_cli_adapter.py
tests/unit/config/test_models.py
tests/unit/orchestrator/test_gemini_cli_runtime.py

Reviewed by ouroboros-agent[bot] via Codex deep analysis

ouroboros-agent

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit 431e72b | Triggered by: PR opened

Branch: feat/gemini-runtime | 19 files, +1117/-59 | CI: no checks reported on the 'feat/gemini-runtime' branch

Issue #N/A Requirements

Requirement	Status
No linked issue detected in PR body	N/A — No issue requirements to map.

Previous Review Follow-up

Previous Finding	Status
First review — no previous findings.	N/A — First bot review on this PR.

Code Findings

#	File:Line	Severity	Confidence	Finding
1	`src/ouroboros/providers/gemini_cli_adapter.py:41`	high	high	`GeminiCliLLMAdapter._build_command()` accepts `output_schema_path` but never uses it. The parent completion flow still builds a schema file and expects the backend command to enforce it, so any `response_format={"type": "json_schema", ...}` request sent through the Gemini adapter silently degrades to unconstrained free-form output. That is a behavioral regression relative to the existing Codex adapter contract.
2	`docs/runtime-guides/gemini.md:73`	medium	high	The new guide tells users to set `gemini_permission_mode: sandbox` / `auto_edit`, but the implementation only accepts `default`, `acceptEdits`, and `bypassPermissions` (`src/ouroboros/gemini_permissions.py:14`). Copy-pasting the documented config will fail validation before the runtime even starts.
3	`docs/runtime-guides/gemini.md:220`	medium	high	The CLI examples repeat the same invalid permission values (`--permission-mode sandbox` and `--permission-mode auto_edit`). The runtime validators and config model only recognize `default`, `acceptEdits`, and `bypassPermissions`, so the documented commands are not runnable as written.
4	`src/ouroboros/providers/gemini_cli_adapter.py:78`	medium	medium	The provider-side session-id extraction only recognizes `session_id`, while the runtime-side Gemini parser was already extended to accept both `session_id` and `sessionId` (`src/ouroboros/orchestrator/gemini_cli_runtime.py:103`). If the Gemini CLI emits camelCase on the one-shot path too, completion metadata/session tracking will be lost there even though the runtime path handles it.
5	`src/ouroboros/config/loader.py:531`	medium	medium	`create_llm_adapter(backend="gemini")` resolves its permission mode through `get_llm_permission_mode()` (`src/ouroboros/providers/factory.py:98`), but that loader only has backend-specific branches for OpenCode. The new Gemini backend therefore has no Gemini-specific permission override path on the LLM-only side, which makes the runtime and provider surfaces inconsistent and prevents a dedicated Gemini override from ever being honored there.

Test Coverage

The current test suite covers constructor wiring and flag mapping, but it does not exercise the two places most likely to break in real use:

No provider test requests a structured response format through GeminiCliLLMAdapter, so the missing output_schema_path handling is not caught (tests/unit/providers/test_gemini_cli_adapter.py).
No test asserts that the documentation examples use validator-accepted permission strings, so the sandbox / auto_edit drift shipped unchecked (tests/unit/test_gemini_permissions.py, docs only).
No provider test covers sessionId (camelCase) on the one-shot Gemini path, even though the runtime path explicitly anticipates that variant (tests/unit/providers/test_gemini_cli_adapter.py, tests/unit/orchestrator/test_gemini_cli_runtime.py).

Design

The overall shape is sensible: keep Gemini as a thin Codex-derived runtime/provider pair, centralize permission translation in gemini_permissions.py, and wire backend selection through the existing factories. That preserves the handler/runtime layering rather than forking new orchestration logic.

The problems are mostly contract mismatches at the edges:

the provider adapter does not yet preserve the parent structured-output contract,
the docs/examples are out of sync with the actual validator surface,
and the runtime/provider config surfaces are not aligned on Gemini-specific permission handling.

I would fix those boundary mismatches before merging; otherwise the first user-facing Gemini paths are likely to fail in ways that look like product instability rather than normal beta roughness.

Files Reviewed

docs/config-reference.md
docs/runtime-capability-matrix.md
docs/runtime-guides/gemini.md
src/ouroboros/config/__init__.py
src/ouroboros/config/loader.py
src/ouroboros/config/models.py
src/ouroboros/gemini_permissions.py
src/ouroboros/orchestrator/__init__.py
src/ouroboros/orchestrator/gemini_cli_runtime.py
src/ouroboros/orchestrator/runtime_factory.py
src/ouroboros/providers/__init__.py
src/ouroboros/providers/factory.py
src/ouroboros/providers/gemini_cli_adapter.py
tests/unit/config/test_models.py
tests/unit/orchestrator/test_gemini_cli_runtime.py

Reviewed by ouroboros-agent[bot] via Codex deep analysis

Q00

Thanks for the contribution, @tgmerritt! The overall shape is solid — thin Codex-derived subclasses with centralized permission translation is exactly the right pattern. A few things to address before merge, and some that can wait.

Must fix before merge

1. output_schema_path silently dropped (High)
GeminiCliLLMAdapter._build_command() accepts output_schema_path but never passes it to the CLI. The parent _complete_once() builds a schema temp file and expects the backend to enforce it, so any response_format={"type": "json_schema", ...} request (used by QA, consensus, semantic callers) silently degrades to unconstrained text output. This will cause JSON parse failures downstream.

Either wire --output-schema through (if Gemini CLI supports it) or explicitly raise/warn when structured output is requested but unsupported.
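The "explicitly raise" option might look like the sketch below. It is an illustration under stated assumptions, not the adapter's real code: `build_command`, `StructuredOutputUnsupportedError`, and the `schema_flag` parameter are hypothetical, the command layout is a placeholder, and whether Gemini CLI has any schema flag at all is unverified.

```python
from __future__ import annotations

from pathlib import Path


class StructuredOutputUnsupportedError(RuntimeError):
    """Raised when a schema-constrained completion cannot be enforced."""


def build_command(
    cli_path: str,
    prompt: str,
    output_schema_path: Path | None = None,
    *,
    schema_flag: str | None = None,
) -> list[str]:
    """Build a CLI invocation, refusing to drop a requested schema silently.

    `schema_flag` is the backend's schema option if one exists. It defaults
    to None because Gemini CLI support for such a flag is an open question.
    """
    # Placeholder layout; the real adapter adds its own flags here.
    command = [cli_path, prompt]
    if output_schema_path is not None:
        if schema_flag is None:
            raise StructuredOutputUnsupportedError(
                "structured output was requested, but this backend has no "
                "way to enforce a JSON schema"
            )
        command += [schema_flag, str(output_schema_path)]
    return command
```

Failing loudly at command-build time moves the error to the adapter boundary, instead of surfacing later as a JSON parse failure in QA or consensus callers.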

2. Docs use invalid permission values (Medium-High)
docs/runtime-guides/gemini.md documents sandbox and auto_edit as config values (lines 73, 220+), but the actual validator (resolve_gemini_permission_mode) only accepts default, acceptEdits, bypassPermissions. Users following the guide will hit a validation error on first use. The permission table and CLI examples both need updating.
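A docs-drift check along these lines would have caught the mismatch. This is a rough sketch: `find_invalid_permission_values`, the regex, and the sample text are hypothetical, and the accepted set is copied from this review rather than imported from `resolve_gemini_permission_mode`.

```python
import re

# Accepted values as stated in the review; the real validator is
# resolve_gemini_permission_mode, which this sketch does not import.
ACCEPTED_MODES = frozenset({"default", "acceptEdits", "bypassPermissions"})

# Hypothetical pattern: matches YAML-style `gemini_permission_mode: value`
# and env-style `OUROBOROS_GEMINI_PERMISSION_MODE=value` examples.
_MODE_PATTERN = re.compile(
    r"(?:gemini_permission_mode:\s*|OUROBOROS_GEMINI_PERMISSION_MODE=)"
    r"([A-Za-z_]+)"
)


def find_invalid_permission_values(doc_text: str) -> list[str]:
    """Return documented permission values the validator would reject."""
    documented = _MODE_PATTERN.findall(doc_text)
    return sorted({v for v in documented if v not in ACCEPTED_MODES})


# Illustrative excerpt, not the actual contents of the Gemini guide.
SAMPLE_GUIDE = """
orchestrator:
  gemini_permission_mode: sandbox
export OUROBOROS_GEMINI_PERMISSION_MODE=auto_edit
  gemini_permission_mode: acceptEdits
"""
```

Run against the sample excerpt, the helper reports `auto_edit` and `sandbox` and lets `acceptEdits` through. A real test would read the guide from disk and assert that the returned list is empty.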

Can defer to next iteration

3. Event normalization assumes Codex event shape — The Gemini runtime inherits Codex event parsing (thread.started, item.completed, turn.failed). Gemini's stream-json output is unlikely to match. This is fine for beta since the actual Gemini CLI event format needs empirical validation first. Suggest opening a follow-up issue to test with real Gemini output and adapt the normalizer.

4. sessionId camelCase inconsistency — Runtime handles both session_id and sessionId, but the provider-side adapter only checks session_id. Low priority until we confirm what Gemini actually emits.

5. Consensus model defaults — Gemini is grouped with Codex in _CODEX_LLM_BACKENDS, so _default_model_for_backend() rewrites consensus models to ("default", "default", "default"), defeating diversity. Only matters when OpenRouter isn't configured. Low priority for beta.
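The collapse in item 5 can be reproduced in a few lines. The sketch below is a simplified stand-in, not the loader's code: `CODEX_LIKE_BACKENDS`, `default_models_for_backend`, `has_model_diversity`, the `"other"` backend, and the roster names are all hypothetical.

```python
from collections.abc import Sequence

# Hypothetical stand-in for _CODEX_LLM_BACKENDS after this PR.
CODEX_LIKE_BACKENDS = frozenset({"codex", "gemini"})


def default_models_for_backend(
    backend: str, configured: Sequence[str]
) -> tuple[str, ...]:
    """Mimic the rewrite: Codex-like backends get "default" for every slot."""
    if backend in CODEX_LIKE_BACKENDS:
        return tuple("default" for _ in configured)
    return tuple(configured)


def has_model_diversity(models: Sequence[str], min_models: int) -> bool:
    """True when at least `min_models` distinct model names are present."""
    return len(set(models)) >= min_models


# Placeholder roster; the project's real default model names are not shown.
ROSTER = ("model-a", "model-b", "model-c")

collapsed = default_models_for_backend("gemini", ROSTER)
untouched = default_models_for_backend("other", ROSTER)
```

Here `collapsed` holds three identical entries and fails a three-model diversity check, while `untouched` passes it. A guard such as `has_model_diversity` could be asserted before stage-3 consensus runs.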

Needs discussion

6. Breaking change: default runtime claude → codex
OrchestratorConfig.runtime_backend default changed from "claude" to "codex". This silently changes behavior for every user without an explicit config. This should either:

Be reverted and handled in a separate PR with proper changelog/migration note, or
At minimum be called out as a breaking change in the PR description and release notes

7. Interview hardcoded to Claude
create_llm_adapter(use_case="interview") now always returns ClaudeCodeAdapter. Reasonable for quality, but Gemini-only users (no Anthropic API key / no Max Plan) won't be able to run interviews at all. Worth documenting this requirement explicitly.
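The behavior described in item 7 can be pictured with the sketch below. It is a simplified illustration, not the real factory: the adapter classes are empty stubs that only borrow the project's class names, and the lookup tables are hypothetical.

```python
from __future__ import annotations


# Empty stubs standing in for the project's adapters.
class ClaudeCodeAdapter: ...


class CodexCliLLMAdapter: ...


class GeminiCliLLMAdapter: ...


_ALIASES = {"gemini_cli": "gemini"}
_ADAPTERS = {
    "claude": ClaudeCodeAdapter,
    "codex": CodexCliLLMAdapter,
    "gemini": GeminiCliLLMAdapter,
}


def create_llm_adapter(backend: str, use_case: str | None = None) -> object:
    """Resolve an adapter; interviews ignore `backend` and use Claude."""
    if use_case == "interview":
        # The hardcoded branch: a Gemini-only setup still lands here.
        return ClaudeCodeAdapter()
    resolved = _ALIASES.get(backend, backend)
    if resolved not in _ADAPTERS:
        raise ValueError(f"Unsupported LLM backend: {backend!r}")
    return _ADAPTERS[resolved]()
```

Because the interview branch returns before the backend is consulted, a user without Claude access gets no signal from configuration alone, which is why the requirement is worth documenting.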

Bot review assessment

The ouroboros-agent bot flagged largely the same issues. Findings #1 and #2 are real blockers. The rest are valid observations but appropriate to defer for a beta release. The bot was thorough — not unfair, just not triaging by release phase.

Nice work on the 35 tests and thorough docs. Looking forward to the next iteration! 🚀

ouroboros-agent

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit 431e72b | Triggered by: backlog review sweep

Branch: feat/gemini-runtime | 19 files, +1117/-59 | CI: No checks reported

Issue #N/A Requirements

| Requirement | Status |
| --- | --- |
| No linked issue detected in PR body. | See code findings below for file:line evidence. |

Previous Review Follow-up

| Previous Finding | Status |
| --- | --- |
| Prior bot review exists | Follow-up checked in current analysis. |

Code Findings

| # | File:Line | Severity | Confidence | Finding |
| --- | --- | --- | --- | --- |
| 1 | src/ouroboros/providers/gemini_cli_adapter.py:41 | High | High | `GeminiCliLLMAdapter._build_command()` ignores `output_schema_path`, so every `response_format={"type":"json_schema",...}` request is silently downgraded to an unconstrained text completion. That breaks existing QA / semantic / consensus callers which parse the reply as JSON and currently depend on schema-constrained output. |
| 2 | src/ouroboros/orchestrator/gemini_cli_runtime.py:103 | High | Medium | The Gemini runtime only overrides session-id extraction, but inherits Codex-only event normalization. The parent runtime converts `thread.started`, `item.completed`, and `turn.failed`; Gemini `-o stream-json` events are unlikely to match that shape, so assistant/tool streaming and explicit failure events will be dropped instead of surfaced to the orchestrator. |
| 3 | src/ouroboros/config/loader.py:564 | Medium | High | `_default_model_for_backend()` treats Gemini as a “Codex-like” backend and rewrites default model names to `"default"`. For consensus this collapses the default 3-model roster into `("default","default","default")`, which defeats `min_models`/`diversity_required` semantics and turns stage-3 consensus into repeated votes from the same local default model. |
| 4 | src/ouroboros/providers/gemini_cli_adapter.py:68 | Medium | Medium | The LLM adapter only recognizes `session_id`, while the runtime already had to support both `session_id` and `sessionId`. If Gemini emits camelCase here as well, one-shot completions will lose session tracking in `raw_response`, making resume/debug metadata inconsistent across the two Gemini code paths. |
| 5 | docs/runtime-guides/gemini.md:73 | Medium | High | The Gemini guide documents `gemini_permission_mode: sandbox` / `auto_edit` and matching env/CLI examples, but the actual accepted values are `default`, `acceptEdits`, and `bypassPermissions` (`OrchestratorConfig` and `resolve_gemini_permission_mode`). Following the guide produces validation/runtime errors on first use. |
| 6 | docs/config-reference.md:79 | Low | High | The config reference says `runtime_backend: opencode` raises `NotImplementedError`, but the factory currently rejects it with `ValueError` during backend resolution. The docs are advertising a behavior and support level that the implementation does not provide. |

Test Coverage

Missing tests for env/config precedence and backend-specific branches in src/ouroboros/config/loader.py, especially Gemini-specific permission selection and the "default" model normalization path.
Missing Gemini structured-output tests in src/ouroboros/providers/gemini_cli_adapter.py that exercise CompletionConfig.response_format the same way src/ouroboros/evaluation/consensus.py and src/ouroboros/mcp/tools/qa.py use it.
Missing event-shape normalization tests in src/ouroboros/orchestrator/gemini_cli_runtime.py using representative Gemini stream-json payloads; current tests only validate command construction and session-id extraction.
Missing regression coverage for consensus model defaults under llm.backend=gemini across src/ouroboros/config/loader.py and src/ouroboros/evaluation/consensus.py.
I could not run the Python test suite here because pytest is not installed in this environment.
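For the event-shape gap, one defensive starting point is a normalizer that surfaces unknown events instead of dropping them. The sketch below assumes a line-delimited JSON stream. The three event types on the left come from this review; `normalize_event_line`, the `kind` labels, and any sample payloads are hypothetical, and real Gemini stream-json output still has to be captured empirically.

```python
from __future__ import annotations

import json
from typing import Any

# Codex event types named in the review, mapped to hypothetical labels.
KNOWN_EVENT_TYPES = {
    "thread.started": "session_started",
    "item.completed": "message",
    "turn.failed": "error",
}


def normalize_event_line(line: str) -> dict[str, Any]:
    """Normalize one stream line without silently discarding anything."""
    try:
        payload = json.loads(line)
    except json.JSONDecodeError:
        return {"kind": "unparsed", "raw": line}
    if not isinstance(payload, dict):
        return {"kind": "unparsed", "raw": line}

    event_type = payload.get("type")
    if isinstance(event_type, str) and event_type in KNOWN_EVENT_TYPES:
        kind = KNOWN_EVENT_TYPES[event_type]
    else:
        # Unknown shapes are kept and labeled so the orchestrator can log them.
        kind = "unrecognized"
    return {"kind": kind, "type": event_type, "payload": payload}
```

Tests built on representative captured payloads could then assert that no line ends up as `unrecognized` once the Gemini-specific mapping is in place.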

Design

The general direction is sound: adding Gemini through the existing runtime/provider factories keeps the public surface relatively clean. The problem is that the implementation reuses Codex base classes almost verbatim while only adapting flags and path lookup. That is not enough when the transport protocol differs. The current design assumes Gemini emits Codex-compatible runtime and completion events, plus equivalent schema/output features, but the code does not prove or enforce that.

The config/docs layer is also ahead of the implementation. New options are exposed broadly, including unsupported opencode values and Gemini permission examples that do not match the actual validators. No previous review findings were supplied, so there was nothing to verify as fixed or still open from an earlier round.

Files Reviewed

docs/config-reference.md
docs/runtime-capability-matrix.md
docs/runtime-guides/gemini.md
src/ouroboros/config/__init__.py
src/ouroboros/config/loader.py
src/ouroboros/config/models.py
src/ouroboros/gemini_permissions.py
src/ouroboros/orchestrator/__init__.py
src/ouroboros/orchestrator/gemini_cli_runtime.py
src/ouroboros/orchestrator/runtime_factory.py
src/ouroboros/providers/__init__.py
src/ouroboros/providers/factory.py
src/ouroboros/providers/gemini_cli_adapter.py
tests/unit/config/test_models.py
tests/unit/orchestrator/test_gemini_cli_runtime.py

Reviewed by ouroboros-agent[bot] via Codex deep analysis

Q00 · 2026-03-23T19:20:41Z

⚠️ Base branch rebased onto main

release/0.26.0-beta has been rebased onto the latest main (ef54b9b). This brings in recent main fixes including PR #180 (delegated MCP tool context) and build improvements.

Your branch will need a rebase:

git fetch origin
git rebase origin/release/0.26.0-beta

Key structural changes to be aware of:

AgentRuntime protocol now has runtime_backend, working_directory, permission_mode properties
RuntimeHandle.backend is now normalized via __post_init__
Handler files split from definitions.py into separate modules
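For orientation while rebasing, the first two changes have roughly the shape sketched below. This is an approximation, not the code on main: the property return types, the `normalize_backend` helper, and the alias table are assumptions, with only the `gemini_cli` alias taken from the PR description.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, runtime_checkable

# Only the gemini_cli alias is taken from the PR description.
_BACKEND_ALIASES = {"gemini_cli": "gemini"}


def normalize_backend(name: str) -> str:
    """Hypothetical helper: trim, lowercase, and resolve known aliases."""
    cleaned = name.strip().lower()
    return _BACKEND_ALIASES.get(cleaned, cleaned)


@runtime_checkable
class AgentRuntime(Protocol):
    """The three properties named above; return types are assumptions."""

    @property
    def runtime_backend(self) -> str: ...

    @property
    def working_directory(self) -> str | None: ...

    @property
    def permission_mode(self) -> str | None: ...


@dataclass
class RuntimeHandle:
    backend: str

    def __post_init__(self) -> None:
        # Normalize once at construction so later comparisons are stable.
        self.backend = normalize_backend(self.backend)
```

Under these assumptions a subclass such as the Gemini runtime would need to expose all three properties, and a handle created with `gemini_cli` would compare equal to one created with `gemini`.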

Let me know if you need help with the rebase.

Q00 and others added 30 commits March 20, 2026 18:34

style: format definitions.py with ruff

f84faf2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

style: fix ruff lint (C408 dict literal) and format issues

2f0b41b

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: add timeout and env vars to MCP config in setup

7cbbaad

Both _setup_claude and _setup_codex now write timeout: 600 and OUROBOROS_AGENT_RUNTIME / OUROBOROS_LLM_BACKEND env vars into mcp.json. Existing entries are backfilled on re-run. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix parallel executor cwd prompt context

c74f207

feat: add interview breadth and closure personas (Q00#136)

34b1a52

docs: clarify agent prompt source of truth

2cefb9b

refactor: make packaged agents the source of truth (Q00#136)

3e05b84

Fix session reconstruction and codex schema handling

d28b796

fix: remove unused pytest import in test_json_utils

19805c8

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Q00 and others added 22 commits March 20, 2026 18:34

style: format test_codex_cli_runtime.py with ruff

453cee2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

style: format parallel_executor.py with ruff

8e3085c

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

style: format mcp.py and runtime_factory.py with ruff

5ca0227

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: remove remaining ghost CLI commands (evolve, ralph)

28940bd

These commands are Claude Code skills only, not standalone CLI commands. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: release v0.26.0b1

e3b1439

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ouroboros-agent bot requested changes Mar 20, 2026

Q00 reviewed Mar 21, 2026

ouroboros-agent bot requested changes Mar 23, 2026

Q00 force-pushed the release/0.26.0-beta branch from a4cefc6 to 73b6b27 on March 23, 2026 19:20

Q00 force-pushed the release/0.26.0-beta branch 2 times, most recently from 7ca9a80 to eb9fc80, on March 25, 2026 11:39

tgmerritt wants to merge 64 commits into Q00:release/0.26.0-beta from tgmerritt:feat/gemini-runtime
