feat(runtime): add Gemini CLI as third execution runtime #167

Open
tgmerritt wants to merge 64 commits into Q00:release/0.26.0-beta from tgmerritt:feat/gemini-runtime

Conversation

@tgmerritt

Summary

  • Add Google Gemini CLI as a third AgentRuntime alongside Claude Code and Codex CLI
  • Change the default execution runtime from claude to codex
  • Hardcode Claude as the interview backend regardless of configured runtime (create_llm_adapter(use_case="interview") always returns ClaudeCodeAdapter)
  • Support user selection via config.yaml, CLI flag (--runtime gemini), and env var (OUROBOROS_AGENT_RUNTIME=gemini)

New modules

  • gemini_permissions.py — permission mode to CLI flag mapping (--sandbox, --approval-mode, --yolo)
  • providers/gemini_cli_adapter.py — LLM adapter (thin subclass of CodexCliLLMAdapter)
  • orchestrator/gemini_cli_runtime.py — agent runtime (thin subclass of CodexCliRuntime)
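
As a rough illustration of what a permission-mode-to-flag table like gemini_permissions.py might look like, here is a minimal sketch. The mode names (default, acceptEdits, bypassPermissions) are the ones the reviews on this PR cite as accepted, and the flags are the ones listed above; which mode maps to which flag is an assumption made for illustration, and the function name is hypothetical.

```python
# ASSUMPTION: the pairing of modes to flags below is illustrative only;
# the PR's actual table in gemini_permissions.py may differ.
_MODE_TO_FLAGS: dict[str, list[str]] = {
    "default": ["--sandbox"],
    "acceptEdits": ["--approval-mode", "auto_edit"],
    "bypassPermissions": ["--yolo"],
}


def flags_for_permission_mode(mode: str) -> list[str]:
    """Translate a permission mode into CLI flags, rejecting unknown modes."""
    try:
        # Return a copy so callers cannot mutate the shared table.
        return list(_MODE_TO_FLAGS[mode])
    except KeyError:
        allowed = ", ".join(sorted(_MODE_TO_FLAGS))
        raise ValueError(
            f"unsupported permission mode {mode!r}; expected one of: {allowed}"
        ) from None
```

Rejecting unknown strings at this boundary means a value such as sandbox fails with a message that lists the accepted modes, rather than reaching the CLI.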

Config changes

  • OrchestratorConfig.runtime_backend default: "claude" → "codex"
  • New fields: gemini_cli_path, gemini_permission_mode
  • New env vars: OUROBOROS_GEMINI_CLI_PATH, OUROBOROS_GEMINI_PERMISSION_MODE

Factory changes

  • create_llm_adapter(use_case="interview") always returns ClaudeCodeAdapter
  • Both runtime and provider factories resolve "gemini" / "gemini_cli" aliases
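
The two factory rules above can be sketched as follows. This is not the PR's factory code: the helper names and the lookup tables are hypothetical, and adapters are represented by their class names as strings so the sketch stays self-contained.

```python
from typing import Optional

# Alias table: only the "gemini" / "gemini_cli" aliases come from the PR text.
_BACKEND_ALIASES = {
    "claude": "claude",
    "codex": "codex",
    "gemini": "gemini",
    "gemini_cli": "gemini",
}

# Hypothetical mapping from canonical backend to adapter class name.
_ADAPTER_BY_BACKEND = {
    "claude": "ClaudeCodeAdapter",
    "codex": "CodexCliLLMAdapter",
    "gemini": "GeminiCliLLMAdapter",
}


def resolve_backend(name: str) -> str:
    """Normalize a user-supplied backend name to its canonical form."""
    key = name.strip().lower()
    if key not in _BACKEND_ALIASES:
        raise ValueError(f"unknown backend: {name!r}")
    return _BACKEND_ALIASES[key]


def adapter_name_for(backend: str, use_case: Optional[str] = None) -> str:
    """Pick an adapter; the interview use case always gets the Claude adapter."""
    canonical = resolve_backend(backend)  # still validates the name
    if use_case == "interview":
        return "ClaudeCodeAdapter"
    return _ADAPTER_BY_BACKEND[canonical]
```

Validating the backend name before applying the interview override keeps typos visible even when the override would otherwise mask them.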

Documentation

  • New runtime guide: docs/runtime-guides/gemini.md
  • Updated capability matrix with Gemini column
  • Updated config reference with new keys and env vars

Test plan

  • 35 new unit tests for Gemini permissions, adapter, runtime, and factory wiring
  • Updated existing factory and config model tests for new defaults
  • 192 feature-related tests pass (0 failures)
  • ruff check and ruff format pass on all changed files
  • Interview hardcoding verified: create_llm_adapter(backend="gemini", use_case="interview") returns ClaudeCodeAdapter
  • Default change verified: OrchestratorConfig().runtime_backend == "codex"
  • Factory resolution verified: resolve_agent_runtime_backend("gemini_cli") == "gemini"

🤖 Generated with Claude Code

Q00 and others added 30 commits March 20, 2026 18:34
Introduce AgentRuntime Protocol and RuntimeHandle for backend-neutral
runtime management. Add Codex CLI runtime implementation with session
tracking, MCP tool definitions, parallel AC execution with retry/resume,
and comprehensive test coverage (2800+ tests passing).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix QA structured output schema for Codex/OpenAI compatibility by
  adding `additionalProperties: false` and all fields to `required`
- Add seed_path support to StartExecuteSeedHandler (previously only
  ExecuteSeedHandler resolved seed_path to seed_content)
- Include Runtime/LLM Backend info in start_execute_seed response
- Add terminal status parametrized tests for session_status handler
- Clean up OpenCode runtime stubs with explicit NotImplementedError
- Add error handling for ValueError/NotImplementedError in CLI run

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CI renders help output with ANSI escape codes that split `--runtime`
into separate escape sequences, causing exact string match to fail.
Use case-insensitive keyword matching instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sertions

Rich inserts ANSI escape sequences at hyphen boundaries in CLI help
output (e.g. --llm-backend), causing plain-text assertions to fail.
Setting NO_COLOR=1 in the root conftest.py disables color output for
all tests, fixing the 4 failing CI checks and preventing future
breakage for any hyphenated option names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In CI, GITHUB_ACTIONS env var causes Typer to set force_terminal=True
on Rich Console, emitting ANSI escape codes into CliRunner's string
buffer. This breaks plain-text assertions for hyphenated options like
--llm-backend. Use Typer's built-in _TYPER_FORCE_DISABLE_TERMINAL
escape hatch instead of NO_COLOR (which only disables colors but
leaves bold/dim style codes intact).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
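
A minimal sketch of the conftest.py change this commit describes. It assumes the variable has to be in the environment before Typer's Rich helpers are imported, so it sits at module top level in the root conftest.py; the exact placement in the repository is not shown in this PR.

```python
# conftest.py (sketch): disable Typer's forced-terminal mode for the test run.
import os

# Typer checks this variable when it configures its Rich console; setting it
# keeps ANSI style codes out of CliRunner output even under GITHUB_ACTIONS.
os.environ["_TYPER_FORCE_DISABLE_TERMINAL"] = "1"
```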
…setup

- Move claude-agent-sdk, anthropic, litellm from core deps to optional
  extras ([claude], [litellm], [all]) so Codex-only users can install
  ouroboros-ai without unnecessary SDK dependencies
- Convert eager imports to lazy: LiteLLMAdapter in providers/__init__.py
  and factory.py, litellm in core/context.py (with len//4 fallback)
- Add `ouroboros setup` CLI command with auto-detection of available
  runtimes (claude, codex) and interactive/non-interactive modes
- Add scripts/install.sh one-liner installer with runtime auto-detection
- Update README Quick Start to show 3 parallel install paths:
  Claude Code Plugin / Standalone pip / One-liner
- Update SKILL.md with standalone setup reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The dev group previously included ouroboros-ai[all] which pulled in the
dashboard extra (streamlit → watchdog). watchdog is untyped, and mypy
cannot resolve watchdog.observers.Observer as a valid type on Linux.
Use ouroboros-ai[claude,litellm] instead — dev needs runtime deps for
testing but not dashboard visualization deps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Security:
- Add InputValidator.validate_llm_response() to CodexCliLLMAdapter (parity with other adapters)
- Pass prompt via stdin instead of CLI argument to avoid ARG_MAX limits
- Add await stdin.drain() before close to ensure flush on large prompts
- Remove _extract_text() recursive fallback to prevent data leakage via error messages
- Add asyncio.timeout to legacy process.communicate() fallback path
- Validate resume_session_id with regex pattern to prevent CLI argument injection

Reliability:
- Guard _cancellation_registry with asyncio.Lock for concurrent access safety
- Add terminal state check before mark_cancelled to prevent race condition
- Add _max_resume_retries=3 depth limit to prevent infinite execute_task recursion
- Add 50MB buffer limit to _iter_stream_lines() with incremental byte tracking
- Fix EventStore connection leak in ExecuteSeedHandler background task
- Guard None sentinels in parallel_executor level_results

Quality:
- Change interview permission mode from acceptEdits to default for codex/opencode
- Remove 28 lines of unreachable dead code in _build_runtime_handle
- Add warning log for silently discarded non-string session_id
- Cache derive_runtime_signal results to reduce redundant calls (3x → 2x)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
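
The stdin delivery, drain-before-close, and timeout items above could look roughly like the sketch below. The function names are hypothetical, and asyncio.wait_for is used in place of the asyncio.timeout mentioned in the commit so the example also runs on older Python versions.

```python
import asyncio
import sys


async def _feed_and_collect(proc: asyncio.subprocess.Process, data: bytes) -> bytes:
    """Write the prompt to stdin, flush it, then read stdout until EOF."""
    assert proc.stdin is not None and proc.stdout is not None
    proc.stdin.write(data)
    await proc.stdin.drain()  # wait for large prompts to drain from the buffer
    proc.stdin.close()        # signal EOF to the child
    output = await proc.stdout.read()
    await proc.wait()
    return output


async def run_with_stdin_prompt(argv: list, prompt: str, timeout_s: float = 30.0) -> str:
    """Run a CLI with the prompt on stdin instead of argv (avoids ARG_MAX)."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
    )
    try:
        output = await asyncio.wait_for(
            _feed_and_collect(proc, prompt.encode("utf-8")), timeout=timeout_s
        )
    except asyncio.TimeoutError:
        proc.kill()  # do not leave a hung child behind
        await proc.wait()
        raise
    return output.decode("utf-8")
```

For example, `asyncio.run(run_with_stdin_prompt([sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"], "x" * 300_000))` feeds a prompt far larger than a typical pipe buffer through stdin.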
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Restructure all documentation, README, and branding from Claude
Code-centric plugin to runtime-agnostic workflow engine supporting
both Claude Code and Codex CLI as equal first-class runtime backends.

Key changes:
- README restructured as conversion page (problem-solver positioning)
- Quick Start with runtime tabs: Claude Code | Codex CLI | Standalone
- New runtime guides: docs/runtime-guides/claude-code.md, codex.md
- Runtime capability matrix comparing backends side-by-side
- Architecture docs updated with runtime abstraction layer
- CLI reference updated for setup, --runtime, --non-interactive
- Platform support matrix (Windows experimental/WSL recommended)
- SECURITY.md with standard vulnerability reporting policy
- Python version corrected to >=3.12 everywhere (was 3.14+)
- All "Claude Code plugin" references replaced with agnostic language
- Legacy docs/running-with-claude-code.md preserved as redirect stub
- Codex ooo skill support documented (rules + skills install)
- Config value corrected: runtime_backend: claude (not claude-code)
- Stale "Claude Agent SDK" references updated in guides
- Install commands match pyproject.toml exactly
- Demo image placeholders added for interview/seed/evaluation
- Sub-tagline: "Specification-first workflow engine for AI coding agents"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- S1: path containment for absolute/relative paths in security.py
- Q1a: complete handler re-export in mcp/tools/__init__.py
- A1: remove getattr fallback from definitions.py
- Handler split: definitions.py → per-domain handler modules
- Fix non-deterministic updated_at timestamp flake in parallel executor test
- Add runtime_backend/working_directory/permission_mode to all test stubs
- Deep-clone consistency for RuntimeHandle metadata
- Ruff format cleanup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both _setup_claude and _setup_codex now write timeout: 600 and
OUROBOROS_AGENT_RUNTIME / OUROBOROS_LLM_BACKEND env vars into
mcp.json. Existing entries are backfilled on re-run.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Every mcp.json example/template now includes timeout: 600 and
OUROBOROS_AGENT_RUNTIME env var. Fixes docs/cli-reference.md,
docs/guides/cli-usage.md, docs/guides/common-workflows.md,
and .claude-plugin/.mcp.json.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
OUROBOROS_AGENT_RUNTIME in mcp.json env would override config.yaml
(env > config priority), making runtime changes via config.yaml
silently ineffective. Runtime selection belongs in config.yaml only,
which setup already writes correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
mcp.json handles MCP server registration (timeout only).
Runtime backend is configured in ~/.ouroboros/config.yaml, with
optional OUROBOROS_AGENT_RUNTIME env var override for power users.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
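
The precedence described in these two commits (env var overrides config.yaml) can be sketched as below. The function name is hypothetical, and the "codex" fallback reflects the default this PR introduces rather than the default at the time of these commits.

```python
from typing import Mapping, Optional


def resolve_runtime_backend(
    config_value: Optional[str],
    env: Mapping[str, str],
    default: str = "codex",  # ASSUMPTION: default as changed by this PR
) -> str:
    """Apply env > config > default precedence for the runtime backend."""
    override = env.get("OUROBOROS_AGENT_RUNTIME", "").strip()
    if override:
        return override  # a set env var silently wins over config.yaml
    if config_value:
        return config_value
    return default
```

This is why an OUROBOROS_AGENT_RUNTIME entry written into mcp.json would make later edits to config.yaml appear to have no effect.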
extract_json_payload() now tries each { position via brace-counting
and validates with json.loads, instead of only trying the first {.
This fixes 75% QA verdict parse failures caused by Anthropic's
prefill workaround producing prose with stray braces before JSON.
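
A minimal sketch of the brace-counting approach, assuming the goal is to return the first balanced {...} span that json.loads accepts. The function name is hypothetical and this is not the project's extract_json_payload(); in particular, the string-literal tracking is an addition made here so braces inside JSON strings do not disturb the count.

```python
import json
from typing import Any, Optional


def extract_first_json_object(text: str) -> Optional[Any]:
    """Try every '{' as a start; return the first balanced span that parses."""
    start = text.find("{")
    while start != -1:
        depth = 0
        in_string = False
        escaped = False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:
                # Inside a string literal, only track escapes and the closing quote.
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
                continue
            if ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start : i + 1])
                    except json.JSONDecodeError:
                        break  # balanced but not JSON; try the next '{'
        start = text.find("{", start + 1)
    return None
```

With input such as `'Verdict {see below}: {"verdict": "pass"}'`, the first brace pair is balanced but fails json.loads, so the scan moves on and returns the second object.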

Also adds llms-full.txt with deep model-facing reference content
and bolsters the Secondary Loop section with TODO registry and
batch scheduler details.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Consolidate getting-started.md as the single onboarding SSOT, remove
duplicated guides (cli-usage, common-workflows, language-support,
quick-start), delete stale API design docs and ontological-framework
directory, and trim verbose sections across architecture, cli-reference,
and runtime guides. README retains philosophy sections, TUI section
moved to docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add self-answering interview mode, improve codex CLI adapter error
handling, update provider factory runtime detection, and expand MCP
authoring handler coverage with corresponding tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update .gitignore, .mcp.json timeout config, expand CONTRIBUTING.md
with dev workflow details, refresh skill definitions for interview/
evolve/setup, and sync socratic-interviewer agent spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Provide llms.txt as a concise index and llms-full.txt as a detailed
reference, following the Context7 convention so AI coding agents can
ingest project context efficiently.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…inel

When ouroboros spawns a runtime (Codex/Claude/OpenCode), the child
process may read its own MCP config and spawn another ouroboros server,
causing exponential process tree growth (34+ processes observed).

The sentinel env var is set on first serve() entry and inherited by all
child processes, causing nested instances to exit(0) immediately.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
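
The sentinel guard can be sketched like this. The variable name is hypothetical (the commit does not name it), and the real serve() exits the process rather than returning a flag.

```python
import os
from typing import MutableMapping

# ASSUMPTION: illustrative name; the commit does not state the real variable.
_SENTINEL = "EXAMPLE_NESTED_SERVER_GUARD"


def claim_server_slot(env: MutableMapping[str, str] = os.environ) -> bool:
    """Return True for the first server in a process tree, False when nested."""
    if env.get(_SENTINEL) == "1":
        return False  # an ancestor already set the sentinel: nested instance
    env[_SENTINEL] = "1"  # children inherit this through their environment
    return True
```

A caller would exit immediately when this returns False, which stops the recursive spawn chain at the first nested instance.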
… docs

- Replace "Fallback" with "Alternative" for non-Claude runtime paths
- Change "CLI fallback" to "CLI equivalent" throughout getting-started
- Trim verbose metadata blocks from cli-reference, codex, config-reference
- Create docs/guides/evolution-loop.md: Ralph, Wonder/Reflect, convergence
- Commit pending docs: config-reference, evaluation-pipeline, findings-registry
- Update docs/README.md index: add evolution guide, remove broken links

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix ruff format in 17 files (session.py, mcp.py, test files, scripts)
- Remove unused imports (pytest in test_json_utils)
- Fix StrEnum inheritance (examples/task_manager)
- Fix unused vars and args in scripts (doc_volatility, migrate_authority,
  semantic_link_rot_check)
- Delete leftover playground/src/ files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Q00 and others added 22 commits March 20, 2026 18:34
Three related bugs caused the AC execution tree to show only the root
"Seed" node with no children:

1. _notify_ac_tree_updated() only updated the active screen — when
   events arrived while session selector was active, the dashboard
   tree was never refreshed. Now uses get_screen() to reach the
   installed dashboard regardless of which screen is active.

2. DashboardScreenV3 lacked an on_show() hook — when switching back
   to the dashboard, the tree was stuck with its initial empty state.
   Now refreshes from _state.ac_tree on every show.

3. parallel_executor used 0-based AC index while WorkflowStateTracker
   uses 1-based (as documented in AcceptanceCriterion.index). Fixed
   to i+1 for consistency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
A. Fix subtask events using 0-based ac_index that couldn't match
   1-based tree node keys — subtasks now attach to parent nodes.

B. Replace all self.screen forwarding in app.py with
   _forward_to_dashboard() helper that reaches the installed
   dashboard via get_screen(), preventing message drops when
   session selector or other screens are active.

D. Wrap _execute_parallel() call in try/except to persist
   session.failed events on unhandled exceptions, preventing
   0-event ghost sessions.

E. Expand on_show() to refresh phase_bar and activity_bar in
   addition to AC tree when dashboard becomes active.

F. Remove dead DashboardScreen import and dashboard_v2 references.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…hook

Allows runtime subclasses to control how prompts are delivered:
- _build_command() now accepts an optional prompt kwarg (ignored by
  Codex CLI which uses stdin)
- _feeds_prompt_via_stdin() returns True by default; subclasses can
  override to False to skip stdin prompt delivery
- _execute_task_impl() passes composed_prompt to _build_command()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
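
A minimal sketch of the hook pattern this commit describes, using hypothetical class names and a made-up example-cli with a --prompt flag. Only the two method names and the prompt keyword come from the commit message.

```python
from typing import List, Optional, Tuple


class BaseCliRuntime:
    """Stand-in for a stdin-fed runtime such as the Codex one."""

    cli_name = "example-cli"  # hypothetical binary

    def _feeds_prompt_via_stdin(self) -> bool:
        return True

    def _build_command(self, *, prompt: Optional[str] = None) -> List[str]:
        # The prompt kwarg is accepted but ignored: it travels over stdin.
        return [self.cli_name, "exec"]

    def plan_invocation(self, composed_prompt: str) -> Tuple[List[str], Optional[str]]:
        """Return (argv, stdin payload) for one task."""
        command = self._build_command(prompt=composed_prompt)
        stdin_payload = composed_prompt if self._feeds_prompt_via_stdin() else None
        return command, stdin_payload


class ArgvPromptRuntime(BaseCliRuntime):
    """Subclass that opts out of stdin and places the prompt on the command line."""

    def _feeds_prompt_via_stdin(self) -> bool:
        return False

    def _build_command(self, *, prompt: Optional[str] = None) -> List[str]:
        command = [self.cli_name]
        if prompt is not None:
            command += ["--prompt", prompt]  # hypothetical flag
        return command
```

Keeping both decisions in overridable methods lets a subclass change prompt delivery without touching the shared execution path.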
- Add heartbeat-based alive check to orphan detection so sessions
  with active runtime processes are not cancelled on MCP restart
- Enable SQLite WAL mode and busy_timeout=30s for concurrent access
- Add retry logic (3 attempts) to event_store.append() for transient
  "database is locked" errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
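
The SQLite settings and retry loop could look roughly like this sketch built on the standard sqlite3 module. The events table, function names, and backoff delays are assumptions; the project's event_store.append() is not shown in this PR.

```python
import sqlite3
import time


def open_store(path: str) -> sqlite3.Connection:
    """Open a connection configured for concurrent readers and writers."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # readers do not block the writer
    conn.execute("PRAGMA busy_timeout=30000")  # wait up to 30s on a locked database
    return conn


def append_with_retry(
    conn: sqlite3.Connection, payload: str, attempts: int = 3, delay_s: float = 0.05
) -> None:
    """Insert one event, retrying transient 'database is locked' errors."""
    for attempt in range(1, attempts + 1):
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO events (payload) VALUES (?)", (payload,))
            return
        except sqlite3.OperationalError as exc:
            transient = "database is locked" in str(exc)
            if not transient or attempt == attempts:
                raise
            time.sleep(delay_s * attempt)  # simple linear backoff
```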
- Fix event_store.py: move logger after imports, prefix unused arg
  with underscore, remove unused last_err variable, fix import order
- Fix heartbeat.py: ProcessNotFoundError → ProcessLookupError
  (correct Python built-in exception name)
- Apply ruff format to both files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The heartbeat integration in find_orphaned_sessions() checks real
lock files, causing test pollution. Add autouse fixture to mock
get_alive_sessions() with an empty set in both
TestFindOrphanedSessions and TestCancelOrphanedSessions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- _setup_claude() now persists runtime_backend, llm.backend, and
  claude_path to config.yaml (matching _setup_codex() behavior)
- start_execute_seed_handler() now accepts and propagates
  runtime_backend/llm_backend to the inner ExecuteSeedHandler
- Add tests for setup config persistence and backend propagation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tate guard

- _build_tool_arguments() now preserves original mcp_args (initial_context,
  cwd, etc.) and overlays session_id/answer, instead of rebuilding from scratch
- StartExecuteSeedHandler now checks terminal session status (completed,
  cancelled, failed) before enqueueing, matching ExecuteSeedHandler behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
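
The preserve-and-overlay behavior in the first bullet amounts to a dictionary merge. A minimal sketch, with a hypothetical function signature:

```python
from typing import Any, Dict, Mapping, Optional


def build_tool_arguments(
    mcp_args: Mapping[str, Any],
    session_id: Optional[str] = None,
    answer: Optional[str] = None,
) -> Dict[str, Any]:
    """Keep the caller's original arguments and overlay only the new fields."""
    merged = dict(mcp_args)  # copy so the original mapping is not mutated
    if session_id is not None:
        merged["session_id"] = session_id
    if answer is not None:
        merged["answer"] = answer
    return merged
```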
Apply the same fix from command_dispatcher.py to codex_cli_runtime.py's
_build_tool_arguments() — preserve original mcp_args and overlay
session_id/answer instead of rebuilding from scratch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- setup.py: write `cli_path` instead of `claude_path` so the config
  loader actually picks up the detected Claude binary path.
- execution_handlers.py: when seed_path does not exist on disk, fall
  back to treating the value as inline YAML instead of returning an
  error, matching the documented tool contract for both
  ouroboros_execute_seed and ouroboros_start_execute_seed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- getting-started.md: remove nonexistent `ouroboros interview` command,
  clarify that interview is available via `ooo` or MCP tools only,
  add required seed_file arg to `ouroboros run` examples
- architecture.md: fix interview entrypoint references
- README.md: remove Codex from `ooo` usage note (not yet supported)
- cli-reference.md: replace opencode manual config suggestion with
  "not yet implemented" warning
- config-reference.md: add "not yet implemented" caveat to opencode
  settings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ementedError

Address ouroboros-agent review findings:
- Remove OPENCODE enum values from CLI parsers (init, mcp, run)
- Reject opencode at resolve_*_backend() with early ValueError
- Replace opencode normalization tests with boundary rejection tests
- Ensure legacy subprocess fallback restores schema transforms

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Convert opencode server creation test to assert ValueError rejection
- Convert opencode execution handler test to assert MCPToolError on reject
- Switch resume test from opencode to codex (tests resume path, not backend)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- README: fix `ouroboros run workflow` → `ouroboros run seed.yaml`
- getting-started: add required seed path to --resume examples
- CONTRIBUTING: replace dead `docs/guides/cli-usage.md` refs with `docs/getting-started.md`
- codex/ouroboros.md: align setup/update descriptions with actual CLI behavior

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…oints

- README: rewrite commands table to show skill vs CLI equivalents
- seed-authoring: replace all `ouroboros interview *` with `ouroboros init start *`
- Clarify that some skills (evaluate, evolve, etc.) are MCP/skill-only

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These commands are Claude Code skills only, not standalone CLI commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: fix 16 audit findings from agent team review

Critical fixes:
- getting-started.md: Correct interview command info (ouroboros init start exists)
- README.md: Fix ouroboros status requires subcommand, add cancel to table
- getting-started.md: Remove overstated Claude/Codex parity claim

High fixes:
- README.md: Add install.sh one-liner to Standalone quick-start
- cli-reference.md: Fix TUI backend option (python, not textual)
- CONTRIBUTING.md: Remove broken docs/api/parallel-execution.md reference
- findings-registry.md: Mark entity-registry migration as planned-not-created
- codex.md: Clarify status command syntax

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: fix remaining entity-registry broken references in findings-registry

Clean up frontmatter description, schema changelog, backward-compat rule,
and record_type field description that still referenced non-existent
entity-registry.yaml and migration guide files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: mark FIND-044 resolved, fix open findings count

Update findings-registry to reflect codex.md status command fix:
- FIND-044 status: open → resolved (both YAML and summary table)
- Remove FIND-044 from open findings list
- Replace detail section with resolution note

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: resolve FIND-045 and FIND-050, update registry

- FIND-045: Add credentials.yaml cross-links to claude-code.md and codex.md
- FIND-050: Already fixed in codex.md:104 (parenthetical note); mark resolved
- Update open findings list: only FIND-018, FIND-019 remain (structural)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: sync registry stats and fix README claude-code link wording

- Update YAML stats: open 5→2, resolved 45→48
- Update summary table: medium open 3→0, total open 5→2
- README: change "full details" to "backend configuration and CLI options"
  to accurately describe what claude-code.md covers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Google Gemini CLI as a third AgentRuntime alongside Claude Code and
Codex CLI. Change the default execution runtime to Codex and hardcode
Claude as the interview backend regardless of configured runtime.

New modules:
- gemini_permissions.py: permission mode -> CLI flag mapping
- providers/gemini_cli_adapter.py: LLM adapter (subclasses CodexCliLLMAdapter)
- orchestrator/gemini_cli_runtime.py: agent runtime (subclasses CodexCliRuntime)

Config changes:
- OrchestratorConfig.runtime_backend default: "claude" -> "codex"
- New fields: gemini_cli_path, gemini_permission_mode
- New Literal value: "gemini" in runtime_backend and llm.backend
- New env vars: OUROBOROS_GEMINI_CLI_PATH, OUROBOROS_GEMINI_PERMISSION_MODE

Factory changes:
- create_llm_adapter(use_case="interview") always returns ClaudeCodeAdapter
- Both factories resolve "gemini"/"gemini_cli" aliases

Includes 35 new unit tests and updated documentation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@ouroboros-agent ouroboros-agent bot left a comment

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit 431e72b | Triggered by: PR opened

Branch: feat/gemini-runtime | 19 files, +1117/-59 | CI: unknown

Issue #N/A Requirements

| Requirement | Status |
|---|---|
| No linked issue detected in PR body | N/A. No issue requirements to map. |

Previous Review Follow-up

| Previous Finding | Status |
|---|---|
| First review, no previous findings. | N/A. First bot review on this PR. |

Code Findings

| # | File:Line | Severity | Confidence | Finding |
|---|---|---|---|---|
| 1 | src/ouroboros/providers/gemini_cli_adapter.py:41 | High | High | GeminiCliLLMAdapter._build_command() ignores output_schema_path, so every response_format={"type":"json_schema",...} request is silently downgraded to an unconstrained text completion. That breaks existing QA / semantic / consensus callers which parse the reply as JSON and currently depend on schema-constrained output. |
| 2 | src/ouroboros/orchestrator/gemini_cli_runtime.py:103 | High | Medium | The Gemini runtime only overrides session-id extraction, but inherits Codex-only event normalization. The parent runtime converts thread.started, item.completed, and turn.failed; Gemini -o stream-json events are unlikely to match that shape, so assistant/tool streaming and explicit failure events will be dropped instead of surfaced to the orchestrator. |
| 3 | src/ouroboros/config/loader.py:564 | Medium | High | _default_model_for_backend() treats Gemini as a “Codex-like” backend and rewrites default model names to "default". For consensus this collapses the default 3-model roster into ("default","default","default"), which defeats min_models/diversity_required semantics and turns stage-3 consensus into repeated votes from the same local default model. |
| 4 | src/ouroboros/providers/gemini_cli_adapter.py:68 | Medium | Medium | The LLM adapter only recognizes session_id, while the runtime already had to support both session_id and sessionId. If Gemini emits camelCase here as well, one-shot completions will lose session tracking in raw_response, making resume/debug metadata inconsistent across the two Gemini code paths. |
| 5 | docs/runtime-guides/gemini.md:73 | Medium | High | The Gemini guide documents gemini_permission_mode: sandbox / auto_edit and matching env/CLI examples, but the actual accepted values are default, acceptEdits, and bypassPermissions (OrchestratorConfig and resolve_gemini_permission_mode). Following the guide produces validation/runtime errors on first use. |
| 6 | docs/config-reference.md:79 | Low | High | The config reference says runtime_backend: opencode raises NotImplementedError, but the factory currently rejects it with ValueError during backend resolution. The docs are advertising a behavior and support level that the implementation does not provide. |
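
To make finding 3 concrete, the sketch below shows how a roster of identical "default" entries fails a diversity check. The function name and the model names are hypothetical; the project's consensus code and its min_models handling are not shown in this PR.

```python
from typing import List, Sequence


def distinct_models(roster: Sequence[str], min_models: int) -> List[str]:
    """Deduplicate a consensus roster and enforce a minimum of distinct models."""
    distinct = list(dict.fromkeys(roster))  # preserves first-seen order
    if len(distinct) < min_models:
        raise ValueError(
            f"consensus needs {min_models} distinct models, got {len(distinct)}: {distinct}"
        )
    return distinct
```

A roster of ("default", "default", "default") reduces to a single entry, so a check like this would surface the collapse instead of letting one model vote three times.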

Test Coverage

  • Missing tests for env/config precedence and backend-specific branches in src/ouroboros/config/loader.py, especially Gemini-specific permission selection and the "default" model normalization path.
  • Missing Gemini structured-output tests in src/ouroboros/providers/gemini_cli_adapter.py that exercise CompletionConfig.response_format the same way src/ouroboros/evaluation/consensus.py and src/ouroboros/mcp/tools/qa.py use it.
  • Missing event-shape normalization tests in src/ouroboros/orchestrator/gemini_cli_runtime.py using representative Gemini stream-json payloads; current tests only validate command construction and session-id extraction.
  • Missing regression coverage for consensus model defaults under llm.backend=gemini across src/ouroboros/config/loader.py and src/ouroboros/evaluation/consensus.py.

I could not run the Python test suite here because pytest is not installed in this environment.

Design

The general direction is sound: adding Gemini through the existing runtime/provider factories keeps the public surface relatively clean. The problem is that the implementation reuses Codex base classes almost verbatim while only adapting flags and path lookup. That is not enough when the transport protocol differs. The current design assumes Gemini emits Codex-compatible runtime and completion events, plus equivalent schema/output features, but the code does not prove or enforce that.

The config/docs layer is also ahead of the implementation. New options are exposed broadly, including unsupported opencode values and Gemini permission examples that do not match the actual validators. No previous review findings were supplied, so there was nothing to verify as fixed or still open from an earlier round.

Files Reviewed

  • docs/config-reference.md
  • docs/runtime-capability-matrix.md
  • docs/runtime-guides/gemini.md
  • src/ouroboros/config/__init__.py
  • src/ouroboros/config/loader.py
  • src/ouroboros/config/models.py
  • src/ouroboros/gemini_permissions.py
  • src/ouroboros/orchestrator/__init__.py
  • src/ouroboros/orchestrator/gemini_cli_runtime.py
  • src/ouroboros/orchestrator/runtime_factory.py
  • src/ouroboros/providers/__init__.py
  • src/ouroboros/providers/factory.py
  • src/ouroboros/providers/gemini_cli_adapter.py
  • tests/unit/config/test_models.py
  • tests/unit/orchestrator/test_gemini_cli_runtime.py

Reviewed by ouroboros-agent[bot] via Codex deep analysis

Contributor

@ouroboros-agent ouroboros-agent bot left a comment

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit 431e72b | Triggered by: PR opened

Branch: feat/gemini-runtime | 19 files, +1117/-59 | CI: no checks reported on the 'feat/gemini-runtime' branch

Issue #N/A Requirements

| Requirement | Status |
|---|---|
| No linked issue detected in PR body | N/A. No issue requirements to map. |

Previous Review Follow-up

| Previous Finding | Status |
|---|---|
| First review, no previous findings. | N/A. First bot review on this PR. |

Code Findings

| # | File:Line | Severity | Confidence | Finding |
|---|---|---|---|---|
| 1 | src/ouroboros/providers/gemini_cli_adapter.py:41 | high | high | GeminiCliLLMAdapter._build_command() accepts output_schema_path but never uses it. The parent completion flow still builds a schema file and expects the backend command to enforce it, so any response_format={"type": "json_schema", ...} request sent through the Gemini adapter silently degrades to unconstrained free-form output. That is a behavioral regression relative to the existing Codex adapter contract. |
| 2 | docs/runtime-guides/gemini.md:73 | medium | high | The new guide tells users to set gemini_permission_mode: sandbox / auto_edit, but the implementation only accepts default, acceptEdits, and bypassPermissions (src/ouroboros/gemini_permissions.py:14). Copy-pasting the documented config will fail validation before the runtime even starts. |
| 3 | docs/runtime-guides/gemini.md:220 | medium | high | The CLI examples repeat the same invalid permission values (--permission-mode sandbox and --permission-mode auto_edit). The runtime validators and config model only recognize default, acceptEdits, and bypassPermissions, so the documented commands are not runnable as written. |
| 4 | src/ouroboros/providers/gemini_cli_adapter.py:78 | medium | medium | The provider-side session-id extraction only recognizes session_id, while the runtime-side Gemini parser was already extended to accept both session_id and sessionId (src/ouroboros/orchestrator/gemini_cli_runtime.py:103). If the Gemini CLI emits camelCase on the one-shot path too, completion metadata/session tracking will be lost there even though the runtime path handles it. |
| 5 | src/ouroboros/config/loader.py:531 | medium | medium | create_llm_adapter(backend="gemini") resolves its permission mode through get_llm_permission_mode() (src/ouroboros/providers/factory.py:98), but that loader only has backend-specific branches for OpenCode. The new Gemini backend therefore has no Gemini-specific permission override path on the LLM-only side, which makes the runtime and provider surfaces inconsistent and prevents a dedicated Gemini override from ever being honored there. |
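
For finding 4, a shared helper that accepts both key spellings would keep the provider and runtime paths consistent. A minimal sketch with a hypothetical function name; whether the Gemini CLI actually emits sessionId on the one-shot path is not confirmed in this review.

```python
from typing import Any, Mapping, Optional


def extract_session_id(event: Mapping[str, Any]) -> Optional[str]:
    """Return the session id from snake_case or camelCase keys, else None."""
    for key in ("session_id", "sessionId"):
        value = event.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None  # missing, empty, or non-string ids are ignored
```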

Test Coverage

The current test suite covers constructor wiring and flag mapping, but it does not exercise the two places most likely to break in real use:

  • No provider test requests a structured response format through GeminiCliLLMAdapter, so the missing output_schema_path handling is not caught (tests/unit/providers/test_gemini_cli_adapter.py).
  • No test asserts that the documentation examples use validator-accepted permission strings, so the sandbox / auto_edit drift shipped unchecked (tests/unit/test_gemini_permissions.py, docs only).
  • No provider test covers sessionId (camelCase) on the one-shot Gemini path, even though the runtime path explicitly anticipates that variant (tests/unit/providers/test_gemini_cli_adapter.py, tests/unit/orchestrator/test_gemini_cli_runtime.py).

Design

The overall shape is sensible: keep Gemini as a thin Codex-derived runtime/provider pair, centralize permission translation in gemini_permissions.py, and wire backend selection through the existing factories. That preserves the handler/runtime layering rather than forking new orchestration logic.

The problems are mostly contract mismatches at the edges:

  • the provider adapter does not yet preserve the parent structured-output contract,
  • the docs/examples are out of sync with the actual validator surface,
  • and the runtime/provider config surfaces are not aligned on Gemini-specific permission handling.

I would fix those boundary mismatches before merging; otherwise the first user-facing Gemini paths are likely to fail in ways that look like product instability rather than normal beta roughness.

Files Reviewed

  • docs/config-reference.md
  • docs/runtime-capability-matrix.md
  • docs/runtime-guides/gemini.md
  • src/ouroboros/config/__init__.py
  • src/ouroboros/config/loader.py
  • src/ouroboros/config/models.py
  • src/ouroboros/gemini_permissions.py
  • src/ouroboros/orchestrator/__init__.py
  • src/ouroboros/orchestrator/gemini_cli_runtime.py
  • src/ouroboros/orchestrator/runtime_factory.py
  • src/ouroboros/providers/__init__.py
  • src/ouroboros/providers/factory.py
  • src/ouroboros/providers/gemini_cli_adapter.py
  • tests/unit/config/test_models.py
  • tests/unit/orchestrator/test_gemini_cli_runtime.py

Reviewed by ouroboros-agent[bot] via Codex deep analysis

Owner

@Q00 Q00 left a comment

Thanks for the contribution, @tgmerritt! The overall shape is solid — thin Codex-derived subclasses with centralized permission translation is exactly the right pattern. A few things to address before merge, and some that can wait.


Must fix before merge

1. output_schema_path silently dropped (High)
GeminiCliLLMAdapter._build_command() accepts output_schema_path but never passes it to the CLI. The parent _complete_once() builds a schema temp file and expects the backend to enforce it, so any response_format={"type": "json_schema", ...} request (used by QA, consensus, semantic callers) silently degrades to unconstrained text output. This will cause JSON parse failures downstream.

Either wire --output-schema through (if Gemini CLI supports it) or explicitly raise/warn when structured output is requested but unsupported.
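The "explicitly raise" option might look roughly like the sketch below. Everything here is a hypothetical stand-in rather than the adapter's real code: `build_command`, `StructuredOutputUnsupportedError`, and the `schema_flag` parameter are illustrative names, and whether the Gemini CLI accepts any schema flag at all is left open by passing the flag in rather than hardcoding it.

```python
from __future__ import annotations


class StructuredOutputUnsupportedError(RuntimeError):
    """A schema-constrained response was requested but cannot be enforced."""


def build_command(
    base_command: list[str],
    output_schema_path: str | None = None,
    schema_flag: str | None = None,
) -> list[str]:
    """Append the schema argument, or fail loudly when it cannot be honored.

    `schema_flag` is None when the backend has no known way to enforce a
    schema; in that case a structured request raises instead of being
    silently downgraded to free-form text.
    """
    command = list(base_command)
    if output_schema_path is None:
        # Plain text completion: nothing to enforce.
        return command
    if schema_flag is None:
        raise StructuredOutputUnsupportedError(
            "structured output was requested but this backend cannot enforce a schema"
        )
    return command + [schema_flag, output_schema_path]
```

Failing at command-build time keeps the error next to its cause, instead of surfacing later as a JSON parse failure in a QA or consensus caller.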

2. Docs use invalid permission values (Medium-High)
docs/runtime-guides/gemini.md documents sandbox and auto_edit as config values (lines 73, 220+), but the actual validator (resolve_gemini_permission_mode) only accepts default, acceptEdits, bypassPermissions. Users following the guide will hit a validation error on first use. The permission table and CLI examples both need updating.


Can defer to next iteration

3. Event normalization assumes Codex event shape — The Gemini runtime inherits Codex event parsing (thread.started, item.completed, turn.failed). Gemini's stream-json output is unlikely to match. This is fine for beta since the actual Gemini CLI event format needs empirical validation first. Suggest opening a follow-up issue to test with real Gemini output and adapt the normalizer.

4. sessionId camelCase inconsistency — Runtime handles both session_id and sessionId, but the provider-side adapter only checks session_id. Low priority until we confirm what Gemini actually emits.

5. Consensus model defaults — Gemini is grouped with Codex in _CODEX_LLM_BACKENDS, so _default_model_for_backend() rewrites consensus models to ("default", "default", "default"), defeating diversity. Only matters when OpenRouter isn't configured. Low priority for beta.
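For item 4, one way to keep the two code paths aligned is a shared helper that both the runtime and the provider adapter call. This is only a sketch under assumptions: `extract_session_id` is a hypothetical name, and it assumes the parsed event is a flat mapping, which has not been confirmed against real Gemini output.

```python
from __future__ import annotations

from typing import Any, Mapping

# Both spellings mentioned in the review; snake_case is tried first.
_SESSION_ID_KEYS = ("session_id", "sessionId")


def extract_session_id(event: Mapping[str, Any]) -> str | None:
    """Return the first non-empty session id found under either key, else None."""
    for key in _SESSION_ID_KEYS:
        value = event.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None
```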


Needs discussion

6. Breaking change: default runtime claude → codex
OrchestratorConfig.runtime_backend default changed from "claude" to "codex". This silently changes behavior for every user without an explicit config. This should either:

  • Be reverted and handled in a separate PR with proper changelog/migration note, or
  • At minimum be called out as a breaking change in the PR description and release notes

7. Interview hardcoded to Claude
create_llm_adapter(use_case="interview") now always returns ClaudeCodeAdapter. Reasonable for quality, but Gemini-only users (no Anthropic API key / no Max Plan) won't be able to run interviews at all. Worth documenting this requirement explicitly.


Bot review assessment

The ouroboros-agent bot flagged largely the same issues. Findings #1 and #2 are real blockers. The rest are valid observations but appropriate to defer for a beta release. The bot was thorough — not unfair, just not triaging by release phase.

Nice work on the 35 tests and thorough docs. Looking forward to the next iteration! 🚀

Contributor

@ouroboros-agent ouroboros-agent bot left a comment

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit 431e72b | Triggered by: backlog review sweep

Branch: feat/gemini-runtime | 19 files, +1117/-59 | CI: No checks reported

Issue #N/A Requirements

| Requirement | Status |
|---|---|
| No linked issue detected in PR body. | See code findings below for file:line evidence. |

Previous Review Follow-up

| Previous Finding | Status |
|---|---|
| Prior bot review exists | Follow-up checked in current analysis. |

Code Findings

| # | File:Line | Severity | Confidence | Finding |
|---|---|---|---|---|
| 1 | src/ouroboros/providers/gemini_cli_adapter.py:41 | High | High | GeminiCliLLMAdapter._build_command() ignores output_schema_path, so every response_format={"type":"json_schema",...} request is silently downgraded to an unconstrained text completion. That breaks existing QA / semantic / consensus callers which parse the reply as JSON and currently depend on schema-constrained output. |
| 2 | src/ouroboros/orchestrator/gemini_cli_runtime.py:103 | High | Medium | The Gemini runtime only overrides session-id extraction, but inherits Codex-only event normalization. The parent runtime converts thread.started, item.completed, and turn.failed; Gemini -o stream-json events are unlikely to match that shape, so assistant/tool streaming and explicit failure events will be dropped instead of surfaced to the orchestrator. |
| 3 | src/ouroboros/config/loader.py:564 | Medium | High | _default_model_for_backend() treats Gemini as a “Codex-like” backend and rewrites default model names to "default". For consensus this collapses the default 3-model roster into ("default","default","default"), which defeats min_models/diversity_required semantics and turns stage-3 consensus into repeated votes from the same local default model. |
| 4 | src/ouroboros/providers/gemini_cli_adapter.py:68 | Medium | Medium | The LLM adapter only recognizes session_id, while the runtime already had to support both session_id and sessionId. If Gemini emits camelCase here as well, one-shot completions will lose session tracking in raw_response, making resume/debug metadata inconsistent across the two Gemini code paths. |
| 5 | docs/runtime-guides/gemini.md:73 | Medium | High | The Gemini guide documents gemini_permission_mode: sandbox / auto_edit and matching env/CLI examples, but the actual accepted values are default, acceptEdits, and bypassPermissions (OrchestratorConfig and resolve_gemini_permission_mode). Following the guide produces validation/runtime errors on first use. |
| 6 | docs/config-reference.md:79 | Low | High | The config reference says runtime_backend: opencode raises NotImplementedError, but the factory currently rejects it with ValueError during backend resolution. The docs are advertising a behavior and support level that the implementation does not provide. |
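Finding 2 is about events being dropped without trace. Until the real Gemini event format is known, a defensive step could be to separate recognized from unrecognized events so the latter can be logged or surfaced. The sketch below is hypothetical: `classify_events` is not part of the codebase, and it assumes each parsed event is a mapping with a "type" field, as the Codex event names in the finding suggest.

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping

# Event types the review names as handled by the inherited Codex normalizer.
KNOWN_EVENT_TYPES = frozenset({"thread.started", "item.completed", "turn.failed"})

Event = Mapping[str, Any]


def classify_events(events: Iterable[Event]) -> tuple[list[Event], list[Event]]:
    """Split events into (recognized, unrecognized) by their "type" field.

    Unrecognized events are returned rather than discarded, so the caller
    can log them or forward them to the orchestrator.
    """
    recognized: list[Event] = []
    unrecognized: list[Event] = []
    for event in events:
        if event.get("type") in KNOWN_EVENT_TYPES:
            recognized.append(event)
        else:
            unrecognized.append(event)
    return recognized, unrecognized
```

Captured unrecognized events from a real run would also make useful fixtures for the follow-up normalizer work.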

Test Coverage

  • Missing tests for env/config precedence and backend-specific branches in src/ouroboros/config/loader.py, especially Gemini-specific permission selection and the "default" model normalization path.
  • Missing Gemini structured-output tests in src/ouroboros/providers/gemini_cli_adapter.py that exercise CompletionConfig.response_format the same way src/ouroboros/evaluation/consensus.py and src/ouroboros/mcp/tools/qa.py use it.
  • Missing event-shape normalization tests in src/ouroboros/orchestrator/gemini_cli_runtime.py using representative Gemini stream-json payloads; current tests only validate command construction and session-id extraction.
  • Missing regression coverage for consensus model defaults under llm.backend=gemini across src/ouroboros/config/loader.py and src/ouroboros/evaluation/consensus.py.
I could not run the Python test suite here because pytest is not installed in this environment.
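The consensus regression gap could be covered by a check of roughly this shape. It is a sketch under stated assumptions: `check_roster` is a hypothetical helper, and it interprets diversity_required as "at least min_models distinct model names", which may not match the real semantics in consensus.py.

```python
from __future__ import annotations

from typing import Sequence


def check_roster(
    models: Sequence[str], min_models: int, diversity_required: bool
) -> list[str]:
    """Return human-readable problems with a consensus roster (empty if none)."""
    problems: list[str] = []
    if len(models) < min_models:
        problems.append(f"roster has {len(models)} models, need {min_models}")
    distinct = len(set(models))
    # Assumed meaning of diversity: enough *different* models, not just enough votes.
    if diversity_required and distinct < min_models:
        problems.append(f"roster has {distinct} distinct models, need {min_models}")
    return problems
```

Applied to the collapsed roster ("default", "default", "default") with min_models=3 and diversity_required=True, this reports a diversity problem, which is the regression the review describes.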

Design

The general direction is sound: adding Gemini through the existing runtime/provider factories keeps the public surface relatively clean. The problem is that the implementation reuses Codex base classes almost verbatim while only adapting flags and path lookup. That is not enough when the transport protocol differs. The current design assumes Gemini emits Codex-compatible runtime and completion events, plus equivalent schema/output features, but the code does not prove or enforce that.

The config/docs layer is also ahead of the implementation. New options are exposed broadly, including unsupported opencode values and Gemini permission examples that do not match the actual validators. No previous review findings were supplied, so there was nothing to verify as fixed or still open from an earlier round.

Files Reviewed

  • docs/config-reference.md
  • docs/runtime-capability-matrix.md
  • docs/runtime-guides/gemini.md
  • src/ouroboros/config/__init__.py
  • src/ouroboros/config/loader.py
  • src/ouroboros/config/models.py
  • src/ouroboros/gemini_permissions.py
  • src/ouroboros/orchestrator/__init__.py
  • src/ouroboros/orchestrator/gemini_cli_runtime.py
  • src/ouroboros/orchestrator/runtime_factory.py
  • src/ouroboros/providers/__init__.py
  • src/ouroboros/providers/factory.py
  • src/ouroboros/providers/gemini_cli_adapter.py
  • tests/unit/config/test_models.py
  • tests/unit/orchestrator/test_gemini_cli_runtime.py

Reviewed by ouroboros-agent[bot] via Codex deep analysis

@Q00 Q00 force-pushed the release/0.26.0-beta branch from a4cefc6 to 73b6b27 Compare March 23, 2026 19:20
Owner

Q00 commented Mar 23, 2026

⚠️ Base branch rebased onto main

release/0.26.0-beta has been rebased onto the latest main (ef54b9b). This brings in recent main fixes including PR #180 (delegated MCP tool context) and build improvements.

Your branch will need a rebase:

git fetch origin
git rebase origin/release/0.26.0-beta

Key structural changes to be aware of:

  • AgentRuntime protocol now has runtime_backend, working_directory, permission_mode properties
  • RuntimeHandle.backend is now normalized via __post_init__
  • Handler files split from definitions.py into separate modules
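For readers unfamiliar with the `__post_init__` pattern mentioned above, here is a minimal illustration of normalizing a field at construction time. It is not the project's actual RuntimeHandle: the class name, the frozen setting, and the alias table (including which spelling is canonical) are assumptions made for the example.

```python
from dataclasses import dataclass

# Alias pair taken from the PR summary; treating "gemini" as the canonical
# spelling is an assumption for this illustration.
_BACKEND_ALIASES = {"gemini_cli": "gemini"}


@dataclass(frozen=True)
class RuntimeHandleSketch:
    """Hypothetical stand-in showing backend normalization in __post_init__."""

    backend: str

    def __post_init__(self) -> None:
        normalized = self.backend.strip().lower()
        normalized = _BACKEND_ALIASES.get(normalized, normalized)
        # Frozen dataclasses block normal assignment, so bypass it explicitly.
        object.__setattr__(self, "backend", normalized)
```

With this pattern, code that compares `handle.backend` against a literal no longer needs to account for aliases or casing, which is worth checking for in the rebased Gemini runtime.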

Let me know if you need help with the rebase.

@Q00 Q00 force-pushed the release/0.26.0-beta branch 2 times, most recently from 7ca9a80 to eb9fc80 Compare March 25, 2026 11:39