Multi-Agent Support for Ralph #22

@rjernst

branch: ralph-multi-agent

Spec: Multi-Agent Support for Ralph

Overview

Ralph is currently hardcoded to use Claude Code as its AI coding agent. This spec adds support for Cursor Agent CLI (cursor-agent) as an alternative backend, selectable via --agent cursor. The change introduces:

  • An --agent <name> CLI flag (default: claude)
  • A single Docker image with both agent CLIs installed
  • Agent-specific CLI invocation in the container entrypoint
  • Docker volume-based credential storage (no auth tokens as env vars or on host disk)
  • Read-only host config mounts for agent rules/settings (symlinked into the agent's config dir)
  • Auto-detection of auth failures with re-prompt

Architecture

ralph --agent cursor --issue 42
  │
  ├─ 1. Auth check (Docker volume "ralph-<agent>-auth")
  │   ├─ Volume path: /home/ralph/.<agent>/
  │   └─ If missing credential:
  │       ├─ claude (macOS): extract from Keychain, pipe into volume
  │       ├─ claude (Linux): extract from ~/.claude/.credentials.json, pipe into volume
  │       ├─ claude (no creds): run `claude setup-token` on host, then extract
  │       └─ cursor (any): run `cursor-agent login` on host, then extract
  │
  ├─ 2. Docker image (single image, both CLIs)
  │   └─ ralph:uid-<UID>  (claude + cursor-agent both installed)
  │
  ├─ 3. Container mounts
  │   ├─ -v <worktree>:/work                          (source code)
  │   ├─ -v ralph-<agent>-auth:/home/ralph/.<agent>/   (credentials, persistent)
  │   ├─ -v ~/.claude/:/home/ralph/.claude-host/:ro    (config/rules, read-only)
  │   ├─ -v ~/.cursor/:/home/ralph/.cursor-host/:ro    (config/rules, read-only)
  │   ├─ -v ~/.gitconfig:/home/ralph/.gitconfig:ro     (existing)
  │   └─ -v ~/.ssh:/home/ralph/.ssh:ro                 (existing)
  │
  ├─ 4. Entrypoint
  │   ├─ Symlink config from .<agent>-host/ into .<agent>/ (skip credential files)
  │   ├─ Git config setup (existing, shared)
  │   ├─ Branch on $AGENT:
  │   │   ├─ claude: claude -p --dangerously-skip-permissions --model $MODEL --reasoning-effort high
  │   │   └─ cursor: cursor-agent -p --force --trust --sandbox disabled --model $MODEL
  │   └─ HEAD tracking + optional push (existing, shared)
  │
  └─ 5. Auth failure recovery
      ├─ Capture container output, grep for auth patterns
      └─ If auth failure: clear credential from volume, re-run auth flow, retry

Prompt text is identical for both agents — it's generic enough that both understand the one-task-per-iteration contract.


1. CLI Interface

New flag: --agent <name> (default: claude)

  • Valid values: claude, cursor
  • Invalid agent name → error with exit 2

Model defaults per agent (when --model not specified):

  • claude → sonnet
  • cursor → sonnet-4 (verify during implementation; cursor model names may differ)

The --model flag overrides the default with pass-through (no name translation).

No ralph auth subcommand — auth is fully automated on demand.
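
The flag and default-model rules in this section can be sketched as follows. This is a minimal illustration in portable shell rather than the zsh of scripts/ralph; the function name parse_agent_args is hypothetical, AGENT, MODEL, and MODEL_EXPLICIT are the variables named in Step 1, and sonnet-4 is still the unverified placeholder from the list above.

```shell
# Hypothetical helper: parse --agent/--model and apply per-agent model defaults.
# Sets AGENT and MODEL; returns 2 (usage error) on bad input.
parse_agent_args() {
  AGENT=claude
  MODEL=
  MODEL_EXPLICIT=0
  while [ $# -gt 0 ]; do
    case "$1" in
      --agent)
        [ $# -ge 2 ] || { echo "ralph: --agent requires a value" >&2; return 2; }
        AGENT=$2; shift 2 ;;
      --model)
        [ $# -ge 2 ] || { echo "ralph: --model requires a value" >&2; return 2; }
        MODEL=$2; MODEL_EXPLICIT=1; shift 2 ;;
      *) shift ;;   # other ralph flags are out of scope for this sketch
    esac
  done
  case "$AGENT" in
    claude|cursor) ;;
    *) echo "ralph: unknown agent '$AGENT' (expected claude or cursor)" >&2; return 2 ;;
  esac
  if [ "$MODEL_EXPLICIT" -eq 0 ]; then
    case "$AGENT" in
      claude) MODEL=sonnet ;;
      cursor) MODEL=sonnet-4 ;;  # placeholder per the spec; verify the real name
    esac
  fi
}
```

Tracking MODEL_EXPLICIT separately, instead of testing whether MODEL is empty, keeps an explicitly passed --model value from ever being replaced by a default.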

2. Credential Storage

Docker named volume: One volume per agent (ralph-claude-auth, ralph-cursor-auth), mounted at the agent's expected config path inside the container (/home/ralph/.claude/, /home/ralph/.cursor/).

Credentials never appear as:

  • Environment variables in docker run -e
  • Files on the host filesystem (outside Keychain / agent-managed config)

First-run auth flow (Claude):

  1. Check volume: docker run --rm -v ralph-claude-auth:/check alpine test -f /check/.credentials.json
  2. If missing, check host credential source:
    • macOS: security find-generic-password -s "Claude Code-credentials" -w → extract accessToken via jq
    • Linux: read ~/.claude/.credentials.json
  3. If no host credentials found, run claude setup-token interactively on the host (opens browser)
  4. Extract credential and pipe into volume via stdin (never as env var):
    printf '%s' "$cred_json" | docker run --rm -i -v ralph-claude-auth:/dest alpine \
      sh -c 'cat > /dest/.credentials.json && chmod 600 /dest/.credentials.json'
    
  5. Determine the exact .credentials.json format during implementation — it needs to match what claude expects to read natively
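
Steps 1 and 4 above could be wrapped in two small helpers. This is a sketch only: the names check_auth_volume and pipe_to_auth_volume come from the Step 2 notes, but the argument order (agent name, then credential filename) is an assumption.

```shell
# Succeeds if the credential file already exists in the agent's volume.
check_auth_volume() {
  local agent="$1" file="$2"
  docker run --rm -v "ralph-${agent}-auth:/check" alpine test -f "/check/${file}"
}

# Reads the credential from stdin and writes it into the volume with mode 600.
# The filename is passed to sh as a positional argument rather than being
# interpolated into the script text.
pipe_to_auth_volume() {
  local agent="$1" file="$2"
  docker run --rm -i -v "ralph-${agent}-auth:/dest" alpine \
    sh -c 'cat > "/dest/$1" && chmod 600 "/dest/$1"' sh "$file"
}
```

A caller would use them as `check_auth_volume claude .credentials.json || printf '%s' "$cred_json" | pipe_to_auth_volume claude .credentials.json`, so the credential travels only over stdin.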

First-run auth flow (Cursor):

  1. Check volume for credential file (exact path TBD — discover where cursor-agent login stores creds)
  2. If missing, check host (~/.cursor/ or ~/.config/Cursor/)
  3. If no host credentials, run cursor-agent login interactively on the host (opens browser)
  4. Extract and pipe into volume (same pattern as Claude)

Auth failure recovery:

  • Capture container stdout/stderr
  • On non-zero exit, grep output for auth-related patterns: unauthorized, invalid.*token, invalid.*key, authentication, 401, 403, please log in, etc.
  • If auth failure detected: delete stored credential from volume, re-run auth flow, retry the container
  • If non-auth failure: existing behavior (mark issue status:needs-attention)
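
The pattern check described above might look like the following sketch. The function name is_auth_failure appears in the Step 4 notes; reading the captured output from stdin is an assumption, as are the digit guards around 401 and 403, which are added here so that longer numbers such as 4012 do not count as auth failures.

```shell
# Succeeds if the captured output on stdin looks like an auth failure.
# Matching is case-insensitive and uses extended regular expressions.
is_auth_failure() {
  grep -Eiq \
    -e 'unauthorized' \
    -e 'invalid.*(token|key|credential)' \
    -e 'authentication failed' \
    -e '(^|[^0-9])(401|403)([^0-9]|$)' \
    -e 'please log in' \
    -e 'expired.*token|token.*expired'
}
```

Usage would be along the lines of `is_auth_failure < "$log_file"`, run only after the container exits non-zero.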

3. Host Config Mounts

Mount agent config directories read-only at alternate paths:

  • ~/.claude/ → /home/ralph/.claude-host/:ro
  • ~/.cursor/ → /home/ralph/.cursor-host/:ro

These provide agent rules, settings, and project conventions to the container without conflicting with the writable auth volume.

Entrypoint symlink logic:

  • Symlink files from .<agent>-host/ into .<agent>/ (the volume-backed dir)
  • Skip credential files (.credentials.json, auth tokens, etc.) to avoid overwriting volume-persisted credentials
  • Run this on every container start to keep config fresh
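
A minimal sketch of the symlink step follows. The function name symlink_agent_config is taken from the Step 6 notes; its three arguments (host dir, agent dir, credential filename) are assumptions chosen to keep the sketch testable outside the container. It uses ln -sfn rather than plain ln -sf so that an existing symlink to a directory is replaced instead of being followed.

```shell
symlink_agent_config() {
  local host_dir="$1" agent_dir="$2" cred_file="$3" src name dest
  [ -d "$host_dir" ] || return 0            # host mount absent: nothing to do
  mkdir -p "$agent_dir"
  # The second glob picks up dotfiles such as .credentials.json.
  for src in "$host_dir"/* "$host_dir"/.[!.]*; do
    [ -e "$src" ] || continue               # unmatched glob stays literal
    name=${src##*/}
    if [ "$name" = "$cred_file" ]; then
      continue                              # never shadow stored credentials
    fi
    dest="$agent_dir/$name"
    # Preserve real files or directories already present in the volume.
    if [ -e "$dest" ] && [ ! -L "$dest" ]; then
      continue
    fi
    ln -sfn "$src" "$dest"                  # create or refresh the symlink
  done
}
```

In the entrypoint this would be called as something like `symlink_agent_config /home/ralph/.claude-host /home/ralph/.claude .credentials.json`.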

4. Docker Image

Single image with both CLIs installed. Agent selected at runtime via $AGENT env var.

Dockerfile changes:

  • Keep existing base: debian:bookworm-slim + git, curl, jq, ripgrep, fd-find, openssh-client, etc.
  • Keep existing Claude install: nodejs, npm, npm install -g @anthropic-ai/claude-code
  • Add Cursor install: curl https://cursor.com/install -fsSL | bash (verify this works in Docker build context; may need to run before USER ralph if installer requires root)
  • Verify both claude and cursor-agent binaries are on PATH

Image tags unchanged: ralph:uid-<UID> or ralph:custom-<hash>
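
The PATH verification could be a small shell check run at build time or as a smoke test against the built image. This is a sketch; require_on_path is a hypothetical helper name.

```shell
# Fail if any of the named binaries is missing from PATH.
require_on_path() {
  local bin missing=0
  for bin in "$@"; do
    if ! command -v "$bin" >/dev/null 2>&1; then
      echo "ralph: required binary not found on PATH: $bin" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `require_on_path claude cursor-agent` reports every missing binary before returning non-zero, which is more informative than stopping at the first one.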

5. Entrypoint

Receives $AGENT env var (default: claude). Shared logic remains the same (git config, HEAD tracking, push). Agent-specific logic branches for CLI invocation.

Claude flags: --dangerously-skip-permissions --model $MODEL --reasoning-effort high
Cursor flags: --force --trust --sandbox disabled --model $MODEL

Prompt delivery: Both CLIs support -p (print/pipe mode). Use stdin heredoc. If cursor-agent doesn't support stdin in -p mode, fall back to passing prompt as a positional argument (verify during implementation).

The prompt text is identical for both agents.
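
The agent branch might be structured as in the sketch below. run_agent is a hypothetical name, and the flags are copied from this spec without being verified against either CLI. The sketch pipes the prompt on stdin; the positional-argument fallback for cursor-agent is not shown.

```shell
# Pipe the same prompt text to whichever agent CLI is selected.
run_agent() {
  local prompt="$1"
  case "${AGENT:-claude}" in
    claude)
      printf '%s\n' "$prompt" |
        claude -p --dangerously-skip-permissions --model "$MODEL" --reasoning-effort high
      ;;
    cursor)
      printf '%s\n' "$prompt" |
        cursor-agent -p --force --trust --sandbox disabled --model "$MODEL"
      ;;
    *)
      # Defensive: ralph validates the agent name before starting the container.
      echo "ralph: unknown agent '${AGENT}'" >&2
      return 1
      ;;
  esac
}
```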


Implementation Plan

Each step follows this structure:

  1. Implement — Write the code
  2. Test — Write BATS tests
  3. Verify — Run tests, fix failures until all pass
  4. Review — Code review for bugs, edge cases, and conventions
  5. Address feedback — Fix review findings, re-run tests, re-review until clean
  6. Update spec — Mark the step [done] and record any decisions or deviations

Spec maintenance rules

  • Mark each step [done] when complete.
  • Record design decisions that emerged during implementation as notes under the step.
  • Minor deviations (e.g. flag name changes, reordered logic) should be noted and the spec updated to match.
  • Significant design changes (e.g. new subcommands, changed architecture, removed features) require pausing for user review before proceeding.

Step 1: Add --agent flag and agent-specific model defaults [done]

Files:

  • scripts/ralph — Add --agent to arg parser, set per-agent model default

Implement:

  1. Add AGENT=claude to defaults section
  2. Add MODEL_EXPLICIT=0 flag to track whether --model was explicitly set
  3. Add --agent) case to the while arg parser
  4. After parsing, validate agent name (claude or cursor); error exit 2 on invalid
  5. If MODEL_EXPLICIT=0, set MODEL based on agent:
    • claude → sonnet
    • cursor → sonnet-4 (verify correct name)
  6. Update usage comment header to document --agent

Test:

  • Default agent is claude, default model is sonnet
  • --agent cursor sets model default to sonnet-4
  • --agent cursor --model gpt-5 overrides to gpt-5
  • --agent invalid → exit 2

Verify: Run tests. ralph --help shows new flag.

Review: Backwards compatibility — all existing usage unchanged.

Step 2: Implement Docker volume auth for Claude [done]

Files:

  • scripts/ralph — Replace hardcoded Keychain extraction with volume-based auth

Implement:

  1. Create ensure_auth_volume() function that:
    • Takes agent name as argument
    • Checks if credential exists in the named volume (ralph-<agent>-auth)
    • Returns 0 if exists, 1 if not
  2. Create setup_claude_auth() function that:
    • Checks host for existing credentials:
      • macOS: Keychain Claude Code-credentials → jq extract accessToken
      • Linux: ~/.claude/.credentials.json
    • If no host credentials: run claude setup-token interactively, then re-check
    • Pipe credential into volume via stdin (never env var, never host disk)
  3. Replace the hardcoded Keychain block (lines 140-147) with volume auth check + setup
  4. Remove CLAUDE_CODE_OAUTH_TOKEN from docker run env vars

Test:

  • Volume check correctly detects missing/present credentials
  • macOS Keychain extraction still works
  • Error messages guide user correctly

Verify: Run with Claude, verify no auth env vars in docker run.

Review: No credentials leaked via env vars, logs, or host filesystem.

Notes:

  • check_auth_volume() and pipe_to_auth_volume() are generic helpers taking agent name and credential file as args
  • setup_claude_auth() tries: (1) macOS Keychain, (2) ~/.claude/.credentials.json, (3) interactive claude setup-token
  • ensure_auth() dispatches to the right setup function based on agent; cursor placeholder returns error until Step 3
  • CLAUDE_CODE_OAUTH_TOKEN env var completely removed from docker run — credentials now live only in Docker volumes
  • BATS tests added for: volume check skip, volume check miss + setup, no env var leak, Keychain extraction, filesystem fallback, missing CLI error
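
The host lookup order in setup_claude_auth() can be sketched as below. claude_host_credential is a hypothetical name. Because the exact payload expected in .credentials.json is still open in this spec, the sketch prints the raw source and leaves any jq extraction to the caller.

```shell
# Print the Claude credential found on the host, or fail if there is none.
claude_host_credential() {
  if [ "$(uname -s)" = Darwin ]; then
    # macOS: read the secret from the login Keychain.
    security find-generic-password -s "Claude Code-credentials" -w 2>/dev/null && return 0
  fi
  if [ -f "$HOME/.claude/.credentials.json" ]; then
    cat "$HOME/.claude/.credentials.json"
    return 0
  fi
  return 1   # caller falls back to interactive `claude setup-token`
}
```

The output is meant to be piped straight into the volume helper, so the credential is never stored in an environment variable or written to a new file on the host.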

Step 3: Implement Docker volume auth for Cursor [done]

Files:

  • scripts/ralph — Add Cursor auth alongside Claude auth

Implement:

  1. Create setup_cursor_auth() function that:
    • Checks host for existing credentials (discover where cursor-agent login stores them)
    • If no host credentials: run cursor-agent login interactively
    • Pipe credential into volume
  2. Create ensure_auth() dispatcher that calls the right setup function based on $AGENT
  3. Determine the exact credential file path and format for Cursor (implementation discovery)

Test:

  • Cursor auth flow prompts correctly when no credentials exist
  • Credential stored in volume and persists across runs

Verify: Run with --agent cursor, verify auth flow works end-to-end.

Review: Same security properties as Claude auth — no env vars, no host disk.

Notes:

  • setup_cursor_auth() checks two host paths: ~/.cursor/auth.json and ~/.config/Cursor/auth.json
  • Falls back to interactive cursor-agent login if no host credentials found
  • Credential file is auth.json (stored in ralph-cursor-auth Docker volume)
  • ensure_auth() already existed from Step 2; cursor case updated from placeholder to real implementation
  • BATS tests added for: volume check skip, volume check miss + setup, volume name verification, alternate config path, missing CLI error
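
The two-path host check for Cursor could be as simple as this sketch. find_cursor_host_credential is a hypothetical name; the two candidate paths are the ones listed in the notes above.

```shell
# Print the path of the first Cursor credential file found on the host.
find_cursor_host_credential() {
  local candidate
  for candidate in "$HOME/.cursor/auth.json" "$HOME/.config/Cursor/auth.json"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1   # caller falls back to interactive `cursor-agent login`
}
```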

Step 4: Add auth failure detection and re-prompt [done]

Files:

  • scripts/ralph — Add output capture and auth failure detection to container run

Implement:

  1. Capture container stdout/stderr to a variable or temp file
  2. On non-zero exit, check output against auth failure patterns:
    • Case-insensitive grep for: unauthorized, invalid.*(token|key|credential), authentication, 401, 403, please log in, expired
  3. If auth failure detected:
    • Delete credential from volume: docker run --rm -v ralph-<agent>-auth:/auth alpine rm -f /auth/<credential-file>
    • Re-run auth setup flow
    • Retry the container (limit to 1 retry to avoid infinite loops)
  4. If non-auth failure: existing behavior (status:needs-attention)

Test:

  • Auth failure pattern detection works for known error strings
  • Non-auth failures are NOT treated as auth failures
  • Retry limit prevents infinite loops

Verify: Simulate auth failure, verify re-prompt and retry behavior.

Review: Pattern matching is broad enough to catch real auth errors but not false-positive on unrelated errors.

Notes:

  • Added three helper functions: agent_credential_file() (returns credential filename per agent), is_auth_failure() (grep-based pattern matching on captured output), clear_auth_volume() (removes credential file from Docker volume)
  • Container output captured via tee to a temp file while still displaying to stdout
  • pipestatus[1] (zsh) used to get docker exit code from the pipe
  • Auth retry limited to 1 attempt via auth_retried flag
  • On auth failure: clear volume → re-run ensure_auth → retry container
  • On retry failure (auth or not): falls through to existing needs-attention labeling
  • Pattern matches: unauthorized, invalid.*(token|key|credential), authentication failed, 401, 403, please log in, expired.*token, token.*expired
  • Updated existing "needs-attention on container failure" test to handle auth volume check in docker stub
  • BATS tests added for: auth failure retry, non-auth failure passthrough, retry limit, pattern matching, volume clearing (claude and cursor)
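
The retry-once flow from these notes is sketched below in portable shell. It assumes run_container, is_auth_failure, clear_auth_volume, and ensure_auth are defined elsewhere; the names come from this spec, but their signatures here are assumptions, and run_with_auth_retry itself is a hypothetical name. Unlike the real script, which streams output through tee and reads pipestatus, this sketch redirects output to a temp file and prints it after each attempt.

```shell
# Run the container, retrying at most once after an auth failure.
run_with_auth_retry() {
  local agent="$1" log status auth_retried=0
  log=$(mktemp) || return 1
  while :; do
    status=0
    run_container >"$log" 2>&1 || status=$?
    cat "$log"
    if [ "$status" -eq 0 ]; then
      break
    fi
    if [ "$auth_retried" -eq 0 ] && is_auth_failure <"$log"; then
      auth_retried=1
      # Clear the stored credential, re-authenticate, then retry once.
      clear_auth_volume "$agent" && ensure_auth "$agent" && continue
    fi
    break   # non-auth failure, or the single retry is already used
  done
  rm -f "$log"
  return "$status"
}
```

The returned status is the exit code of the last container run, so the caller can still apply the existing needs-attention labeling on failure.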

Step 5: Update Dockerfile to install both CLIs [done]

Files:

  • docker/ralph/Dockerfile — Add Cursor agent CLI installation

Implement:

  1. Keep existing Claude install: npm install -g @anthropic-ai/claude-code
  2. Add Cursor install: curl https://cursor.com/install -fsSL | bash
    • Verify this works in Docker build context
    • May need to run before USER ralph if installer requires root
    • Verify cursor-agent binary is on PATH after install
  3. Verify both claude and cursor-agent are functional in the built image

Test:

  • docker run <image> which claude → found
  • docker run <image> which cursor-agent → found
  • docker run <image> claude --version → works
  • docker run <image> cursor-agent --version → works

Verify: Build image, run verification commands.

Review: Image size acceptable, both CLIs work, no conflicts.

Notes:

  • Cursor installer runs as root (before USER ralph) since it installs to system paths
  • command -v cursor-agent verification ensures the build fails fast if the installer doesn't place the binary on PATH
  • Both CLI installs are separate RUN layers for better Docker cache behavior

Step 6: Update entrypoint for agent-specific invocation and config symlinks [done]

Files:

  • docker/ralph/entrypoint.sh — Add agent branching, config symlink logic

Implement:

  1. Add config symlink step at the top of entrypoint:
    • For the active agent, symlink files from .<agent>-host/ into .<agent>/
    • Skip credential files (.credentials.json, auth tokens) to preserve volume-stored credentials
    • Handle case where host mount doesn't exist (no-op)
  2. Build prompt text into a variable (extract from current heredoc)
  3. Branch on ${AGENT:-claude}:
    • claude: pipe prompt to claude -p --dangerously-skip-permissions --model "$MODEL" --reasoning-effort high
    • cursor: pipe prompt to cursor-agent -p --force --trust --sandbox disabled --model "$MODEL"
  4. If cursor-agent doesn't support stdin in -p mode, fall back to positional arg
  5. Keep shared logic unchanged: git config, HEAD tracking, push

Test:

  • Config files are correctly symlinked from host mount
  • Credential files are NOT overwritten by symlinks
  • Claude invocation matches current behavior
  • Cursor invocation uses correct flags

Verify: Run container with each agent, verify config symlinks and CLI invocation.

Review: Symlink logic is safe (no overwrite of credentials), prompt identical for both.

Notes:

  • symlink_agent_config() function iterates files in .<agent>-host/, skips credential files per agent, and creates symlinks in .<agent>/
  • Skip patterns: .credentials.json for claude, auth.json for cursor
  • Existing non-symlink files in the agent dir are preserved (protects volume-persisted data)
  • Existing symlinks are updated via ln -sf to keep config fresh on each run
  • $AGENT defaults to claude when unset (${AGENT:-claude})
  • Prompt text extracted into $PROMPT_TEXT variable, piped to whichever agent CLI is selected
  • Claude flags: -p --dangerously-skip-permissions --model $MODEL --reasoning-effort high
  • Cursor flags: -p --force --trust --sandbox disabled --model $MODEL
  • Unknown agent values cause exit 1 (defensive — should never happen since ralph validates)
  • BATS tests cover: both agents invoked with correct flags, config symlinks, credential file preservation, missing host dir, symlink updates, default agent

Step 7: Update Docker run in ralph for volume and config mounts [done]

Files:

  • scripts/ralph — Update docker run in process_issue for new mount scheme

Implement:

  1. Add auth volume mount: -v ralph-<agent>-auth:/home/ralph/.<agent>/
  2. Add host config mounts (read-only):
    • -v "$HOME/.claude:/home/ralph/.claude-host:ro" (if dir exists)
    • -v "$HOME/.cursor:/home/ralph/.cursor-host:ro" (if dir exists)
  3. Add -e "AGENT=$AGENT" to docker run
  4. Remove -e "CLAUDE_CODE_OAUTH_TOKEN=$OAUTH_TOKEN" (no longer needed)
  5. Keep existing mounts: worktree, git dir, gitconfig, ssh, spec file

Test:

  • Default --agent claude mounts correct volume and config
  • --agent cursor mounts correct volume and config
  • No auth env vars in docker run command
  • Existing mounts (worktree, git, ssh) unchanged

Verify: Run with each agent, inspect docker run args.

Review: No hardcoded Claude references remain. Volume names are agent-specific.

Notes:

  • run_container() updated with three new mount/env additions: auth volume, host config mounts, and AGENT env var
  • Auth volume: -v ralph-<agent>-auth:/home/ralph/.<agent>/ — uses agent name for both volume name and mount path
  • Host config mounts: conditionally added only when ~/.claude or ~/.cursor directories exist on host, mounted read-only at .<agent>-host
  • config_mounts local array built dynamically based on directory existence checks
  • -e AGENT=$AGENT passes the selected agent to the entrypoint for CLI branching
  • No CLAUDE_CODE_OAUTH_TOKEN or any auth env vars in docker run — confirmed removed
  • Existing mounts (worktree, git dir, gitconfig, ssh, spec file) unchanged
  • BATS tests added for: auth volume mount (claude + cursor), AGENT env var (claude + cursor), host config read-only mount, skip missing config dirs, no auth env vars, existing mounts preserved
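
The mount and env additions can be illustrated with the sketch below. agent_mount_args is a hypothetical name; the real script builds a config_mounts array inside run_container(), whereas this portable version prints one argument per line, which assumes $HOME contains no newlines.

```shell
# Emit the agent-related docker run arguments, one per line.
agent_mount_args() {
  local agent="$1" name
  # Writable auth volume, mounted at the agent's own config path.
  printf '%s\n' -v "ralph-${agent}-auth:/home/ralph/.${agent}/"
  # Read-only host config mounts, added only when the directory exists.
  for name in claude cursor; do
    if [ -d "$HOME/.${name}" ]; then
      printf '%s\n' -v "$HOME/.${name}:/home/ralph/.${name}-host:ro"
    fi
  done
  printf '%s\n' -e "AGENT=${agent}"
}
```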

Step 8: Update CLAUDE.md documentation [done]

Files:

  • CLAUDE.md — Update ralph section with new CLI options

Implement:

  1. Add --agent <name> to the Options table (default: claude, also: cursor)
  2. Update architecture notes about Docker volume auth
  3. Add examples showing cursor usage

Test:

  • Documentation accurately reflects implementation

Verify: Read and verify accuracy.

Step 9: Run all checks [done]

Implement:

  1. Run shellcheck on scripts/ralph and docker/ralph/entrypoint.sh
  2. Run zsh -n scripts/ralph
  3. Run bash -n docker/ralph/entrypoint.sh
  4. Run BATS tests if any exist for ralph
  5. Fix any failures

Verify: All checks pass clean.

Notes:

  • docker/ralph/entrypoint.sh: shellcheck clean, bash syntax check clean, all 26 BATS tests pass
  • scripts/ralph: Fixed three legitimate shellcheck findings (SC2053: unquoted == RHS on lines 585/626, SC2295: unquoted expansion in ${..} on line 346). Remaining warnings are zsh idioms that shellcheck doesn't understand ($match[], $pipestatus, local at top level, quoted regex in =~)
  • zsh -n and test_ralph.bats could not run in this container (no zsh installed) — these tests require the host environment

Step 10: Create commit

Implement:

  1. Stage all changes and create a commit: "Add multi-agent support to ralph (Claude + Cursor)"

Verify: git log -1 shows the commit.


Conventions

  • Language: zsh for scripts/ralph, bash for docker/ralph/entrypoint.sh
  • Tests: BATS with temp directories for isolation
  • Error messages: Prefix with ralph:
  • Exit codes: 0=success, 1=runtime error, 2=usage error
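
The error-message and exit-code conventions could be captured in one helper, sketched here. die is a hypothetical name, not necessarily what scripts/ralph uses.

```shell
# Print a "ralph:"-prefixed message to stderr and exit with the given code.
# Usage: die <exit-code> <message...>
die() {
  local code="$1"
  shift
  printf 'ralph: %s\n' "$*" >&2
  exit "$code"
}
```

For example, `die 2 "unknown agent '$AGENT'"` reports a usage error, while `die 1 "docker run failed"` reports a runtime error.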
