Ralph Sandbox Migration #27

@rjernst

branch: ralph-sandbox-migration
depends: [24, 25, 26]

Spec: Ralph Sandbox Migration

Source issues: #24 (proxy), #25 (token management), #26 (sandbox integration)

Overview

Wire together the credential injection proxy (#24), token management (#25), and Docker sandbox (#26) into the ralph orchestration script, replacing the current docker run approach. This spec handles the end-to-end flow: ralph starts the proxy, creates/reuses a sandbox, runs Claude via docker sandbox exec, and cleans up.

This spec also adds a pre-run smoke test (ralph selftest) that validates the entire pipeline without running a real spec, and an integration test that exercises the full flow against a mock project.

Architecture

ralph --issue 42
  │
  ├─ 1. check-token (exits if expired)
  │
  ├─ 2. Start proxy container
  │     get-token | docker run -i --rm agent-loop-proxy
  │
  ├─ 3. Ensure worktree (existing logic)
  │
  ├─ 4. Ensure sandbox (create or reuse)
  │     docker sandbox create -t agent-loop-sandbox-claude:<hash>
  │     Apply network policy
  │
  ├─ 5. Pre-flight validation
  │     Proxy health, sandbox responsive, network policy
  │
  ├─ 6. Run iteration(s)
  │     docker sandbox exec -e phantom -e BASE_URL claude -p ...
  │     Check for commits, update issue, loop
  │
  ├─ 7. Push (if --push, from host)
  │
  └─ 8. Cleanup
        Stop proxy container
        Sandbox persists (cleaned on issue done or prune)

1. Proxy Lifecycle

Ralph runs the proxy as a background container and manages its lifecycle:

Start (once per ralph invocation):

token = get_token(agent)  # from Keychain
proxy_proc = start_proxy_container(agent, port, token)
# token piped via stdin, container runs in background

Stop (on ralph exit):

stop_proxy_container(agent)
# docker stop; no separate docker rm is needed when the container was started with --rm

Proxy container name: agent-loop-proxy-<agent> (e.g., agent-loop-proxy-claude)

Port allocation: each agent uses a fixed port (e.g., claude=18080). If the port is already in use, ralph checks the health endpoint first and reuses the existing proxy when it is healthy.
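The reuse-or-start decision can be sketched as follows. This is a minimal illustration, not the code in scripts/ralph: the /health path, the AGENT_PORTS table, and the injected start and is_healthy callables are assumptions made so the logic can be exercised without Docker.

```python
import http.client
import urllib.request

# Hypothetical port table; only claude=18080 is given in the spec.
AGENT_PORTS = {"claude": 18080}


def proxy_is_healthy(port, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if the proxy on localhost:<port> answers its health check with 200."""
    url = f"http://127.0.0.1:{port}/health"  # assumed path, not stated in the spec
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a malformed reply.
        return False


def ensure_proxy(agent, port, start, is_healthy=proxy_is_healthy):
    """Reuse a healthy proxy on the port, otherwise call start(agent, port).

    Returns "reused" or "started" so callers and tests can see which path ran.
    """
    if is_healthy(port):
        return "reused"
    start(agent, port)
    return "started"
```

Passing the health check and the starter in as arguments keeps the decision testable with plain lambdas, which matches the mocked tests listed in Step 1.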

2. Replace Docker Class

Replace the existing Docker class with a new Sandbox class (from #26). Key method changes:

| Old (Docker class) | New (Sandbox class) |
|---|---|
| build_image(tag) | ensure_image(agent) |
| compute_tag(packages) | compute_tag(agent), hash-based |
| ensure_image(packages) | ensure_image(agent), with rebuild detection |
| run_container(...) | run_iteration(sandbox_name, worktree, prompt, model) |

The run_iteration method uses docker sandbox exec instead of docker run.
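One way a hash-based tag could work is sketched below. The spec does not say what is hashed; this sketch assumes the tag is derived from the contents of the agent's Dockerfile (docker/agent-loop/<agent>/Dockerfile), so that editing the Dockerfile changes the tag and triggers a rebuild. The helper name tag_for and the 12-character digest length are illustrative choices.

```python
import hashlib
from pathlib import Path


def tag_for(agent, dockerfile_bytes):
    """Build an image reference of the form agent-loop-sandbox-<agent>:<hash>."""
    # Assumption: the hash covers only the Dockerfile contents.
    digest = hashlib.sha256(dockerfile_bytes).hexdigest()[:12]
    return f"agent-loop-sandbox-{agent}:{digest}"


def compute_tag(agent, repo_root="."):
    """Read the agent's Dockerfile from the repository and derive its tag."""
    dockerfile = Path(repo_root) / "docker" / "agent-loop" / agent / "Dockerfile"
    return tag_for(agent, dockerfile.read_bytes())
```

With a content-derived tag, rebuild detection reduces to checking whether an image with the computed tag already exists locally.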

3. Replace Auth Flow

| Old | New |
|---|---|
| get_auth_token() reads Keychain "Claude Code-credentials" | check_token(agent) validates, get_token(agent) extracts from "claude-token" |
| CLAUDE_CODE_OAUTH_TOKEN=<real> passed to container | CLAUDE_CODE_OAUTH_TOKEN=phantom + ANTHROPIC_BASE_URL=proxy |
| Token visible inside container | Token never enters sandbox |
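The new flow can be illustrated by building the sandbox exec command with only the placeholder token. This is a sketch under assumptions: the position of the sandbox name in the argument list, the helper names, and the guard function are hypothetical, and the proxy URL is passed in by the caller because the spec does not state the address the sandbox uses to reach the proxy.

```python
def build_exec_command(sandbox_name, prompt, base_url, model=None):
    """Return the argv for running Claude inside the sandbox via the proxy."""
    cmd = [
        "docker", "sandbox", "exec",
        # The sandbox only ever sees the placeholder value; the proxy holds the real token.
        "-e", "CLAUDE_CODE_OAUTH_TOKEN=phantom",
        "-e", f"ANTHROPIC_BASE_URL={base_url}",
        sandbox_name,  # assumed position, mirroring docker exec
        "claude", "-p", prompt,
    ]
    if model:
        cmd += ["--model", model]
    return cmd


def contains_secret(cmd, real_token):
    """Guard for tests and reviews: True if the real token leaked into the argv."""
    return any(real_token in part for part in cmd)
```

A check like contains_secret gives the Step 3 review item ("No real token in sandbox env") something concrete to assert against.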

4. Remove Old Docker Infrastructure

After migration:

  • Remove docker/ralph/Dockerfile (replaced by docker/agent-loop/claude/Dockerfile)
  • Remove docker/ralph/entrypoint.sh (no entrypoint needed — sandbox uses sleep infinity)
  • Remove Docker class from scripts/ralph
  • Remove --packages flag (packages now in Dockerfile, rebuild via --rebuild)
  • Update CLAUDE.md references

5. Selftest Command

ralph selftest [--agent <name>]:

  1. Check token: check-token --agent <agent>
  2. Start proxy: pipe token, verify health endpoint
  3. Build image: ensure_image(agent)
  4. Create test sandbox: docker sandbox create --name agent-loop-selftest-<agent> ...
  5. Apply network policy
  6. Verify proxy reachable from sandbox: docker sandbox exec ... curl proxy health
  7. Verify Claude auth works: docker sandbox exec ... claude -p "say ok" (through proxy)
  8. Verify network isolation: docker sandbox exec ... curl google.com returns blocked
  9. Cleanup: remove test sandbox, stop proxy
  10. Report: all checks passed / which failed with diagnostics
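The ten steps above share one shape: run named checks in order, record pass or fail, always clean up, then report. A minimal sketch of that runner follows. The function name run_checks, the message formats, and the decision to keep running after a failure are assumptions; the real selftest may stop at the first failed check.

```python
def run_checks(checks, cleanup, out=print):
    """Run (name, callable) checks in order and return an exit code.

    A check passes if it returns without raising. Cleanup always runs,
    even when a check raises or the run is interrupted.
    """
    failed = []
    try:
        for name, check in checks:
            try:
                check()
            except Exception as exc:
                out(f"FAIL {name}: {exc}")
                failed.append(name)
            else:
                out(f"PASS {name}")
    finally:
        cleanup()

    if failed:
        out("ralph: selftest failed: " + ", ".join(failed))
        return 1
    out("ralph: selftest: all checks passed")
    return 0
```

Continuing after a failure gives a complete report in one run, at the cost of cascading failures when an early step such as the token check fails.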

6. Integration Test

tests/test_ralph_integration.py — a pytest test that runs the selftest flow programmatically:

  • Skipped by default (requires Docker Desktop running + valid token)
  • Enabled with pytest -m integration or env var RALPH_INTEGRATION_TESTS=1
  • Tests the real Docker sandbox + proxy + auth flow
  • Validates: image build, sandbox create, proxy connectivity, claude execution, network isolation, cleanup
  • Timeout: 120 seconds per test
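The opt-in gating and guaranteed cleanup described above might look like the sketch below. It shows only the environment-variable route. The fixture name, the placeholder test body, and the helper integration_enabled are illustrative; the real test would drive the selftest steps and remove the sandbox in the fixture teardown.

```python
import os

import pytest


def integration_enabled(environ=None):
    """True only when the opt-in variable from the spec is set to "1"."""
    env = os.environ if environ is None else environ
    return env.get("RALPH_INTEGRATION_TESTS") == "1"


# Skip marker evaluated once at import time.
requires_opt_in = pytest.mark.skipif(
    not integration_enabled(),
    reason="set RALPH_INTEGRATION_TESTS=1 (needs Docker Desktop and a valid token)",
)


@pytest.fixture
def selftest_sandbox():
    """Yield a sandbox name; code after the yield runs as the finalizer."""
    name = "agent-loop-selftest-claude"
    yield name
    # Placeholder: remove the test sandbox and stop the proxy here.


@pytest.mark.integration
@requires_opt_in
def test_selftest_flow(selftest_sandbox):
    # Placeholder body standing in for the real end-to-end checks.
    assert selftest_sandbox.startswith("agent-loop-selftest-")
```

The 120-second limit is not shown; enforcing it would need a timeout mechanism such as a pytest plugin, which the spec does not name.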

Implementation Plan

Step 1: Implement proxy lifecycle in ralph [DONE]

Files:

  • scripts/ralph — add start_proxy(), stop_proxy(), ensure_proxy()

Implement:

  1. start_proxy(agent, port): read token from Keychain, pipe to docker run -i --rm -d --name agent-loop-proxy-<agent> -p <port>:18080 agent-loop-proxy:v1
  2. stop_proxy(agent): docker stop agent-loop-proxy-<agent>
  3. ensure_proxy(agent, port): check if proxy running (health endpoint), start if not, reuse if healthy
  4. Register atexit handler to stop proxy on ralph exit
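A sketch of items 1, 2, and 4 is shown here, with the docker run arguments copied from item 1. It departs from the spec in two labeled ways: the token is passed in as an argument instead of being read from the Keychain, and a run parameter (defaulting to subprocess.run) is injected so the functions can be tested without Docker.

```python
import atexit
import subprocess


def proxy_container_name(agent):
    return f"agent-loop-proxy-{agent}"


def proxy_run_command(agent, port):
    """Argv for starting the proxy container, as listed in the step above."""
    return [
        "docker", "run", "-i", "--rm", "-d",
        "--name", proxy_container_name(agent),
        "-p", f"{port}:18080",
        "agent-loop-proxy:v1",
    ]


def stop_proxy(agent, run=subprocess.run):
    """Stop the proxy; failures are ignored because it may already be gone."""
    run(["docker", "stop", proxy_container_name(agent)], check=False, capture_output=True)


def start_proxy(agent, port, token, run=subprocess.run):
    """Start the proxy with the token on stdin and register cleanup at exit."""
    # The token travels via stdin so it never appears in the process arguments.
    run(proxy_run_command(agent, port), input=token, text=True, check=True)
    atexit.register(stop_proxy, agent, run)
```

Note that atexit handlers do not run when the process is killed by an unhandled signal, so the signal case raised in the review item needs separate handling.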

Test:

  • tests/test_ralph.py:
    • start_proxy constructs correct docker run command (mocked)
    • ensure_proxy reuses healthy proxy (mock health check)
    • ensure_proxy starts new when none running
    • stop_proxy calls docker stop

Verify: pytest tests/test_ralph.py -v

Review: Cleanup on crash/signal, port conflicts, container naming.

Address feedback: Fix, re-test.

Step 2: Replace Docker class with Sandbox class [DONE]

Files:

  • scripts/ralph — replace Docker class, update process_issue() and poll_loop()

Implement:

  1. Replace Docker class with Sandbox class (from Ralph Docker Sandbox Integration #26)
  2. Update process_issue(): use ensure_sandbox() + docker sandbox exec instead of docker.run_container()
  3. Update poll_loop(): same changes
  4. Update main(): instantiate Sandbox instead of Docker, start/stop proxy
  5. Remove --packages flag
  6. Add --rebuild flag (forces image rebuild)
  7. Add --agent flag (default: claude)
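The flag changes in items 5 to 7 can be sketched with argparse. Only the flags named in this spec are shown (--issue, --push, --agent, --rebuild); the function name build_parser and the help strings are illustrative, and the selftest subcommand is left out.

```python
import argparse


def build_parser():
    """Build the ralph command-line parser with the post-migration flags."""
    parser = argparse.ArgumentParser(prog="ralph")
    parser.add_argument("--issue", type=int, help="issue number to process")
    parser.add_argument("--push", action="store_true", help="push from the host when done")
    parser.add_argument("--agent", default="claude", help="agent name (default: claude)")
    parser.add_argument("--rebuild", action="store_true", help="force an image rebuild")
    # --packages is intentionally absent: packages now live in the Dockerfile.
    return parser
```

argparse exits with status 2 on an unrecognized argument, so a leftover --packages invocation fails with the usage-error code listed under Conventions.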

Test:

  • tests/test_ralph.py:
    • process_issue uses sandbox exec (mock subprocess calls)
    • --rebuild triggers image rebuild
    • --agent codex uses correct namespacing throughout
    • Existing pure-function tests still pass

Verify: pytest tests/test_ralph.py -v

Review: All Docker class references removed, no regressions in existing behavior.

Address feedback: Fix, re-test.

Step 3: Replace auth flow [DONE]

Files:

  • scripts/ralph — update auth handling in main() and process_issue()

Implement:

  1. Replace get_auth_token() with check_token(agent) + get_token(agent) (from Ralph Token Management #25)
  2. Proxy receives real token via stdin
  3. Sandbox exec uses CLAUDE_CODE_OAUTH_TOKEN=phantom + ANTHROPIC_BASE_URL
  4. Remove old get_auth_token() function
  5. Remove CLAUDE_CODE_OAUTH_TOKEN from container env (replaced by phantom)

Test:

  • tests/test_ralph.py:
    • process_issue passes phantom token to sandbox exec
    • ANTHROPIC_BASE_URL set to proxy address
    • Old get_auth_token no longer called

Verify: pytest tests/test_ralph.py -v

Review: No real token in sandbox env, proxy address correct.

Address feedback: Fix, re-test.

Step 4: Remove old Docker infrastructure [DONE]

Files:

  • Delete docker/ralph/Dockerfile
  • Delete docker/ralph/entrypoint.sh
  • scripts/ralph — remove Docker class (if not already removed in step 2)
  • Update CLAUDE.md — update architecture docs, directory structure, command reference

Implement:

  1. Delete old files
  2. Update CLAUDE.md: directory structure, ralph command reference, Docker section
  3. Update tests: remove test_ralph_entrypoint.bats (no entrypoint to test)

Test:

  • Old entrypoint tests removed
  • All remaining tests pass

Verify: pytest tests/ -v && bats tests/test_ralph.bats

Review: No dangling references to old Docker infrastructure.

Address feedback: Fix, re-test.

Step 5: Implement selftest command [DONE]

Files:

  • scripts/ralph — add selftest(agent) function and selftest subcommand

Implement:

  1. Implement selftest(agent) as described in section 5
  2. Add selftest subcommand routing in main()
  3. Each check prints pass/fail with diagnostics
  4. Exit 0 if all pass, 1 if any fail

Test:

  • tests/test_ralph.py:
    • selftest calls each check in order (mocked)
    • Reports correct pass/fail for each check
    • Cleans up test sandbox even on failure

Verify: pytest tests/test_ralph.py -v

Review: Cleanup is robust (try/finally), error messages are actionable.

Address feedback: Fix, re-test.

Step 6: Add integration test [DONE]

Files:

  • tests/test_ralph_integration.py — end-to-end test

Implement:

  1. pytest marker: @pytest.mark.integration
  2. Skip unless RALPH_INTEGRATION_TESTS=1 env var set
  3. Test: build image, start proxy, create sandbox, run claude through proxy, verify network isolation, cleanup
  4. Timeout: 120 seconds
  5. Cleanup in fixture finalizer (always runs)

Test:

  • Run with RALPH_INTEGRATION_TESTS=1 pytest tests/test_ralph_integration.py -v

Verify: Integration test passes in local environment with Docker Desktop running.

Review: Test isolation, cleanup reliability, timeout handling.

Address feedback: Fix, re-test.

Step 7: Run all checks

Implement:

  1. Run pytest tests/ -v (excluding integration unless opted in)
  2. Run bats tests/test_ralph.bats (if still relevant)
  3. Run shellcheck and zsh -n on shell scripts
  4. Verify Docker builds succeed

Verify: All checks pass clean.

Step 8: Create commit

Implement:

  1. Stage all changes and create a commit summarizing the sandbox migration.

Verify: git log -1 shows the commit.


Conventions

  • Language: Python 3 (stdlib only)
  • Tests: pytest with unittest.mock; integration tests with @pytest.mark.integration
  • Error messages: Prefix with ralph:
  • Exit codes: 0=success, 1=runtime error, 2=usage error
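The error-message and exit-code conventions can be captured in a small helper. The names format_error, fail, and the EXIT_* constants are hypothetical; only the "ralph:" prefix and the 0/1/2 codes come from the list above.

```python
import sys

EXIT_OK = 0
EXIT_RUNTIME = 1
EXIT_USAGE = 2


def format_error(message):
    """Prefix a message the way the conventions require."""
    return f"ralph: {message}"


def fail(message, code=EXIT_RUNTIME):
    """Print a prefixed error to stderr and exit with the given code."""
    print(format_error(message), file=sys.stderr)
    raise SystemExit(code)
```

For example, fail("token expired") would print "ralph: token expired" to stderr and exit with status 1.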

Metadata

Labels: spec (Ralph spec for automated execution), status:done (Completed)
