Description
branch: ralph-sandbox-migration
depends: [24, 25, 26]
Spec: Ralph Sandbox Migration
Source issues: #24 (proxy), #25 (token management), #26 (sandbox integration)
Overview
Wire together the credential injection proxy (#24), token management (#25), and Docker sandbox (#26) into the ralph orchestration script, replacing the current docker run approach. This spec handles the end-to-end flow: ralph starts the proxy, creates/reuses a sandbox, runs Claude via docker sandbox exec, and cleans up.
Also adds a pre-run smoke test (ralph selftest) that validates the entire pipeline without running a real spec, and an integration test that exercises the full flow against a mock project.
Architecture
ralph --issue 42
│
├─ 1. check-token (exits if expired)
│
├─ 2. Start proxy container
│ get-token | docker run -i --rm agent-loop-proxy
│
├─ 3. Ensure worktree (existing logic)
│
├─ 4. Ensure sandbox (create or reuse)
│ docker sandbox create -t agent-loop-sandbox-claude:<hash>
│ Apply network policy
│
├─ 5. Pre-flight validation
│ Proxy health, sandbox responsive, network policy
│
├─ 6. Run iteration(s)
│ docker sandbox exec -e phantom -e BASE_URL claude -p ...
│ Check for commits, update issue, loop
│
├─ 7. Push (if --push, from host)
│
└─ 8. Cleanup
Stop proxy container
Sandbox persists (cleaned on issue done or prune)
1. Proxy Lifecycle
Ralph manages the proxy container as a subprocess/background container:
Start (once per ralph invocation):
token = get_token(agent) # from Keychain
proxy_proc = start_proxy_container(agent, port, token)
# token piped via stdin, container runs in background
Stop (on ralph exit):
stop_proxy_container(agent)
# docker stop + docker rm (or --rm handles it)
Proxy container name: agent-loop-proxy-<agent> (e.g., agent-loop-proxy-claude)
Port allocation: use a fixed port per agent (e.g., claude=18080). If port is in use, ralph detects and reuses the existing proxy (check health endpoint first).
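The reuse-or-start decision described above can be sketched as follows. This is a minimal sketch, not ralph's actual code: the /health path, the AGENT_PORTS table, and the start callback signature are assumptions; only the claude=18080 port and the container naming come from the spec.

```python
import urllib.error
import urllib.request
from typing import Callable, Tuple

# Fixed port per agent. Only claude=18080 is given in the spec.
AGENT_PORTS = {"claude": 18080}


def proxy_is_healthy(port: int, timeout: float = 2.0) -> bool:
    """Return True if the proxy answers HTTP 200 on its health endpoint."""
    url = f"http://127.0.0.1:{port}/health"  # endpoint path is an assumption
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def ensure_proxy(
    agent: str,
    start: Callable[[str, int], None],
    is_healthy: Callable[[int], bool] = proxy_is_healthy,
) -> Tuple[str, bool]:
    """Reuse a healthy proxy on the agent's port, otherwise start one.

    Returns (container_name, started); started is False when reusing.
    """
    port = AGENT_PORTS[agent]
    name = f"agent-loop-proxy-{agent}"
    if is_healthy(port):
        return name, False
    start(agent, port)
    return name, True
```

Passing the health check and the start function as parameters keeps the decision logic testable without Docker, which matches the mocked tests listed in Step 1.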
2. Replace Docker Class
Replace the existing Docker class with a new Sandbox class (from #26). Key method changes:
| Old (Docker class) | New (Sandbox class) |
|---|---|
| build_image(tag) | ensure_image(agent) |
| compute_tag(packages) | compute_tag(agent), hash-based |
| ensure_image(packages) | ensure_image(agent), with rebuild detection |
| run_container(...) | run_iteration(sandbox_name, worktree, prompt, model) |
The run_iteration method uses docker sandbox exec instead of docker run.
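As an illustration, the argument vector for one iteration might be assembled like this. The helper name build_exec_command is hypothetical, and the position of the sandbox name after the -e flags is an assumption based on the command shown in the Architecture section; the spec does not say how the worktree is passed, so it is left out here.

```python
from typing import Dict, List


def build_exec_command(
    sandbox_name: str, prompt: str, model: str, env: Dict[str, str]
) -> List[str]:
    """Build the argv for one Claude run inside an existing sandbox."""
    cmd = ["docker", "sandbox", "exec"]
    # Sort the variables so the command is deterministic and easy to assert on.
    for key, value in sorted(env.items()):
        cmd += ["-e", f"{key}={value}"]
    # Assumed ordering: sandbox name first, then the command to run in it.
    cmd += [sandbox_name, "claude", "--model", model, "-p", prompt]
    return cmd
```

Building the command as a list rather than a shell string avoids quoting problems when the prompt contains spaces or quotes.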
3. Replace Auth Flow
| Old | New |
|---|---|
| get_auth_token() reads Keychain "Claude Code-credentials" | check_token(agent) validates, get_token(agent) extracts from "claude-token" |
| CLAUDE_CODE_OAUTH_TOKEN=<real> passed to container | CLAUDE_CODE_OAUTH_TOKEN=phantom + ANTHROPIC_BASE_URL=proxy |
| Token visible inside container | Token never enters sandbox |
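The new environment for the sandbox could be built along these lines. This is a sketch under assumptions: the helper names, the http scheme, and the host name used in the usage example are not given in the spec.

```python
from typing import Dict

PHANTOM_TOKEN = "phantom"  # placeholder value; the proxy injects the real token


def sandbox_env(proxy_host: str, proxy_port: int) -> Dict[str, str]:
    """Environment passed to the sandbox: phantom token plus the proxy URL."""
    return {
        "CLAUDE_CODE_OAUTH_TOKEN": PHANTOM_TOKEN,
        "ANTHROPIC_BASE_URL": f"http://{proxy_host}:{proxy_port}",
    }


def assert_no_real_token(env: Dict[str, str], real_token: str) -> None:
    """Raise ValueError if the real token appears in any environment value."""
    leaked = [key for key, value in env.items() if real_token and real_token in value]
    if leaked:
        raise ValueError(f"real token leaked into sandbox env: {leaked}")
```

For example, sandbox_env("host.docker.internal", 18080) yields a base URL of http://host.docker.internal:18080; whether that host name resolves from inside the sandbox depends on the network policy.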
4. Remove Old Docker Infrastructure
After migration:
- Remove docker/ralph/Dockerfile (replaced by docker/agent-loop/claude/Dockerfile)
- Remove docker/ralph/entrypoint.sh (no entrypoint needed; sandbox uses sleep infinity)
- Remove Docker class from scripts/ralph
- Remove --packages flag (packages now in Dockerfile, rebuild via --rebuild)
- Update CLAUDE.md references
5. Selftest Command
ralph selftest [--agent <name>]:
- Check token: check-token --agent <agent>
- Start proxy: pipe token, verify health endpoint
- Build image: ensure_image(agent)
- Create test sandbox: docker sandbox create --name agent-loop-selftest-<agent> ...
- Apply network policy
- Verify proxy reachable from sandbox: docker sandbox exec ... curl proxy health
- Verify Claude auth works: docker sandbox exec ... claude -p "say ok" (through proxy)
- Verify network isolation: docker sandbox exec ... curl google.com returns blocked
- Cleanup: remove test sandbox, stop proxy
- Report: all checks passed / which failed with diagnostics
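The check sequence above could be driven by a small runner like the one below. It is a sketch, not the spec's implementation: the run_selftest name, the (name, callable) check shape, and the choice to stop at the first failure (later checks depend on earlier ones) are assumptions.

```python
import sys
from typing import Callable, Optional, Sequence, TextIO, Tuple

Check = Tuple[str, Callable[[], None]]


def run_selftest(
    checks: Sequence[Check],
    cleanup: Callable[[], None],
    out: Optional[TextIO] = None,
) -> int:
    """Run checks in order, always clean up, return 0 on success and 1 on failure."""
    if out is None:
        out = sys.stdout
    failed = None
    try:
        for name, check in checks:
            try:
                check()  # a check signals failure by raising
            except Exception as exc:
                print(f"FAIL {name}: {exc}", file=out)
                failed = name
                break  # later checks depend on earlier ones
            print(f"PASS {name}", file=out)
    finally:
        cleanup()  # remove test sandbox, stop proxy, even on failure
    if failed is None:
        print("selftest: all checks passed", file=out)
        return 0
    print(f"selftest: failed at '{failed}'", file=out)
    return 1
```

The try/finally guarantees that the cleanup step runs even when a check raises, which is the property the Step 5 review asks for.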
6. Integration Test
tests/test_ralph_integration.py — a pytest test that runs the selftest flow programmatically:
- Skipped by default (requires Docker Desktop running + valid token)
- Enabled with pytest -m integration or env var RALPH_INTEGRATION_TESTS=1
- Tests the real Docker sandbox + proxy + auth flow
- Validates: image build, sandbox create, proxy connectivity, claude execution, network isolation, cleanup
- Timeout: 120 seconds per test
Implementation Plan
Step 1: Implement proxy lifecycle in ralph [DONE]
Files:
- scripts/ralph: add start_proxy(), stop_proxy(), ensure_proxy()
Implement:
- start_proxy(agent, port): read token from Keychain, pipe to docker run -i --rm -d --name agent-loop-proxy-<agent> -p <port>:18080 agent-loop-proxy:v1
- stop_proxy(agent): docker stop agent-loop-proxy-<agent>
- ensure_proxy(agent, port): check if proxy running (health endpoint), start if not, reuse if healthy
- Register atexit handler to stop proxy on ralph exit
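A minimal sketch of the start and stop functions, using the docker run flags exactly as listed above. The runner parameter is a hypothetical seam for mocking, and the token is passed in here rather than read from Keychain. Whether Docker forwards piped stdin to a container started with -d is not verified in this sketch.

```python
import atexit
import subprocess
from typing import Callable, List

PROXY_IMAGE = "agent-loop-proxy:v1"
PROXY_INTERNAL_PORT = 18080


def proxy_name(agent: str) -> str:
    return f"agent-loop-proxy-{agent}"


def proxy_run_command(agent: str, port: int) -> List[str]:
    """The docker run argv from the spec, with agent and host port filled in."""
    return [
        "docker", "run", "-i", "--rm", "-d",
        "--name", proxy_name(agent),
        "-p", f"{port}:{PROXY_INTERNAL_PORT}",
        PROXY_IMAGE,
    ]


def stop_proxy(agent: str, runner: Callable = subprocess.run) -> None:
    # check=False: stopping an already-stopped proxy should not raise.
    runner(["docker", "stop", proxy_name(agent)], check=False)


def start_proxy(agent: str, port: int, token: str,
                runner: Callable = subprocess.run) -> None:
    # The token goes in on stdin so it never appears in argv or the environment.
    runner(proxy_run_command(agent, port), input=token, text=True, check=True)
    # Stop the proxy when the interpreter exits normally.
    atexit.register(stop_proxy, agent, runner)
```

Note that atexit handlers do not run when the process is killed by an unhandled signal, so the crash and signal cases flagged in the review need separate handling.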
Test:
- tests/test_ralph.py:
  - start_proxy constructs correct docker run command (mocked)
  - ensure_proxy reuses healthy proxy (mock health check)
  - ensure_proxy starts new when none running
  - stop_proxy calls docker stop
Verify: pytest tests/test_ralph.py -v
Review: Cleanup on crash/signal, port conflicts, container naming.
Address feedback: Fix, re-test.
Step 2: Replace Docker class with Sandbox class [DONE]
Files:
- scripts/ralph: replace Docker class, update process_issue() and poll_loop()
Implement:
- Replace Docker class with Sandbox class (from Ralph Docker Sandbox Integration #26)
- Update process_issue(): use ensure_sandbox() + docker sandbox exec instead of docker.run_container()
- Update poll_loop(): same changes
- Update main(): instantiate Sandbox instead of Docker, start/stop proxy
- Remove --packages flag
- Add --rebuild flag (forces image rebuild)
- Add --agent flag (default: claude)
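The flag changes could look like this with argparse. This is a sketch: the --issue and --push flags are taken from the Architecture section, while their types and help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line flags after the migration; --packages is intentionally gone."""
    parser = argparse.ArgumentParser(prog="ralph")
    parser.add_argument("--issue", type=int, help="issue number to process")
    parser.add_argument("--push", action="store_true", help="push from the host when done")
    parser.add_argument("--rebuild", action="store_true", help="force an image rebuild")
    parser.add_argument("--agent", default="claude", help="agent name (default: claude)")
    return parser
```

With this parser, passing the removed --packages flag makes argparse exit with status 2, which lines up with the usage-error exit code in the Conventions section.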
Test:
- tests/test_ralph.py:
  - process_issue uses sandbox exec (mock subprocess calls)
  - --rebuild triggers image rebuild
  - --agent codex uses correct namespacing throughout
  - Existing pure-function tests still pass
Verify: pytest tests/test_ralph.py -v
Review: All Docker class references removed, no regressions in existing behavior.
Address feedback: Fix, re-test.
Step 3: Replace auth flow [DONE]
Files:
- scripts/ralph: update auth handling in main() and process_issue()
Implement:
- Replace get_auth_token() with check_token(agent) + get_token(agent) (from Ralph Token Management #25)
- Proxy receives real token via stdin
- Sandbox exec uses CLAUDE_CODE_OAUTH_TOKEN=phantom + ANTHROPIC_BASE_URL
- Remove old get_auth_token() function
- Remove CLAUDE_CODE_OAUTH_TOKEN from container env (replaced by phantom)
Test:
- tests/test_ralph.py:
  - process_issue passes phantom token to sandbox exec
  - ANTHROPIC_BASE_URL set to proxy address
  - Old get_auth_token no longer called
Verify: pytest tests/test_ralph.py -v
Review: No real token in sandbox env, proxy address correct.
Address feedback: Fix, re-test.
Step 4: Remove old Docker infrastructure [DONE]
Files:
- Delete docker/ralph/Dockerfile
- Delete docker/ralph/entrypoint.sh
- scripts/ralph: remove Docker class (if not already removed in step 2)
- Update CLAUDE.md: update architecture docs, directory structure, command reference
Implement:
- Delete old files
- Update CLAUDE.md: directory structure, ralph command reference, Docker section
- Update tests: remove test_ralph_entrypoint.bats (no entrypoint to test)
Test:
- Old entrypoint tests removed
- All remaining tests pass
Verify: pytest tests/ -v && bats tests/test_ralph.bats
Review: No dangling references to old Docker infrastructure.
Address feedback: Fix, re-test.
Step 5: Implement selftest command [DONE]
Files:
- scripts/ralph: add selftest(agent) function and selftest subcommand
Implement:
- Implement selftest(agent) as described in section 5
- Add selftest subcommand routing in main()
- Each check prints pass/fail with diagnostics
- Exit 0 if all pass, 1 if any fail
Test:
- tests/test_ralph.py:
  - selftest calls each check in order (mocked)
  - Reports correct pass/fail for each check
  - Cleans up test sandbox even on failure
Verify: pytest tests/test_ralph.py -v
Review: Cleanup is robust (try/finally), error messages are actionable.
Address feedback: Fix, re-test.
Step 6: Add integration test [DONE]
Files:
- tests/test_ralph_integration.py: end-to-end test
Implement:
- pytest marker: @pytest.mark.integration
- Skip unless RALPH_INTEGRATION_TESTS=1 env var set
- Test: build image, start proxy, create sandbox, run claude through proxy, verify network isolation, cleanup
- Timeout: 120 seconds
- Cleanup in fixture finalizer (always runs)
Test:
- Run with RALPH_INTEGRATION_TESTS=1 pytest tests/test_ralph_integration.py -v
Verify: Integration test passes in local environment with Docker Desktop running.
Review: Test isolation, cleanup reliability, timeout handling.
Address feedback: Fix, re-test.
Step 7: Run all checks
Implement:
- Run pytest tests/ -v (excluding integration unless opted in)
- Run bats tests/test_ralph.bats (if still relevant)
- Run shellcheck and zsh -n on shell scripts
- Verify Docker builds succeed
Verify: All checks pass clean.
Step 8: Create commit
Implement:
- Stage all changes and create a commit summarizing the sandbox migration.
Verify: git log -1 shows the commit.
Conventions
- Language: Python 3 (stdlib only)
- Tests: pytest with unittest.mock; integration tests with @pytest.mark.integration
- Error messages: Prefix with ralph:
- Exit codes: 0=success, 1=runtime error, 2=usage error
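One way to apply the error-prefix and exit-code conventions in a single place is sketched below. The UsageError type and the run wrapper are hypothetical helpers, not names from the spec.

```python
import sys
from typing import Callable, Optional, TextIO

EXIT_OK = 0
EXIT_RUNTIME = 1
EXIT_USAGE = 2


class UsageError(Exception):
    """Raised for bad command-line input (hypothetical helper type)."""


def report(message: object, err: Optional[TextIO] = None) -> None:
    """Print an error message with the ralph: prefix."""
    print(f"ralph: {message}", file=err if err is not None else sys.stderr)


def run(main_fn: Callable[[], None], err: Optional[TextIO] = None) -> int:
    """Call main_fn and map its outcome to the documented exit codes."""
    try:
        main_fn()
    except UsageError as exc:
        report(exc, err)
        return EXIT_USAGE
    except Exception as exc:
        report(exc, err)
        return EXIT_RUNTIME
    return EXIT_OK
```

A script entry point would then end with sys.exit(run(main)), so every failure path prints a prefixed message and returns a documented code.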