Description
branch: ralph-docker-sandbox
depends: [24, 25]
Spec: Ralph Docker Sandbox Integration
Source issues: #24 (credential injection proxy), #25 (token management)
Overview
Replace ralph's plain docker run execution with Docker sandbox for microVM-level isolation. This includes: a custom Dockerfile for the sandbox template (build tools + Claude Code), sandbox lifecycle management (create, reuse, cleanup), network policy enforcement (deny-by-default), and image rebuild detection.
The sandbox ensures Claude Code runs in a hardened environment: no SSH keys, no git push credentials, and network access restricted to the allowlisted hosts, chiefly the Anthropic API reached through the credential injection proxy. Ralph handles all git operations on the host.
Designed with multi-agent namespacing: sandbox names, Dockerfiles, and network policies are parameterized by agent type. Default agent is claude.
Architecture
┌─ ralph (host) ──────────────────────────────────────────┐
│ │
│ 1. Ensure image: docker build (layer-cached) │
│ docker/agent-loop/claude/Dockerfile │
│ FROM docker/sandbox-templates:claude-code │
│ + build-essential, jq, ripgrep, fd-find, openssh │
│ │
│ 2. Ensure sandbox: docker sandbox create │
│ --name agent-loop-claude-<branch> │
│ -t agent-loop-sandbox-claude:v<hash> │
│ claude /path/to/worktree │
│ │
│ 3. Apply network policy: │
│ --policy deny │
│ --allow-host localhost (for proxy) │
│ --allow-host api.anthropic.com │
│ --allow-host statsig.anthropic.com │
│ --allow-host sentry.io │
│ │
│ 4. Run claude via exec: │
│ docker sandbox exec │
│ -e CLAUDE_CODE_OAUTH_TOKEN=phantom │
│ -e ANTHROPIC_BASE_URL=http://host.docker.internal:<port> │
│ agent-loop-claude-<branch> │
│ claude -p --dangerously-skip-permissions ... │
│ │
│ 5. Cleanup: docker sandbox rm agent-loop-claude-<branch> │
└──────────────────────────────────────────────────────────┘
1. Sandbox Dockerfile
File: docker/agent-loop/claude/Dockerfile
FROM docker/sandbox-templates:claude-code
USER root
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential jq openssh-client fd-find \
    && rm -rf /var/lib/apt/lists/*
USER agent
The docker/agent-loop/ directory uses an <agent>/Dockerfile layout for future multi-agent support.
2. Image Build and Rebuild Detection
Image tag: agent-loop-sandbox-<agent>:<content-hash>
The content hash is derived from:
- SHA256 of the Dockerfile contents
- SHA256 of the base image digest (docker/sandbox-templates:claude-code)
If either changes, the image tag changes, triggering a rebuild.
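A minimal sketch of the tag computation described above. The spec does not say how the two SHA256 values are combined or how long the tag suffix is, so hashing their concatenation and truncating to 12 hex characters are assumptions made for illustration; the explicit `dockerfile_bytes` and `base_digest` parameters are also hypothetical.

```python
import hashlib


def compute_tag(agent: str, dockerfile_bytes: bytes, base_digest: str) -> str:
    """Return an image tag that changes when the Dockerfile or base digest changes."""
    dockerfile_hash = hashlib.sha256(dockerfile_bytes).hexdigest()
    digest_hash = hashlib.sha256(base_digest.encode("utf-8")).hexdigest()
    # Assumption: combine the two hashes by hashing their concatenation.
    combined = hashlib.sha256((dockerfile_hash + digest_hash).encode("ascii")).hexdigest()
    # Assumption: a 12-character prefix is short enough to read and long enough to avoid collisions.
    return f"agent-loop-sandbox-{agent}:{combined[:12]}"
```

Because the function depends only on its inputs, the same Dockerfile and base digest always produce the same tag, which is what lets an existing local image be reused.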
Rebuild flow:
- Compute content hash from Dockerfile + base image digest
- Check if agent-loop-sandbox-<agent>:<hash> exists locally
- If missing: docker build -t agent-loop-sandbox-<agent>:<hash> docker/agent-loop/<agent>/
- Docker layer caching makes unchanged rebuilds near-instant
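The rebuild flow could be sketched roughly as follows. The helper names `image_exists` and `build_if_missing` and the injectable `run` parameter are hypothetical (the latter added so the logic can be tested without Docker); the existence check relies on `docker image inspect` exiting non-zero when the tag is absent.

```python
import subprocess


def image_exists(tag, run=subprocess.run):
    """Return True if the tag is present locally (docker image inspect exits 0)."""
    result = run(
        ["docker", "image", "inspect", tag],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def build_if_missing(agent, tag, run=subprocess.run):
    """Build the sandbox image only when the tag is missing. Returns True if a build ran."""
    if image_exists(tag, run=run):
        return False
    run(["docker", "build", "-t", tag, f"docker/agent-loop/{agent}/"], check=True)
    return True
```

Passing a fake `run` in tests mirrors the spec's approach of mocking `docker image inspect`.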
Base image update detection:
- docker pull docker/sandbox-templates:claude-code (check for new digest)
- Only pull if --rebuild flag or if base image is older than 7 days (check via docker inspect --format '{{.Created}}')
- A new base digest changes the content hash, triggering rebuild
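One way the 7-day staleness check might look. This sketch assumes the `{{.Created}}` value is an RFC 3339 UTC timestamp (for example `2024-01-01T00:00:00.123456789Z`) and parses it only to second precision; `parse_created` and `should_pull` are hypothetical names.

```python
import re
from datetime import datetime, timedelta, timezone

MAX_BASE_AGE = timedelta(days=7)


def parse_created(created: str) -> datetime:
    """Parse the leading YYYY-MM-DDTHH:MM:SS part of the timestamp, assuming UTC."""
    match = re.match(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})", created.strip())
    if match is None:
        raise ValueError(f"unrecognized timestamp: {created!r}")
    parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def should_pull(created: str, rebuild: bool = False, now: datetime = None) -> bool:
    """Pull when --rebuild is given or the base image is older than 7 days."""
    if rebuild:
        return True
    now = now or datetime.now(timezone.utc)
    return now - parse_created(created) > MAX_BASE_AGE
```

Dropping the fractional seconds avoids depending on how many digits a given Python version's ISO parser accepts, at no practical cost for a 7-day threshold.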
Manual rebuild: ralph --rebuild forces a fresh docker pull + docker build.
3. Sandbox Lifecycle
Naming convention: agent-loop-<agent>-<sanitized-branch> (e.g., agent-loop-claude-fix-auth)
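A possible shape for the name helper. The spec does not define the sanitization rules, so replacing every run of characters outside `[A-Za-z0-9_.-]` with a single hyphen is an assumption for illustration.

```python
import re


def sandbox_name(agent: str, branch: str) -> str:
    """Build agent-loop-<agent>-<sanitized-branch> from a git branch name."""
    # Assumption: anything other than letters, digits, '_', '.', '-' becomes '-'.
    sanitized = re.sub(r"[^A-Za-z0-9_.-]+", "-", branch).strip("-")
    return f"agent-loop-{agent}-{sanitized}"
```

Under this rule `fix/auth` and `fix-auth` map to the same sandbox name, which is the kind of collision the Step 3 review is meant to catch.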
Create
def ensure_sandbox(agent, branch, worktree_path):
    name = sandbox_name(agent, branch)
    # Check if sandbox exists
    existing = docker_sandbox_ls_json()
    for vm in existing["vms"]:
        if vm["name"] == name:
            return name  # reuse
    # Create with custom template
    tag = ensure_image(agent)
    docker_sandbox_create(name, tag, worktree_path)
    apply_network_policy(name)
    return name
Cleanup (on issue completion)
When ralph marks an issue status:done:
docker_sandbox_rm(sandbox_name(agent, branch))
Orphan pruning
ralph prune-sandboxes:
- List all sandboxes matching agent-loop-<agent>-*
- For each, check if the workspace path still exists on disk
- If not, docker sandbox rm it
- Report what was pruned
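The orphan selection could be sketched as a pure function over the parsed sandbox list. The `"workspace"` key is an assumption; the spec only shows a `"name"` field in the `docker sandbox ls` JSON, so the real field name may differ. Entries without a workspace path are kept rather than removed, as a conservative choice.

```python
import os


def find_orphans(vms, agent, path_exists=os.path.exists):
    """Return names of this agent's sandboxes whose workspace path no longer exists."""
    prefix = f"agent-loop-{agent}-"
    orphans = []
    for vm in vms:
        name = vm.get("name", "")
        if not name.startswith(prefix):
            continue  # not a ralph sandbox for this agent
        workspace = vm.get("workspace")  # hypothetical key name
        if workspace and not path_exists(workspace):
            orphans.append(name)
    return orphans
```

The caller would then run `docker sandbox rm` on each returned name and print the list as the prune report.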
4. Network Policy
Applied immediately after sandbox creation:
docker sandbox network proxy <name> \
    --policy deny \
    --allow-host localhost \
    --allow-host api.anthropic.com \
    --allow-host statsig.anthropic.com \
    --allow-host sentry.io
Note: localhost must be allowed for the proxy connection via host.docker.internal.
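For the Step 3 test that the policy command is correctly constructed, the argument list might be built like this. The function name and the `ALLOWED_HOSTS` constant are hypothetical; the flags and hosts are copied from the command above.

```python
ALLOWED_HOSTS = (
    "localhost",
    "api.anthropic.com",
    "statsig.anthropic.com",
    "sentry.io",
)


def network_policy_cmd(name, allowed_hosts=ALLOWED_HOSTS):
    """Build the deny-by-default network policy command as an argv list."""
    cmd = ["docker", "sandbox", "network", "proxy", name, "--policy", "deny"]
    for host in allowed_hosts:
        cmd += ["--allow-host", host]
    return cmd
```

Returning a list rather than a shell string keeps the command safe to pass to `subprocess.run` without quoting concerns.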
5. Pre-flight Validation
Before starting a real iteration, ralph validates:
- Token valid: ralph check-token --agent <agent> exits 0
- Proxy running: curl -s http://localhost:<port>/health returns 200
- Sandbox responsive: docker sandbox exec <name> echo ok succeeds
- Network policy applied: docker sandbox exec <name> curl -s --max-time 3 https://google.com returns blocked message
If any check fails, ralph prints a diagnostic message and exits with an actionable error.
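A sketch of how the checks could be aggregated into a list of failures. Representing each check as a `(label, callable, hint)` tuple is an assumption made so the function stays free of side effects and easy to mock; the message wording is illustrative.

```python
def preflight_check(checks):
    """Run each check and return a list of failure messages (empty when all pass).

    checks: iterable of (label, check, hint), where check() returns a truthy
    value on success and hint tells the user how to fix a failure.
    """
    failures = []
    for label, check, hint in checks:
        try:
            ok = bool(check())
        except Exception as exc:  # a crashing check counts as a failed check
            ok = False
            hint = f"{hint} ({exc})"
        if not ok:
            failures.append(f"ralph: {label} check failed: {hint}")
    return failures
```

Ralph would exit non-zero when the returned list is non-empty, printing each message first.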
6. Execution via Sandbox
Replace docker.run_container() with:
docker sandbox exec \
    -e "CLAUDE_CODE_OAUTH_TOKEN=phantom" \
    -e "ANTHROPIC_BASE_URL=http://host.docker.internal:<port>" \
    -w <worktree_path> \
    <sandbox_name> \
    claude -p --dangerously-skip-permissions --model <model> <prompt>
Key differences from current docker run:
- No --rm (sandbox persists)
- No -v mounts (workspace synced automatically)
- No SSH key mounting (sandbox has no push credentials)
- No CLAUDE_CODE_OAUTH_TOKEN=<real> (phantom token only)
- No GIT_USER/GIT_EMAIL (configured once inside sandbox)
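The replacement for docker.run_container() might assemble its argv along these lines. The function name and parameters are hypothetical; the flags mirror the command shown above.

```python
def sandbox_exec_cmd(sandbox, port, worktree_path, model, prompt):
    """Build the docker sandbox exec argv; only the phantom token is ever passed."""
    return [
        "docker", "sandbox", "exec",
        "-e", "CLAUDE_CODE_OAUTH_TOKEN=phantom",
        "-e", f"ANTHROPIC_BASE_URL=http://host.docker.internal:{port}",
        "-w", worktree_path,
        sandbox,
        "claude", "-p", "--dangerously-skip-permissions",
        "--model", model,
        prompt,
    ]
```

Since the prompt is a single list element, it reaches `claude` intact even when it contains spaces or quotes.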
Implementation Plan
Step 1: Create sandbox Dockerfile [DONE]
Files:
- docker/agent-loop/claude/Dockerfile
Implement:
- Create Dockerfile based on docker/sandbox-templates:claude-code
- Install build tools as root, switch back to agent user
Test:
- docker build -t agent-loop-sandbox-claude:test docker/agent-loop/claude/ succeeds
- docker sandbox create --name test-sb -t agent-loop-sandbox-claude:test claude /tmp/test-dir succeeds
- docker sandbox exec test-sb bash -c 'which gcc && which rg && claude --version' returns valid paths
- Clean up: docker sandbox rm test-sb
Verify: Build and sandbox creation succeed.
Review: Minimal packages, correct USER directives.
Address feedback: Fix, re-test.
Step 2: Implement image build and rebuild detection [DONE]
Files:
- scripts/ralph: add Sandbox class with ensure_image(), compute_tag(), needs_rebuild()
Implement:
- compute_tag(agent): hash Dockerfile contents + base image digest → agent-loop-sandbox-<agent>:<hash>
- ensure_image(agent): check if tag exists, build if not
- needs_rebuild(agent): check base image age, pull if stale
- --rebuild flag: force pull + build
Test:
- tests/test_ralph.py (extend):
  - Content hash changes when Dockerfile changes
  - Content hash changes when base digest changes
  - ensure_image skips build when tag exists (mock docker image inspect)
  - --rebuild forces pull + build
Verify: pytest tests/test_ralph.py -v
Review: Hash computation is deterministic, caching logic correct.
Address feedback: Fix, re-test.
Step 3: Implement sandbox lifecycle management [DONE]
Files:
- scripts/ralph: add ensure_sandbox(), cleanup_sandbox(), prune_sandboxes() to Sandbox class
Implement:
- sandbox_name(agent, branch): agent-loop-<agent>-<sanitized_branch>
- ensure_sandbox(agent, branch, worktree_path): check existing via docker sandbox ls --json, create if missing
- apply_network_policy(name): deny-by-default + allowed hosts
- cleanup_sandbox(agent, branch): docker sandbox rm
- prune_sandboxes(agent): list ralph sandboxes, remove those with non-existent workspace paths
- Add prune-sandboxes subcommand
Test:
- tests/test_ralph.py:
  - Sandbox name generation with special characters in branch
  - ensure_sandbox reuses existing (mock docker sandbox ls)
  - ensure_sandbox creates new when missing
  - prune_sandboxes removes orphans, keeps active
  - Network policy command is correctly constructed
Verify: pytest tests/test_ralph.py -v
Review: Naming collisions, cleanup safety, policy correctness.
Address feedback: Fix, re-test.
Step 4: Implement pre-flight validation [DONE]
Files:
- scripts/ralph: add preflight_check() to Sandbox class
Implement:
- Check token validity (call check_token)
- Check proxy health (HTTP GET to /health)
- Check sandbox responsive (docker sandbox exec ... echo ok)
- Check network policy (attempt blocked request)
- Return list of failures with actionable messages
Test:
- tests/test_ralph.py:
  - All checks pass → returns empty list
  - Token expired → returns token error with instructions
  - Proxy down → returns proxy error
  - Sandbox unresponsive → returns sandbox error
Verify: pytest tests/test_ralph.py -v
Review: Error messages are actionable, checks don't have side effects.
Address feedback: Fix, re-test.
Step 5: Run all checks
Implement:
- Run pytest tests/ -v
- Run shellcheck and zsh -n on shell scripts
- Verify Docker build succeeds
Verify: All checks pass clean.
Step 6: Create commit
Implement:
- Stage all changes and create a commit summarizing Docker sandbox integration.
Verify: git log -1 shows the commit.
Conventions
- Language: Python 3 (stdlib only)
- Tests: pytest with unittest.mock
- Error messages: Prefix with ralph:
- Exit codes: 0=success, 1=runtime error, 2=usage error
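The error-message and exit-code conventions could be captured in a small stdlib-only helper such as the one below. The constant and function names are hypothetical.

```python
import sys

EXIT_OK = 0
EXIT_RUNTIME = 1
EXIT_USAGE = 2


def format_error(message: str) -> str:
    """Apply the 'ralph: ' prefix used for all error messages."""
    return f"ralph: {message}"


def die(message: str, code: int = EXIT_RUNTIME):
    """Print a prefixed error to stderr and exit with the given code."""
    print(format_error(message), file=sys.stderr)
    raise SystemExit(code)
```

For example, `die("unknown flag --foo", EXIT_USAGE)` would print `ralph: unknown flag --foo` to stderr and exit with status 2.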