Ralph Docker Sandbox Integration #26

@rjernst

Description

branch: ralph-docker-sandbox
depends: [24, 25]

Spec: Ralph Docker Sandbox Integration

Source issues: #24 (credential injection proxy), #25 (token management)

Overview

Replace ralph's plain docker run execution with Docker sandbox for microVM-level isolation. This includes: a custom Dockerfile for the sandbox template (build tools + Claude Code), sandbox lifecycle management (create, reuse, cleanup), network policy enforcement (deny-by-default), and image rebuild detection.

The sandbox ensures Claude Code runs in a hardened environment: no SSH keys, no git push credentials, and network access restricted to the allowlist in section 4 (the Anthropic API, reached via the credential injection proxy, plus statsig.anthropic.com and sentry.io). Ralph handles all git operations on the host.

Designed with multi-agent namespacing: sandbox names, Dockerfiles, and network policies are parameterized by agent type. Default agent is claude.

Architecture

┌─ ralph (host) ─────────────────────────────────────────────────┐
│                                                                │
│  1. Ensure image: docker build (layer-cached)                  │
│     docker/agent-loop/claude/Dockerfile                        │
│     FROM docker/sandbox-templates:claude-code                  │
│     + build-essential, jq, ripgrep, fd-find, openssh-client    │
│                                                                │
│  2. Ensure sandbox: docker sandbox create                      │
│     --name agent-loop-claude-<branch>                          │
│     -t agent-loop-sandbox-claude:<hash>                        │
│     claude /path/to/worktree                                   │
│                                                                │
│  3. Apply network policy:                                      │
│     --policy deny                                              │
│     --allow-host localhost (for proxy)                         │
│     --allow-host api.anthropic.com                             │
│     --allow-host statsig.anthropic.com                         │
│     --allow-host sentry.io                                     │
│                                                                │
│  4. Run claude via exec:                                       │
│     docker sandbox exec                                        │
│       -e CLAUDE_CODE_OAUTH_TOKEN=phantom                       │
│       -e ANTHROPIC_BASE_URL=http://host.docker.internal:<port> │
│       agent-loop-claude-<branch>                               │
│       claude -p --dangerously-skip-permissions ...             │
│                                                                │
│  5. Cleanup: docker sandbox rm agent-loop-claude-<branch>      │
└────────────────────────────────────────────────────────────────┘

1. Sandbox Dockerfile

File: docker/agent-loop/claude/Dockerfile

FROM docker/sandbox-templates:claude-code
USER root
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential jq ripgrep openssh-client fd-find \
    && rm -rf /var/lib/apt/lists/*
USER agent

The docker/agent-loop/ directory uses <agent>/Dockerfile layout for future multi-agent support.

2. Image Build and Rebuild Detection

Image tag: agent-loop-sandbox-<agent>:<content-hash>

The content hash is derived from:

  • SHA256 of the Dockerfile contents
  • SHA256 of the base image digest (docker/sandbox-templates:claude-code)

If either changes, the image tag changes, triggering a rebuild.
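A minimal sketch of the tag derivation described above, using only the standard library. The explicit dockerfile_bytes and base_digest parameters, the newline separator, and the 12-character truncation are assumptions made for illustration; the spec only fixes the two hash inputs and the tag shape.

```python
import hashlib


def compute_tag(agent: str, dockerfile_bytes: bytes, base_digest: str) -> str:
    """Return agent-loop-sandbox-<agent>:<hash> for the given inputs."""
    # SHA256 of the Dockerfile contents
    dockerfile_sha = hashlib.sha256(dockerfile_bytes).hexdigest()
    # SHA256 of the base image digest string (e.g. "sha256:abc...")
    digest_sha = hashlib.sha256(base_digest.encode("utf-8")).hexdigest()
    # Combine both; separator and truncation length are illustrative assumptions
    combined = hashlib.sha256(
        f"{dockerfile_sha}\n{digest_sha}".encode("ascii")
    ).hexdigest()
    return f"agent-loop-sandbox-{agent}:{combined[:12]}"
```

Passing the inputs explicitly keeps the function pure, so the "hash changes when the Dockerfile or base digest changes" tests need no Docker mocks.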

Rebuild flow:

  1. Compute content hash from Dockerfile + base image digest
  2. Check if agent-loop-sandbox-<agent>:<hash> exists locally
  3. If missing: docker build -t agent-loop-sandbox-<agent>:<hash> docker/agent-loop/<agent>/
  4. Docker layer caching makes unchanged rebuilds near-instant

Base image update detection:

  • docker pull docker/sandbox-templates:claude-code fetches the latest base image and reveals any new digest
  • Pull only when --rebuild is passed or the local base image is older than 7 days (age read via docker inspect --format '{{.Created}}')
  • A new base digest changes the content hash, triggering a rebuild

Manual rebuild: ralph --rebuild forces a fresh docker pull + docker build.
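The pull decision above could be sketched as follows. It assumes the Created value is an RFC 3339 timestamp in UTC (for example 2024-01-01T00:00:00.123456789Z) and keeps only the seconds-resolution prefix, since older Python versions cannot parse nanosecond fractions directly. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_BASE_AGE = timedelta(days=7)


def base_image_is_stale(created: str, now: datetime) -> bool:
    """True if the image's Created timestamp is more than 7 days before now."""
    # Assumption: timestamp is UTC; drop fractional seconds and the zone suffix
    stamp = datetime.strptime(created[:19], "%Y-%m-%dT%H:%M:%S")
    stamp = stamp.replace(tzinfo=timezone.utc)
    return now - stamp > MAX_BASE_AGE


def should_pull(rebuild_flag: bool, created: str, now: datetime) -> bool:
    """Pull when --rebuild was passed or the local base image is stale."""
    return rebuild_flag or base_image_is_stale(created, now)
```

Taking now as a parameter keeps the check deterministic under test.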

3. Sandbox Lifecycle

Naming convention: agent-loop-<agent>-<sanitized-branch> (e.g., agent-loop-claude-fix-auth)
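One possible sanitization rule, shown as a sketch: the spec does not define how branches are sanitized, so the lowercase-and-hyphenate rule here is an assumption.

```python
import re


def sandbox_name(agent: str, branch: str) -> str:
    """Build agent-loop-<agent>-<sanitized-branch>."""
    # Assumed rule: lowercase, collapse runs of other characters into "-"
    sanitized = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    return f"agent-loop-{agent}-{sanitized}"
```

Under this rule distinct branches such as fix/auth and fix-auth map to the same name, so a real implementation may need a disambiguating suffix.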

Create

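def ensure_sandbox(agent, branch, worktree_path):
    """Return the sandbox name for (agent, branch), creating the sandbox if needed."""
    name = sandbox_name(agent, branch)
    # Reuse an existing sandbox with the same name
    existing = docker_sandbox_ls_json()
    if any(vm["name"] == name for vm in existing["vms"]):
        return name
    # Otherwise ensure the template image exists and create the sandbox from it
    tag = ensure_image(agent)
    docker_sandbox_create(name, tag, worktree_path)
    # Apply the deny-by-default policy before anything runs inside the sandbox
    apply_network_policy(name)
    return name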

Cleanup (on issue completion)

When ralph marks an issue status:done:

docker_sandbox_rm(sandbox_name(agent, branch))

Orphan pruning

ralph prune-sandboxes:

  1. List all sandboxes matching agent-loop-<agent>-*
  2. For each, check if the workspace path still exists on disk
  3. If not, docker sandbox rm it
  4. Report what was pruned
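The orphan selection in steps 1 and 2 might look like the sketch below. The "workspace" key on each sandbox entry is a hypothetical field name; the actual shape of the docker sandbox ls JSON output is not given in this spec.

```python
import os
from typing import Callable, Dict, Iterable, List


def find_orphans(
    vms: Iterable[Dict[str, str]],
    agent: str,
    path_exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Return names of this agent's sandboxes whose workspace path is gone."""
    prefix = f"agent-loop-{agent}-"
    return [
        vm["name"]
        for vm in vms
        # "workspace" is an assumed key holding the sandbox's host path
        if vm["name"].startswith(prefix) and not path_exists(vm["workspace"])
    ]
```

Sandboxes outside the agent-loop-<agent>- prefix are never selected, which keeps pruning from touching sandboxes ralph did not create.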

4. Network Policy

Applied immediately after sandbox creation:

docker sandbox network proxy <name> \
  --policy deny \
  --allow-host localhost \
  --allow-host api.anthropic.com \
  --allow-host statsig.anthropic.com \
  --allow-host sentry.io

Note: localhost must be allowed because the sandbox reaches the host-side credential injection proxy through host.docker.internal.
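A sketch of how apply_network_policy could construct the command above as an argument list. The helper name network_policy_argv and the ALLOWED_HOSTS constant are illustrative, not part of the spec.

```python
from typing import List, Sequence

# Hosts taken from the policy command in this section
ALLOWED_HOSTS = (
    "localhost",
    "api.anthropic.com",
    "statsig.anthropic.com",
    "sentry.io",
)


def network_policy_argv(
    name: str, allowed_hosts: Sequence[str] = ALLOWED_HOSTS
) -> List[str]:
    """Build the deny-by-default policy command for one sandbox."""
    argv = ["docker", "sandbox", "network", "proxy", name, "--policy", "deny"]
    for host in allowed_hosts:
        argv += ["--allow-host", host]
    return argv
```

Returning a list rather than a shell string lets the caller pass it straight to subprocess.run without quoting concerns, and makes the "command is correctly constructed" test a plain equality check.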

5. Pre-flight Validation

Before starting a real iteration, ralph validates:

  1. Token valid: ralph check-token --agent <agent> exits 0
  2. Proxy running: curl -s http://localhost:<port>/health returns 200
  3. Sandbox responsive: docker sandbox exec <name> echo ok succeeds
  4. Network policy applied: docker sandbox exec <name> curl -s --max-time 3 https://google.com returns a blocked message

If any check fails, ralph prints a diagnostic message and exits with an actionable error.
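One way to aggregate these checks is sketched below. Each check is modeled as a (label, probe, hint) tuple where probe is a zero-argument callable; that structure and the message wording are assumptions, chosen so the real probes (token, proxy, sandbox, network) can be swapped for stubs in tests.

```python
from typing import Callable, List, Sequence, Tuple

# (label, probe returning True on success, actionable hint shown on failure)
Check = Tuple[str, Callable[[], bool], str]


def preflight_check(checks: Sequence[Check]) -> List[str]:
    """Run every check and return one message per failure (empty if all pass)."""
    failures = []
    for label, probe, hint in checks:
        try:
            ok = probe()
        except Exception as exc:
            # A probe that raises counts as a failure; keep the cause visible
            ok = False
            hint = f"{hint} ({exc})"
        if not ok:
            failures.append(f"ralph: preflight failed: {label}: {hint}")
    return failures
```

Running all checks instead of stopping at the first failure lets ralph report every problem in a single pass.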

6. Execution via Sandbox

Replace docker.run_container() with:

docker sandbox exec \
  -e "CLAUDE_CODE_OAUTH_TOKEN=phantom" \
  -e "ANTHROPIC_BASE_URL=http://host.docker.internal:<port>" \
  -w <worktree_path> \
  <sandbox_name> \
  claude -p --dangerously-skip-permissions --model <model> <prompt>

Key differences from current docker run:

  • No --rm (sandbox persists)
  • No -v mounts (workspace synced automatically)
  • No SSH key mounting (sandbox has no push credentials)
  • No CLAUDE_CODE_OAUTH_TOKEN=<real> (phantom token only)
  • No GIT_USER/GIT_EMAIL (configured once inside sandbox)
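The replacement for docker.run_container() could assemble the exec command as in this sketch. The function name sandbox_exec_argv and its parameter names are hypothetical; the flags mirror the command shown above.

```python
from typing import List


def sandbox_exec_argv(
    sandbox: str, worktree_path: str, port: int, model: str, prompt: str
) -> List[str]:
    """Build the docker sandbox exec command that runs claude."""
    return [
        "docker", "sandbox", "exec",
        # Phantom token only; the real credential stays with the host proxy
        "-e", "CLAUDE_CODE_OAUTH_TOKEN=phantom",
        "-e", f"ANTHROPIC_BASE_URL=http://host.docker.internal:{port}",
        "-w", worktree_path,
        sandbox,
        "claude", "-p", "--dangerously-skip-permissions",
        "--model", model,
        prompt,
    ]
```

Because the prompt is a single list element, multi-line prompts and quotes reach claude unchanged when the list is passed to subprocess.run without a shell.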

Implementation Plan

Step 1: Create sandbox Dockerfile [DONE]

Files:

  • docker/agent-loop/claude/Dockerfile

Implement:

  1. Create Dockerfile based on docker/sandbox-templates:claude-code
  2. Install build tools as root, switch back to agent user

Test:

  • docker build -t agent-loop-sandbox-claude:test docker/agent-loop/claude/ succeeds
  • docker sandbox create --name test-sb -t agent-loop-sandbox-claude:test claude /tmp/test-dir succeeds
  • docker sandbox exec test-sb bash -c 'which gcc && which rg && claude --version' returns valid paths
  • Clean up: docker sandbox rm test-sb

Verify: Build and sandbox creation succeed.

Review: Minimal packages, correct USER directives.

Address feedback: Fix, re-test.

Step 2: Implement image build and rebuild detection [DONE]

Files:

  • scripts/ralph — add Sandbox class with ensure_image(), compute_tag(), needs_rebuild()

Implement:

  1. compute_tag(agent): hash Dockerfile contents + base image digest → agent-loop-sandbox-<agent>:<hash>
  2. ensure_image(agent): check if tag exists, build if not
  3. needs_rebuild(agent): check base image age, pull if stale
  4. --rebuild flag: force pull + build

Test:

  • tests/test_ralph.py (extend):
    • Content hash changes when Dockerfile changes
    • Content hash changes when base digest changes
    • ensure_image skips build when tag exists (mock docker image inspect)
    • --rebuild forces pull + build

Verify: pytest tests/test_ralph.py -v

Review: Hash computation is deterministic, caching logic correct.

Address feedback: Fix, re-test.

Step 3: Implement sandbox lifecycle management [DONE]

Files:

  • scripts/ralph — add ensure_sandbox(), cleanup_sandbox(), prune_sandboxes() to Sandbox class

Implement:

  1. sandbox_name(agent, branch): agent-loop-<agent>-<sanitized_branch>
  2. ensure_sandbox(agent, branch, worktree_path): check existing via docker sandbox ls --json, create if missing
  3. apply_network_policy(name): deny-by-default + allowed hosts
  4. cleanup_sandbox(agent, branch): docker sandbox rm
  5. prune_sandboxes(agent): list ralph sandboxes, remove those with non-existent workspace paths
  6. Add prune-sandboxes subcommand

Test:

  • tests/test_ralph.py:
    • Sandbox name generation with special characters in branch
    • ensure_sandbox reuses existing (mock docker sandbox ls)
    • ensure_sandbox creates new when missing
    • prune_sandboxes removes orphans, keeps active
    • Network policy command is correctly constructed

Verify: pytest tests/test_ralph.py -v

Review: Naming collisions, cleanup safety, policy correctness.

Address feedback: Fix, re-test.

Step 4: Implement pre-flight validation [DONE]

Files:

  • scripts/ralph — add preflight_check() to Sandbox class

Implement:

  1. Check token validity (call check_token)
  2. Check proxy health (HTTP GET to /health)
  3. Check sandbox responsive (docker sandbox exec ... echo ok)
  4. Check network policy (attempt blocked request)
  5. Return list of failures with actionable messages

Test:

  • tests/test_ralph.py:
    • All checks pass → returns empty list
    • Token expired → returns token error with instructions
    • Proxy down → returns proxy error
    • Sandbox unresponsive → returns sandbox error

Verify: pytest tests/test_ralph.py -v

Review: Error messages are actionable, checks don't have side effects.

Address feedback: Fix, re-test.

Step 5: Run all checks

Implement:

  1. Run pytest tests/ -v
  2. Run shellcheck and zsh -n on shell scripts
  3. Verify Docker build succeeds

Verify: All checks pass clean.

Step 6: Create commit

Implement:

  1. Stage all changes and create a commit summarizing Docker sandbox integration.

Verify: git log -1 shows the commit.


Conventions

  • Language: Python 3 (stdlib only)
  • Tests: pytest with unittest.mock
  • Error messages: Prefix with ralph:
  • Exit codes: 0=success, 1=runtime error, 2=usage error
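The error-message and exit-code conventions could be captured in a small helper, sketched here. The names format_error, die, and the EXIT_* constants are illustrative.

```python
import sys

EXIT_OK = 0
EXIT_RUNTIME = 1
EXIT_USAGE = 2


def format_error(message: str) -> str:
    """Apply the ralph: prefix required for all error messages."""
    return f"ralph: {message}"


def die(message: str, code: int = EXIT_RUNTIME) -> None:
    """Print a prefixed error to stderr and exit with the given code."""
    print(format_error(message), file=sys.stderr)
    raise SystemExit(code)
```

For example, die("unknown subcommand", EXIT_USAGE) would exit with status 2 after printing ralph: unknown subcommand to stderr.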

Metadata

Labels: spec (Ralph spec for automated execution), status:done (Completed)
