Ralph Docker Sandbox Integration #26

---
branch: ralph-docker-sandbox
depends: [24, 25]
---
# Spec: Ralph Docker Sandbox Integration

Source issues: #24 (credential injection proxy), #25 (token management)

## Overview

Replace ralph's plain `docker run` execution with Docker Sandboxes (`docker sandbox`) for microVM-level isolation. The work covers a custom Dockerfile for the sandbox template (build tools plus Claude Code), sandbox lifecycle management (create, reuse, clean up), network policy enforcement (deny by default), and image rebuild detection.

The sandbox runs Claude Code in a hardened environment: no SSH keys, no git push credentials, and outbound network denied by default except for the credential injection proxy, the Anthropic API, and the few other hosts listed under Network Policy. Ralph handles all git operations on the host.

Designed with multi-agent namespacing: sandbox names, Dockerfiles, and network policies are parameterized by agent type. Default agent is `claude`.

## Architecture

```
┌─ ralph (host) ─────────────────────────────────────────────────┐
│                                                                │
│  1. Ensure image: docker build (layer-cached)                  │
│     docker/agent-loop/claude/Dockerfile                        │
│     FROM docker/sandbox-templates:claude-code                  │
│     + build-essential, jq, ripgrep, fd-find, openssh-client    │
│                                                                │
│  2. Ensure sandbox: docker sandbox create                      │
│     --name agent-loop-claude-<branch>                          │
│     -t agent-loop-sandbox-claude:<hash>                        │
│     claude /path/to/worktree                                   │
│                                                                │
│  3. Apply network policy:                                      │
│     --policy deny                                              │
│     --allow-host localhost (for proxy)                         │
│     --allow-host api.anthropic.com                             │
│     --allow-host statsig.anthropic.com                         │
│     --allow-host sentry.io                                     │
│                                                                │
│  4. Run claude via exec:                                       │
│     docker sandbox exec                                        │
│       -e CLAUDE_CODE_OAUTH_TOKEN=phantom                       │
│       -e ANTHROPIC_BASE_URL=http://host.docker.internal:<port> │
│       agent-loop-claude-<branch>                               │
│       claude -p --dangerously-skip-permissions ...             │
│                                                                │
│  5. Cleanup: docker sandbox rm agent-loop-claude-<branch>      │
└────────────────────────────────────────────────────────────────┘
```

## 1. Sandbox Dockerfile

**File:** `docker/agent-loop/claude/Dockerfile`

```dockerfile
FROM docker/sandbox-templates:claude-code
USER root
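# Build tools for the agent. ripgrep provides `rg`; on Debian/Ubuntu, fd-find installs its binary as `fdfind`.
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential jq ripgrep openssh-client fd-find \
    && rm -rf /var/lib/apt/lists/*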
USER agent
```

The `docker/agent-loop/` directory uses an `<agent>/Dockerfile` layout so that other agents can be added later.

## 2. Image Build and Rebuild Detection

**Image tag:** `agent-loop-sandbox-<agent>:<content-hash>`

The content hash is a SHA-256 computed over:
- the Dockerfile contents
- the base image digest (`docker/sandbox-templates:claude-code`)

If either changes, the image tag changes, triggering a rebuild.
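A minimal sketch of the tag computation in Python, matching ralph's stdlib-only convention. The signature, hashing each input separately before combining, and the 12-character truncation are assumptions for illustration, not ralph's actual `compute_tag()`. The base digest could be read with `docker image inspect --format '{{index .RepoDigests 0}}' docker/sandbox-templates:claude-code`.

```python
import hashlib


def compute_tag(agent: str, dockerfile: bytes, base_digest: str) -> str:
    """Return `agent-loop-sandbox-<agent>:<hash>` for the given build inputs.

    Hypothetical signature: the caller reads the Dockerfile bytes and the base
    image digest and passes them in, which keeps this function pure and testable.
    """
    combined = hashlib.sha256()
    # Hash each input on its own, then hash the two fixed-length digests together,
    # so the boundary between the inputs is unambiguous.
    combined.update(hashlib.sha256(dockerfile).digest())
    combined.update(hashlib.sha256(base_digest.encode("utf-8")).digest())
    # Truncating to 12 hex characters keeps tags readable (an arbitrary choice).
    return f"agent-loop-sandbox-{agent}:{combined.hexdigest()[:12]}"
```

Because the function is deterministic, the same Dockerfile and digest always yield the same tag, and changing either yields a new one.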

**Rebuild flow:**
1. Compute content hash from Dockerfile + base image digest
2. Check if `agent-loop-sandbox-<agent>:<hash>` exists locally
3. If missing: `docker build -t agent-loop-sandbox-<agent>:<hash> docker/agent-loop/<agent>/`
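4. Docker layer caching reuses unchanged layers, so rebuilding an image whose inputs match an earlier build (for example, after only the tag was deleted) is near-instant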
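The rebuild flow could be sketched with `subprocess` as follows. `image_exists()` and this two-argument `ensure_image()` are hypothetical helpers; the spec's `ensure_image(agent)` would first compute the tag and pick the `docker/agent-loop/<agent>/` build context. Both helpers call `subprocess.run` through the module, so tests can patch it with `unittest.mock` as the conventions require.

```python
import subprocess


def image_exists(tag: str) -> bool:
    """`docker image inspect` exits non-zero when the tag is not present locally."""
    result = subprocess.run(
        ["docker", "image", "inspect", tag],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def ensure_image(tag: str, context_dir: str) -> bool:
    """Build `tag` from `context_dir` if it is missing; return True if a build ran."""
    if image_exists(tag):
        return False
    # check=True raises CalledProcessError if the build fails.
    subprocess.run(["docker", "build", "-t", tag, context_dir], check=True)
    return True
```

In a test, patching `subprocess.run` to return `CompletedProcess([], 0)` makes `ensure_image()` return `False` after a single `docker image inspect` call, which covers the "skips build when tag exists" case.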

**Base image update detection:**
- `docker pull docker/sandbox-templates:claude-code` (check for new digest)
- Only pull if the `--rebuild` flag is set or the local base image is older than 7 days, as reported by `docker image inspect --format '{{.Created}}'` (this is when the image was built upstream, not when it was pulled)
- A new base digest changes the content hash, triggering rebuild

**Manual rebuild:** `ralph --rebuild` forces a fresh `docker pull` + `docker build`.
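The pull decision can be kept as a small pure function so the 7-day threshold is easy to unit-test. `should_pull()` and `parse_docker_created()` are hypothetical names. The parser assumes `.Created` is a UTC timestamp ending in `Z` (for example `2024-05-01T12:34:56.123456789Z`) and drops fractional seconds, since Docker may print nanoseconds and `datetime` stops at microseconds.

```python
from datetime import datetime, timedelta, timezone


def parse_docker_created(created: str) -> datetime:
    """Parse an image's `.Created` value, assumed to look like 2024-05-01T12:34:56.123456789Z."""
    whole_seconds = created.rstrip("Z").split(".")[0]
    parsed = datetime.strptime(whole_seconds, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)  # assumes the trailing "Z" (UTC)


def should_pull(rebuild: bool, created: str, now: datetime,
                max_age: timedelta = timedelta(days=7)) -> bool:
    """Pull the base image if --rebuild was given or the image is older than max_age."""
    if rebuild:
        return True
    return now - parse_docker_created(created) > max_age
```

A caller would pass a timezone-aware `now`, such as `datetime.now(timezone.utc)`, so the subtraction compares like with like.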

## 3. Sandbox Lifecycle

**Naming convention:** `agent-loop-<agent>-<sanitized-branch>` (e.g., `agent-loop-claude-fix-auth`)
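One possible sanitization for the branch part of the name. The rules used here (lowercase, only `[a-z0-9-]`, collapsed and trimmed dashes) are an assumption about what sandbox names should contain, not a documented Docker constraint.

```python
import re


def sandbox_name(agent: str, branch: str) -> str:
    """Build `agent-loop-<agent>-<sanitized-branch>` from a git branch name."""
    # Replace every run of characters outside [a-z0-9-] (e.g. "/", "_", ".") with "-".
    sanitized = re.sub(r"[^a-z0-9-]+", "-", branch.lower())
    # Collapse repeated dashes and trim them from both ends.
    sanitized = re.sub(r"-{2,}", "-", sanitized).strip("-")
    return f"agent-loop-{agent}-{sanitized}"
```

This mapping is lossy: `feature/x` and `feature-x` both become `agent-loop-claude-feature-x`, which is the naming-collision risk Step 3 asks reviewers to check. Appending a short hash of the raw branch name would be one way to disambiguate.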

### Create
```python
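def ensure_sandbox(agent, branch, worktree_path):
    """Return the sandbox name for (agent, branch), creating the sandbox if needed."""
    name = sandbox_name(agent, branch)
    # Reuse an existing sandbox with this name, if `docker sandbox ls --json` lists one.
    existing = docker_sandbox_ls_json()
    if any(vm["name"] == name for vm in existing.get("vms", [])):
        return name
    # Otherwise build (or reuse) the custom template image and create from it.
    tag = ensure_image(agent)
    docker_sandbox_create(name, tag, worktree_path)
    # Lock down networking before the sandbox runs anything.
    apply_network_policy(name)
    return name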
```

### Cleanup (on issue completion)
When ralph marks an issue `status:done`:
```python
docker_sandbox_rm(sandbox_name(agent, branch))
```

### Orphan pruning
`ralph prune-sandboxes`:
1. List all sandboxes matching `agent-loop-<agent>-*`
2. For each, check if the workspace path still exists on disk
3. If not, `docker sandbox rm` it
4. Report what was pruned
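The orphan check can be separated from the Docker calls so it is testable without a daemon. This sketch assumes each listed sandbox is a dict with `name` and `workspace` keys; the real field names in `docker sandbox ls --json` output may differ, and `find_orphans()` is a hypothetical helper. Sandboxes with no recorded workspace are kept, erring on the side of not deleting anything.

```python
import os
from typing import Callable, Iterable, List, Mapping


def find_orphans(agent: str, sandboxes: Iterable[Mapping[str, str]],
                 path_exists: Callable[[str], bool] = os.path.isdir) -> List[str]:
    """Return names of this agent's ralph sandboxes whose workspace path is gone."""
    prefix = f"agent-loop-{agent}-"
    orphans = []
    for sandbox in sandboxes:
        name = sandbox.get("name", "")
        if not name.startswith(prefix):
            continue  # not created by ralph for this agent: never touch it
        workspace = sandbox.get("workspace")
        if workspace and not path_exists(workspace):
            orphans.append(name)
    return orphans
```

`prune-sandboxes` would then run `docker sandbox rm <name>` for each returned name and print the list as its report.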

## 4. Network Policy

Applied immediately after sandbox creation:
```bash
docker sandbox network proxy <name> \
  --policy deny \
  --allow-host localhost \
  --allow-host api.anthropic.com \
  --allow-host statsig.anthropic.com \
  --allow-host sentry.io
```

Note: `localhost` must be on the allowlist so the sandbox can reach the credential injection proxy on the host via `host.docker.internal`.
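Building the policy command as an argument list keyed by agent fits the multi-agent namespacing goal and avoids shell quoting. The `ALLOWED_HOSTS` mapping and `network_policy_argv()` are hypothetical; the flags are the ones in the command above.

```python
from typing import Dict, List, Tuple

# Hypothetical per-agent allowlists; only `claude` is defined by this spec.
ALLOWED_HOSTS: Dict[str, Tuple[str, ...]] = {
    "claude": (
        "localhost",  # credential injection proxy on the host
        "api.anthropic.com",
        "statsig.anthropic.com",
        "sentry.io",
    ),
}


def network_policy_argv(agent: str, sandbox: str) -> List[str]:
    """Return argv for a deny-by-default proxy policy using the agent's allowlist."""
    argv = ["docker", "sandbox", "network", "proxy", sandbox, "--policy", "deny"]
    for host in ALLOWED_HOSTS[agent]:
        argv += ["--allow-host", host]
    return argv
```

The result can be passed to `subprocess.run(..., check=True)`; an unknown agent raises `KeyError` instead of silently producing an empty allowlist.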

## 5. Pre-flight Validation

Before starting a real iteration, ralph validates:
1. **Token valid:** `ralph check-token --agent <agent>` exits 0
2. **Proxy running:** `curl -s -o /dev/null -w '%{http_code}' http://localhost:<port>/health` prints `200`
3. **Sandbox responsive:** `docker sandbox exec <name> echo ok` succeeds
4. **Network policy applied:** `docker sandbox exec <name> curl -sf --max-time 3 https://google.com` exits non-zero, because requests to hosts outside the allowlist are blocked

If any check fails, ralph prints an actionable diagnostic for each failure and exits non-zero.
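A sketch of a pre-flight runner: each check is a (label, predicate, remedy) tuple, so the four checks above can be plugged in and tested independently. `run_preflight()` and `proxy_healthy()` are hypothetical helpers that the spec's `preflight_check()` could delegate to; `proxy_healthy()` uses only `urllib.request` to stay within the stdlib.

```python
import urllib.request
from typing import Callable, List, Tuple

# (label, predicate returning True on success, actionable remedy)
Check = Tuple[str, Callable[[], bool], str]


def proxy_healthy(port: int, timeout: float = 2.0) -> bool:
    """True if GET http://localhost:<port>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"http://localhost:{port}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError (including HTTPError for error statuses) and timeouts are OSError subclasses.
        return False


def run_preflight(checks: List[Check]) -> List[str]:
    """Run every check; return one `ralph:`-prefixed message per failure (empty if all pass)."""
    failures = []
    for label, predicate, remedy in checks:
        try:
            ok = predicate()
        except Exception as exc:  # a crashing check is reported, not propagated
            ok, remedy = False, f"{remedy} ({exc})"
        if not ok:
            failures.append(f"ralph: pre-flight check failed: {label}. {remedy}")
    return failures
```

For example, `("proxy running", lambda: proxy_healthy(port), "Start the credential injection proxy.")` would be one entry; ralph would print each returned message to stderr and exit with status 1 when the list is non-empty.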

## 6. Execution via Sandbox

Replace `docker.run_container()` with:
```bash
docker sandbox exec \
  -e "CLAUDE_CODE_OAUTH_TOKEN=phantom" \
  -e "ANTHROPIC_BASE_URL=http://host.docker.internal:<port>" \
  -w <worktree_path> \
  <sandbox_name> \
  claude -p --dangerously-skip-permissions --model <model> <prompt>
```

Key differences from current `docker run`:
- No `--rm` (sandbox persists)
- No `-v` mounts (workspace synced automatically)
- No SSH key mounting (sandbox has no push credentials)
- No `CLAUDE_CODE_OAUTH_TOKEN=<real>` (phantom token only)
- No `GIT_USER`/`GIT_EMAIL` (configured once inside sandbox)
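Passing the exec command to `subprocess.run` as a list keeps the prompt a single argument, with no shell quoting to get wrong. `claude_exec_argv()` is a hypothetical helper mirroring the command above.

```python
from typing import List


def claude_exec_argv(sandbox: str, port: int, worktree: str, model: str, prompt: str) -> List[str]:
    """Return argv for running Claude Code inside an existing sandbox."""
    return [
        "docker", "sandbox", "exec",
        # Phantom token: the real credential is injected by the host-side proxy.
        "-e", "CLAUDE_CODE_OAUTH_TOKEN=phantom",
        "-e", f"ANTHROPIC_BASE_URL=http://host.docker.internal:{port}",
        "-w", worktree,
        sandbox,
        "claude", "-p", "--dangerously-skip-permissions", "--model", model, prompt,
    ]
```

A prompt containing quotes or semicolons reaches `claude` unchanged because no shell is involved.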

---

## Implementation Plan

### Step 1: Create sandbox Dockerfile [DONE]

**Files:**
- `docker/agent-loop/claude/Dockerfile`

**Implement:**
1. Create Dockerfile based on `docker/sandbox-templates:claude-code`
2. Install build tools as root, switch back to agent user

**Test:**
- `docker build -t agent-loop-sandbox-claude:test docker/agent-loop/claude/` succeeds
- `docker sandbox create --name test-sb -t agent-loop-sandbox-claude:test claude /tmp/test-dir` succeeds
- `docker sandbox exec test-sb bash -c 'which gcc && which rg && claude --version'` prints the `gcc` and `rg` paths and the Claude Code version
- Clean up: `docker sandbox rm test-sb`

**Verify:** Build and sandbox creation succeed.

**Review:** Minimal packages, correct USER directives.

**Address feedback:** Fix, re-test.

### Step 2: Implement image build and rebuild detection [DONE]

**Files:**
- `scripts/ralph` — add `Sandbox` class with `ensure_image()`, `compute_tag()`, `needs_rebuild()`

**Implement:**
1. `compute_tag(agent)`: hash Dockerfile contents + base image digest → `agent-loop-sandbox-<agent>:<hash>`
2. `ensure_image(agent)`: check if tag exists, build if not
3. `needs_rebuild(agent)`: check base image age, pull if stale
4. `--rebuild` flag: force pull + build

**Test:**
- `tests/test_ralph.py` (extend):
  - Content hash changes when Dockerfile changes
  - Content hash changes when base digest changes
  - `ensure_image` skips build when tag exists (mock `docker image inspect`)
  - `--rebuild` forces pull + build

**Verify:** `pytest tests/test_ralph.py -v`

**Review:** Hash computation is deterministic, caching logic correct.

**Address feedback:** Fix, re-test.

### Step 3: Implement sandbox lifecycle management [DONE]

**Files:**
- `scripts/ralph` — add `ensure_sandbox()`, `cleanup_sandbox()`, `prune_sandboxes()` to `Sandbox` class

**Implement:**
1. `sandbox_name(agent, branch)`: `agent-loop-<agent>-<sanitized_branch>`
2. `ensure_sandbox(agent, branch, worktree_path)`: check existing via `docker sandbox ls --json`, create if missing
3. `apply_network_policy(name)`: deny-by-default + allowed hosts
4. `cleanup_sandbox(agent, branch)`: `docker sandbox rm`
5. `prune_sandboxes(agent)`: list ralph sandboxes, remove those with non-existent workspace paths
6. Add `prune-sandboxes` subcommand

**Test:**
- `tests/test_ralph.py`:
  - Sandbox name generation with special characters in branch
  - `ensure_sandbox` reuses existing (mock `docker sandbox ls`)
  - `ensure_sandbox` creates new when missing
  - `prune_sandboxes` removes orphans, keeps active
  - Network policy command is correctly constructed

**Verify:** `pytest tests/test_ralph.py -v`

**Review:** Naming collisions, cleanup safety, policy correctness.

**Address feedback:** Fix, re-test.

### Step 4: Implement pre-flight validation [DONE]

**Files:**
- `scripts/ralph` — add `preflight_check()` to `Sandbox` class

**Implement:**
1. Check token validity (call `check_token`)
2. Check proxy health (HTTP GET to `/health`)
3. Check sandbox responsive (`docker sandbox exec ... echo ok`)
4. Check network policy (attempt blocked request)
5. Return list of failures with actionable messages

**Test:**
- `tests/test_ralph.py`:
  - All checks pass → returns empty list
  - Token expired → returns token error with instructions
  - Proxy down → returns proxy error
  - Sandbox unresponsive → returns sandbox error

**Verify:** `pytest tests/test_ralph.py -v`

**Review:** Error messages are actionable, checks don't have side effects.

**Address feedback:** Fix, re-test.

### Step 5: Run all checks

**Implement:**
1. Run `pytest tests/ -v`
2. Run `shellcheck` and `zsh -n` on shell scripts
3. Verify Docker build succeeds

**Verify:** All checks pass clean.

### Step 6: Create commit

**Implement:**
1. Stage all changes and create a commit summarizing Docker sandbox integration.

**Verify:** `git log -1` shows the commit.

---

## Conventions

- **Language:** Python 3 (stdlib only)
- **Tests:** pytest with unittest.mock
- **Error messages:** Prefix with `ralph:`
- **Exit codes:** 0=success, 1=runtime error, 2=usage error



