Ralph Sandbox Migration #27

---
branch: ralph-sandbox-migration
depends: [24, 25, 26]
---
# Spec: Ralph Sandbox Migration

Source issues: #24 (proxy), #25 (token management), #26 (sandbox integration)

## Overview

Wire together the credential injection proxy (#24), token management (#25), and Docker sandbox (#26) into the ralph orchestration script, replacing the current `docker run` approach. This spec handles the end-to-end flow: ralph starts the proxy, creates/reuses a sandbox, runs Claude via `docker sandbox exec`, and cleans up.

The spec also adds a pre-run smoke test (`ralph selftest`) that validates the entire pipeline without running a real spec, and an integration test that exercises the full flow against a mock project.

## Architecture

```
ralph --issue 42
  │
  ├─ 1. check-token (exits if expired)
  │
  ├─ 2. Start proxy container
  │     get-token | docker run -i --rm agent-loop-proxy
  │
  ├─ 3. Ensure worktree (existing logic)
  │
  ├─ 4. Ensure sandbox (create or reuse)
  │     docker sandbox create -t agent-loop-sandbox-claude:<hash>
  │     Apply network policy
  │
  ├─ 5. Pre-flight validation
  │     Proxy health, sandbox responsive, network policy
  │
  ├─ 6. Run iteration(s)
  │     docker sandbox exec -e CLAUDE_CODE_OAUTH_TOKEN=phantom -e ANTHROPIC_BASE_URL=<proxy> claude -p ...
  │     Check for commits, update issue, loop
  │
  ├─ 7. Push (if --push, from host)
  │
  └─ 8. Cleanup
        Stop proxy container
        Sandbox persists (cleaned on issue done or prune)
```

## 1. Proxy Lifecycle

Ralph manages the proxy container as a subprocess/background container:

**Start (once per ralph invocation):**
```python
token = get_token(agent)  # real token, read from the Keychain
proxy_proc = start_proxy_container(agent, port, token)
# the token is piped to the container via stdin; the container runs in the background
```

**Stop (on ralph exit):**
```python
stop_proxy_container(agent)
# docker stop; a container started with --rm is removed automatically, otherwise follow with docker rm
```

Proxy container name: `agent-loop-proxy-<agent>` (e.g., `agent-loop-proxy-claude`)

Port allocation: use a fixed port per agent (e.g., claude=18080). If the port is already in use, ralph checks the health endpoint first and reuses the existing proxy when it responds as healthy.
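The reuse-or-start decision described above could be sketched as follows. This is a minimal illustration, not the actual implementation: the `/health` path, the `DEFAULT_PORTS` table, and the injected `start` and `is_healthy` callables are assumptions, since the spec does not name the health route or the function signatures.

```python
import urllib.error
import urllib.request

# Hypothetical values: the spec only gives claude=18080 as an example port
# and does not name the health path.
DEFAULT_PORTS = {"claude": 18080}
HEALTH_PATH = "/health"


def proxy_is_healthy(port, timeout=2.0):
    """Return True if localhost:<port> answers the health check with HTTP 200."""
    url = f"http://127.0.0.1:{port}{HEALTH_PATH}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not healthy.
        return False


def ensure_proxy(agent, start, is_healthy=proxy_is_healthy):
    """Reuse a healthy proxy on the agent's fixed port, else call start(agent, port)."""
    port = DEFAULT_PORTS[agent]
    if is_healthy(port):
        return port, "reused"
    start(agent, port)
    return port, "started"
```

Passing the health check in as a parameter keeps the decision logic testable without Docker or a network listener.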

## 2. Replace Docker Class

Replace the existing `Docker` class with a new `Sandbox` class (from #26). Key method changes:

| Old (Docker class) | New (Sandbox class) |
|---|---|
| `build_image(tag)` | `ensure_image(agent)` |
| `compute_tag(packages)` | `compute_tag(agent)` — hash-based |
| `ensure_image(packages)` | `ensure_image(agent)` — with rebuild detection |
| `run_container(...)` | `run_iteration(sandbox_name, worktree, prompt, model)` |
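
A hash-based tag with rebuild detection might look like the sketch below. The spec does not say what is hashed, so this assumes the tag is derived from the Dockerfile contents; the 12-character digest length and the `needs_rebuild` helper are hypothetical. The image name follows the `agent-loop-sandbox-claude:<hash>` pattern from the architecture diagram.

```python
import hashlib


def compute_tag(agent, dockerfile_bytes, length=12):
    """Derive an image tag from the agent name and a hash of the Dockerfile bytes."""
    digest = hashlib.sha256(dockerfile_bytes).hexdigest()[:length]
    return f"agent-loop-sandbox-{agent}:{digest}"


def needs_rebuild(tag, existing_tags, force=False):
    """Rebuild when forced (e.g. by --rebuild) or when no image carries this tag yet."""
    return force or tag not in existing_tags
```

Under this assumption, editing the Dockerfile changes the tag, so a stale image is never matched and a rebuild follows naturally.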

The `run_iteration` method uses `docker sandbox exec` instead of `docker run`.
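
As a rough sketch, the command that `run_iteration` hands to `subprocess` could be assembled like this. The helper name `sandbox_exec_argv` is hypothetical, and placing the sandbox name before the command is an assumption about the `docker sandbox exec` argument order; the worktree argument is left out because the spec does not say how it is passed.

```python
def sandbox_exec_argv(sandbox_name, prompt, env, model=None):
    """Build the argv for running Claude inside an existing sandbox."""
    argv = ["docker", "sandbox", "exec"]
    # Each environment variable becomes its own -e KEY=VALUE pair.
    for key, value in env.items():
        argv += ["-e", f"{key}={value}"]
    # Assumed order: sandbox name first, then the command to run inside it.
    argv += [sandbox_name, "claude", "-p", prompt]
    if model:
        argv += ["--model", model]
    return argv
```

Building the argument list separately from executing it lets unit tests assert on the exact command with no Docker dependency.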

## 3. Replace Auth Flow

| Old | New |
|---|---|
| `get_auth_token()` reads Keychain `"Claude Code-credentials"` | `check_token(agent)` validates, `get_token(agent)` extracts from `"claude-token"` |
| `CLAUDE_CODE_OAUTH_TOKEN=<real>` passed to container | `CLAUDE_CODE_OAUTH_TOKEN=phantom` + `ANTHROPIC_BASE_URL=proxy` |
| Token visible inside container | Token never enters sandbox |
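
The "token never enters sandbox" property in the table can be checked mechanically. The sketch below is illustrative only: `sandbox_env` and `leaks_token` are hypothetical helpers, and the proxy URL is whatever address ralph resolves for the running proxy.

```python
PHANTOM_TOKEN = "phantom"


def sandbox_env(proxy_url):
    """Environment passed into the sandbox: a placeholder token plus the proxy address."""
    return {
        "CLAUDE_CODE_OAUTH_TOKEN": PHANTOM_TOKEN,
        "ANTHROPIC_BASE_URL": proxy_url,
    }


def leaks_token(real_token, env, argv=()):
    """Return True if the real token appears in any env value or command argument."""
    if not real_token:
        return False
    haystack = list(env.values()) + list(argv)
    return any(real_token in item for item in haystack)
```

A guard like `leaks_token` could run in unit tests, or as an assertion just before the exec call, to catch a regression that passes the real token through.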

## 4. Remove Old Docker Infrastructure

After migration:
- Remove `docker/ralph/Dockerfile` (replaced by `docker/agent-loop/claude/Dockerfile`)
- Remove `docker/ralph/entrypoint.sh` (no entrypoint needed — sandbox uses `sleep infinity`)
- Remove `Docker` class from `scripts/ralph`
- Remove `--packages` flag (packages now in Dockerfile, rebuild via `--rebuild`)
- Update CLAUDE.md references

## 5. Selftest Command

`ralph selftest [--agent <name>]`:

1. Check token: `check-token --agent <agent>`
2. Start proxy: pipe token, verify health endpoint
3. Build image: `ensure_image(agent)`
4. Create test sandbox: `docker sandbox create --name agent-loop-selftest-<agent> ...`
5. Apply network policy
6. Verify proxy reachable from sandbox: `docker sandbox exec ... curl proxy health`
7. Verify Claude auth works: `docker sandbox exec ... claude -p "say ok"` (through proxy)
8. Verify network isolation: `docker sandbox exec ... curl google.com` is blocked by the network policy
9. Cleanup: remove test sandbox, stop proxy
10. Report: all checks passed / which failed with diagnostics
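
The check-then-report flow above could be driven by a small runner like the following sketch. The names `run_checks` and `report` are hypothetical. Stopping at the first failure is an assumption, made because later checks depend on earlier ones; cleanup always runs.

```python
def run_checks(checks, cleanup):
    """Run (name, callable) checks in order, stop at the first failure, always clean up."""
    results = []
    try:
        for name, check in checks:
            try:
                check()
            except Exception as exc:
                # Record the diagnostic and skip the remaining dependent checks.
                results.append((name, False, str(exc)))
                break
            results.append((name, True, ""))
    finally:
        cleanup()
    return results


def report(results):
    """Print one line per check and return the process exit code (0 or 1)."""
    for name, ok, detail in results:
        status = "PASS" if ok else "FAIL"
        suffix = f": {detail}" if detail else ""
        print(f"ralph: selftest {status} {name}{suffix}")
    return 0 if all(ok for _, ok, _ in results) else 1
```

The `try/finally` guarantees that the test sandbox and proxy are torn down even when a check raises.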

## 6. Integration Test

`tests/test_ralph_integration.py` — a pytest test that runs the selftest flow programmatically:

- Skipped by default (requires Docker Desktop running + valid token)
- Enabled with `pytest -m integration` or env var `RALPH_INTEGRATION_TESTS=1`
- Tests the real Docker sandbox + proxy + auth flow
- Validates: image build, sandbox create, proxy connectivity, claude execution, network isolation, cleanup
- Timeout: 120 seconds per test
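
One way to express the opt-in gate is a helper that returns a skip reason, sketched below. `integration_skip_reason` is a hypothetical name, and the check for a `docker` binary on `PATH` is an assumed stand-in for "Docker Desktop running"; it does not prove that the daemon is up.

```python
import os
import shutil

INTEGRATION_TIMEOUT_SECONDS = 120  # per-test budget from the list above


def integration_skip_reason(environ=None, which=shutil.which):
    """Return a human-readable reason to skip, or None if the tests should run."""
    env = os.environ if environ is None else environ
    if env.get("RALPH_INTEGRATION_TESTS") != "1":
        return "set RALPH_INTEGRATION_TESTS=1 to run integration tests"
    if which("docker") is None:
        return "docker CLI not found on PATH"
    return None
```

In the test module, the result could feed `pytest.mark.skipif(integration_skip_reason() is not None, reason=...)`, and the timeout constant could be passed as `timeout=` to each `subprocess.run` call.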

---

## Implementation Plan

### Step 1: Implement proxy lifecycle in ralph [DONE]

**Files:**
- `scripts/ralph` — add `start_proxy()`, `stop_proxy()`, `ensure_proxy()`

**Implement:**
1. `start_proxy(agent, port)`: read token from Keychain, pipe to `docker run -i --rm -d --name agent-loop-proxy-<agent> -p <port>:18080 agent-loop-proxy:v1`
2. `stop_proxy(agent)`: `docker stop agent-loop-proxy-<agent>`
3. `ensure_proxy(agent, port)`: check if proxy running (health endpoint), start if not, reuse if healthy
4. Register `atexit` handler to stop proxy on ralph exit
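
A hedged sketch of items 1, 2, and 4 follows. The `docker run` flags are copied from item 1. Everything else is an assumption made for illustration: the `run` parameter for test injection, the constants, and a `start_proxy` signature that takes the token as an argument (the spec's version reads it from the Keychain itself).

```python
import atexit
import subprocess

PROXY_IMAGE = "agent-loop-proxy:v1"
CONTAINER_PORT = 18080  # port the proxy listens on inside the container


def proxy_container_name(agent):
    return f"agent-loop-proxy-{agent}"


def proxy_run_argv(agent, port):
    """Build the docker run command; the token is deliberately not part of argv."""
    return [
        "docker", "run", "-i", "--rm", "-d",
        "--name", proxy_container_name(agent),
        "-p", f"{port}:{CONTAINER_PORT}",
        PROXY_IMAGE,
    ]


def stop_proxy(agent, run=subprocess.run):
    """Stop the proxy container; --rm then removes it."""
    run(["docker", "stop", proxy_container_name(agent)], check=False)


def start_proxy(agent, port, token, run=subprocess.run):
    """Start the proxy with the token on stdin and register cleanup for ralph exit."""
    run(proxy_run_argv(agent, port), input=token.encode(), check=True)
    atexit.register(stop_proxy, agent, run)
```

Note that `atexit` handlers run on normal interpreter exit but not when the process is killed by an unhandled signal, which is why the review item on crash and signal cleanup matters.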

**Test:**
- `tests/test_ralph.py`:
  - `start_proxy` constructs correct docker run command (mocked)
  - `ensure_proxy` reuses healthy proxy (mock health check)
  - `ensure_proxy` starts new when none running
  - `stop_proxy` calls docker stop

**Verify:** `pytest tests/test_ralph.py -v`

**Review:** Cleanup on crash/signal, port conflicts, container naming.

**Address feedback:** Fix, re-test.

### Step 2: Replace Docker class with Sandbox class [DONE]

**Files:**
- `scripts/ralph` — replace `Docker` class, update `process_issue()` and `poll_loop()`

**Implement:**
1. Replace `Docker` class with `Sandbox` class (from #26)
2. Update `process_issue()`: use `ensure_sandbox()` + `docker sandbox exec` instead of `docker.run_container()`
3. Update `poll_loop()`: same changes
4. Update `main()`: instantiate `Sandbox` instead of `Docker`, start/stop proxy
5. Remove `--packages` flag
6. Add `--rebuild` flag (forces image rebuild)
7. Add `--agent` flag (default: `claude`)
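
The flag changes in items 5 to 7 might translate to an `argparse` setup like this sketch. `--issue` and `--push` are taken from the architecture diagram; the help strings and the `build_parser` name are assumptions, and the real script may define further options.

```python
import argparse


def build_parser():
    """Command-line parser after the migration: no --packages, new --rebuild and --agent."""
    parser = argparse.ArgumentParser(prog="ralph")
    parser.add_argument("--issue", type=int, help="issue number to process")
    parser.add_argument("--push", action="store_true", help="push from the host when done")
    parser.add_argument("--agent", default="claude", help="agent name (default: claude)")
    parser.add_argument("--rebuild", action="store_true", help="force an image rebuild")
    return parser
```

For example, `build_parser().parse_args(["--issue", "42", "--rebuild"])` leaves `agent` at its default of `claude`. An unknown flag such as the removed `--packages` makes `argparse` exit with status 2, which matches the usage-error exit code in the conventions.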

**Test:**
- `tests/test_ralph.py`:
  - `process_issue` uses sandbox exec (mock subprocess calls)
  - `--rebuild` triggers image rebuild
  - `--agent codex` uses correct namespacing throughout
  - Existing pure-function tests still pass

**Verify:** `pytest tests/test_ralph.py -v`

**Review:** All Docker class references removed, no regressions in existing behavior.

**Address feedback:** Fix, re-test.

### Step 3: Replace auth flow [DONE]

**Files:**
- `scripts/ralph` — update auth handling in `main()` and `process_issue()`

**Implement:**
1. Replace `get_auth_token()` with `check_token(agent)` + `get_token(agent)` (from #25)
2. Proxy receives real token via stdin
3. Sandbox exec uses `CLAUDE_CODE_OAUTH_TOKEN=phantom` + `ANTHROPIC_BASE_URL`
4. Remove old `get_auth_token()` function
5. Remove `CLAUDE_CODE_OAUTH_TOKEN` from container env (replaced by phantom)

**Test:**
- `tests/test_ralph.py`:
  - `process_issue` passes phantom token to sandbox exec
  - `ANTHROPIC_BASE_URL` set to proxy address
  - Old `get_auth_token` no longer called

**Verify:** `pytest tests/test_ralph.py -v`

**Review:** No real token in sandbox env, proxy address correct.

**Address feedback:** Fix, re-test.

### Step 4: Remove old Docker infrastructure [DONE]

**Files:**
- Delete `docker/ralph/Dockerfile`
- Delete `docker/ralph/entrypoint.sh`
- `scripts/ralph` — remove `Docker` class (if not already removed in step 2)
- Update `CLAUDE.md` — update architecture docs, directory structure, command reference

**Implement:**
1. Delete old files
2. Update CLAUDE.md: directory structure, ralph command reference, Docker section
3. Update tests: remove `test_ralph_entrypoint.bats` (no entrypoint to test)

**Test:**
- Old entrypoint tests removed
- All remaining tests pass

**Verify:** `pytest tests/ -v && bats tests/test_ralph.bats`

**Review:** No dangling references to old Docker infrastructure.

**Address feedback:** Fix, re-test.

### Step 5: Implement selftest command [DONE]

**Files:**
- `scripts/ralph` — add `selftest(agent)` function and `selftest` subcommand

**Implement:**
1. Implement `selftest(agent)` as described in section 5
2. Add `selftest` subcommand routing in `main()`
3. Each check prints pass/fail with diagnostics
4. Exit 0 if all pass, 1 if any fail

**Test:**
- `tests/test_ralph.py`:
  - `selftest` calls each check in order (mocked)
  - Reports correct pass/fail for each check
  - Cleans up test sandbox even on failure

**Verify:** `pytest tests/test_ralph.py -v`

**Review:** Cleanup is robust (try/finally), error messages are actionable.

**Address feedback:** Fix, re-test.

### Step 6: Add integration test [DONE]

**Files:**
- `tests/test_ralph_integration.py` — end-to-end test

**Implement:**
1. pytest marker: `@pytest.mark.integration`
2. Skip unless `RALPH_INTEGRATION_TESTS=1` env var set
3. Test: build image, start proxy, create sandbox, run claude through proxy, verify network isolation, cleanup
4. Timeout: 120 seconds
5. Cleanup in fixture finalizer (always runs)

**Test:**
- Run with `RALPH_INTEGRATION_TESTS=1 pytest tests/test_ralph_integration.py -v`

**Verify:** Integration test passes in local environment with Docker Desktop running.

**Review:** Test isolation, cleanup reliability, timeout handling.

**Address feedback:** Fix, re-test.

### Step 7: Run all checks

**Implement:**
1. Run `pytest tests/ -v` (excluding integration unless opted in)
2. Run `bats tests/test_ralph.bats` (if still relevant)
3. Run `shellcheck` and `zsh -n` on shell scripts
4. Verify Docker builds succeed

**Verify:** All checks pass clean.

### Step 8: Create commit

**Implement:**
1. Stage all changes and create a commit summarizing the sandbox migration.

**Verify:** `git log -1` shows the commit.

---

## Conventions

- **Language:** Python 3 (stdlib only)
- **Tests:** pytest with unittest.mock; integration tests with `@pytest.mark.integration`
- **Error messages:** Prefix with `ralph:`
- **Exit codes:** 0=success, 1=runtime error, 2=usage error
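
The error-message and exit-code conventions could be centralized in a small helper, sketched here. The constant names and the `fail` function are hypothetical.

```python
import sys

EXIT_OK = 0
EXIT_RUNTIME = 1
EXIT_USAGE = 2


def fail(message, code=EXIT_RUNTIME, stream=None):
    """Print a 'ralph:'-prefixed error and return the exit code to pass to sys.exit."""
    out = sys.stderr if stream is None else stream
    print(f"ralph: {message}", file=out)
    return code
```

A call site would then read `sys.exit(fail("token expired"))` for runtime errors, or pass `code=EXIT_USAGE` for bad arguments.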








