Project-Level Sandbox Dependencies #28

@rjernst

branch: project-sandbox-deps

Spec: Project-Level Sandbox Dependencies

Overview

Allow projects to declare additional packages or custom Dockerfile layers that get installed into the sandbox image. Dependencies are declared in a .agent-loop/ directory in the project root (not the worktree), so they are shared and cached across all worktrees for the same project.

Two modes are supported, with a clear priority:

  1. Dockerfile.sandbox (full control) — takes precedence if present
  2. dependencies (simple package list) — auto-generates a Dockerfile layer

Both use ARG BASE_IMAGE / FROM ${BASE_IMAGE} so they are agent-agnostic — ralph injects the correct base image via --build-arg at build time. The same project config works regardless of which agent is used.

Project-level images are tagged as agent-loop-sandbox-{agent}-{project}:v{hash} where {project} is derived from the repo directory name and {hash} is an 8-character content hash of the base image tag + project file content.

Additionally, the existing base image content hash is shortened from 12 to 8 characters for readability.

Architecture

Image layering:

  docker/sandbox-templates:claude-code     (upstream base)
           │
           ▼
  docker/agent-loop/claude/Dockerfile      (agent base — build-essential, jq, etc.)
           │
           ▼  (only if .agent-loop/ config exists in project)
  .agent-loop/Dockerfile.sandbox           (custom project layer)
       OR
  .agent-loop/dependencies                 (auto-generated project layer)
           │
           ▼
  agent-loop-sandbox-claude-myproject:v{hash}   (final image used by sandbox)

Lookup flow in ensure_sandbox():

  1. Build/cache base image (existing logic, unchanged)
  2. Resolve project_dir = repo root (git rev-parse --show-toplevel)
  3. Check {project_dir}/.agent-loop/Dockerfile.sandbox → if exists, use it
  4. Else check {project_dir}/.agent-loop/dependencies → generate Dockerfile
  5. Else → use base image directly (no project layer)
  6. Content hash = SHA256(base_tag + project_file_content)[:8]
  7. If image tag exists → reuse cached; else build with --build-arg BASE_IMAGE=<base_tag>

File locations:

  <project-root>/
    .agent-loop/
      dependencies          # Option A: one apt package per line, # comments
      Dockerfile.sandbox    # Option B: full Dockerfile (takes precedence)
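
The lookup priority and file layout above can be sketched as a small helper. This is a minimal illustration rather than the actual ralph code: the function is written as a free function instead of a Sandbox static method, and the CONFIG_DIR constant is a name assumed for this example.

```python
import os
from typing import Optional, Tuple

# Assumed constant name; the directory itself comes from the spec.
CONFIG_DIR = ".agent-loop"


def find_project_config(project_dir: str) -> Optional[Tuple[str, str]]:
    """Return (type, path) for the project config, or None if there is none."""
    candidates = (
        ("dockerfile", "Dockerfile.sandbox"),  # full control, checked first
        ("dependencies", "dependencies"),      # simple package list, fallback
    )
    for kind, filename in candidates:
        path = os.path.join(project_dir, CONFIG_DIR, filename)
        if os.path.isfile(path):
            return kind, path
    # Neither file exists: the caller uses the base image directly.
    return None
```

Ordering the candidates in a tuple keeps the precedence rule in one place, so Dockerfile.sandbox wins whenever both files are present.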

1. Content Hash Length

Shorten the content hash from 12 to 8 characters throughout. Applies to both base and project image tags.

Current: agent-loop-sandbox-claude:v3a8f2b1c9d0e
New: agent-loop-sandbox-claude:v3a8f2b1c

2. Dependencies File Format

Plain text list of apt packages:

# .agent-loop/dependencies
# One package per line. Comments and blank lines are ignored.
openjdk-21-jdk
python3-venv
nodejs

Parsing rules:

  • Strip # comments (including inline # comment after a package name)
  • Strip leading/trailing whitespace
  • Skip empty lines
  • Remaining lines are package names passed to apt-get install -y --no-install-recommends
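
The parsing rules above are simple enough to sketch directly. The version below is one possible implementation under the stated rules, written as a free function for brevity; the real method lives on Sandbox and may differ in detail.

```python
from typing import List


def parse_dependencies(content: str) -> List[str]:
    """Parse a dependencies file into a list of package names."""
    packages = []
    for line in content.splitlines():
        # Drop everything from the first '#' onward, then trim whitespace.
        name = line.split("#", 1)[0].strip()
        if name:  # skip blank and comment-only lines
            packages.append(name)
    return packages


example = """\
# .agent-loop/dependencies
openjdk-21-jdk
  python3-venv   # inline comment

nodejs
"""
print(parse_dependencies(example))
# ['openjdk-21-jdk', 'python3-venv', 'nodejs']
```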

3. Dockerfile.sandbox Format

A standard Dockerfile using ARG BASE_IMAGE:

ARG BASE_IMAGE
FROM ${BASE_IMAGE}
USER root
RUN apt-get update && apt-get install -y --no-install-recommends openjdk-21-jdk \
    && rm -rf /var/lib/apt/lists/*
USER agent

Ralph builds with: docker build --build-arg BASE_IMAGE=<base_tag> -t <project_tag> -f Dockerfile.sandbox <context_dir>

The build context is the .agent-loop/ directory, so files within it can be COPY'd into the image if needed.
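
As an illustration, the build command could be assembled as an argument list for subprocess.run. The helper name project_build_command is hypothetical. The sketch passes the Dockerfile path joined with the context directory, on the assumption that the caller's working directory may not be .agent-loop/ (docker resolves a relative -f path against the current working directory, not the build context).

```python
import os
from typing import List


def project_build_command(base_tag: str, project_tag: str, config_dir: str) -> List[str]:
    """Build the docker argv for a custom Dockerfile.sandbox (hypothetical helper)."""
    return [
        "docker", "build",
        "--build-arg", f"BASE_IMAGE={base_tag}",  # injects the agent base image
        "-t", project_tag,
        "-f", os.path.join(config_dir, "Dockerfile.sandbox"),
        config_dir,  # build context: files here can be COPY'd into the image
    ]
```

The returned list can be handed to subprocess.run(cmd, check=True), which avoids shell quoting issues with tags or paths.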

4. Generated Dockerfile (from dependencies)

When dependencies exists but Dockerfile.sandbox does not, ralph generates a Dockerfile in memory:

ARG BASE_IMAGE
FROM ${BASE_IMAGE}
USER root
RUN apt-get update && apt-get install -y --no-install-recommends \
    pkg1 pkg2 pkg3 \
    && rm -rf /var/lib/apt/lists/*
USER agent

Written to a temp file only during docker build, then cleaned up.
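
A minimal sketch of the generator, assuming the template shown above is reproduced verbatim. How an empty package list should be handled is not specified here, so the sketch does not address it.

```python
from typing import List


def generate_project_dockerfile(packages: List[str]) -> str:
    """Render the project layer Dockerfile for a list of apt packages."""
    pkg_line = " ".join(packages)  # all packages on a single install line
    return (
        "ARG BASE_IMAGE\n"
        "FROM ${BASE_IMAGE}\n"
        "USER root\n"
        "RUN apt-get update && apt-get install -y --no-install-recommends \\\n"
        f"    {pkg_line} \\\n"
        "    && rm -rf /var/lib/apt/lists/*\n"
        "USER agent\n"
    )


print(generate_project_dockerfile(["openjdk-21-jdk", "nodejs"]))
```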

5. Project Image Tagging

Tag format: agent-loop-sandbox-{agent}-{project}:v{hash}

  • {agent} — agent name (e.g., claude)
  • {project} — os.path.basename(project_dir) (e.g., elasticsearch)
  • {hash} — SHA256(base_tag + project_dockerfile_content)[:8]

The hash incorporates the base image tag (not just digest), so a base image rebuild cascades to project image rebuilds.
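
The tag computation can be sketched as follows. The helper content_hash here is a stand-in written for this example, not the existing Sandbox.content_hash() method, and the UTF-8 encoding and plain string concatenation are assumptions.

```python
import hashlib


def content_hash(*parts: str, length: int = 8) -> str:
    """Hex SHA-256 of the concatenated parts, truncated to `length` characters."""
    digest = hashlib.sha256("".join(parts).encode("utf-8")).hexdigest()
    return digest[:length]


def project_image_tag(agent: str, project_name: str, base_tag: str, config_content: str) -> str:
    """Compose agent-loop-sandbox-{agent}-{project}:v{hash}."""
    tag_hash = content_hash(base_tag, config_content)
    return f"agent-loop-sandbox-{agent}-{project_name}:v{tag_hash}"


tag = project_image_tag(
    "claude", "elasticsearch", "agent-loop-sandbox-claude:v3a8f2b1c", "FROM ${BASE_IMAGE}\n"
)
print(tag)
```

Because base_tag is part of the hashed input, a new base tag yields a new project tag even when the project file is unchanged. Note that Docker repository names must be lowercase; the sketch does not sanitize a directory name that contains uppercase characters.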

6. --rebuild Flag

--rebuild forces everything: re-pull upstream base, rebuild agent base image, rebuild project layer. No separate flag needed — content-addressed caching handles dependency file changes automatically.

7. Selftest Enhancement

When ralph selftest is run from a directory containing .agent-loop/dependencies or .agent-loop/Dockerfile.sandbox, add a check that verifies the project image builds successfully. Report as "build project image" with the project tag.


Implementation Plan

Each step follows this structure:

  1. Implement — Write the code
  2. Test — Write pytest tests
  3. Verify — Run tests, fix failures until all pass
  4. Review — Code review for bugs, edge cases, and conventions
  5. Address feedback — Fix review findings, re-run tests, re-review until clean
  6. Update spec — Mark the step [done] and record any decisions or deviations

Spec maintenance rules

  • Mark each step [done] when complete.
  • Record design decisions that emerged during implementation as notes under the step.
  • Minor deviations (e.g. method name changes, reordered logic) should be noted and the spec updated to match.
  • Significant design changes (e.g. new subcommands, changed architecture, removed features) require pausing for user review before proceeding.

Step 1: Shorten content hash to 8 characters [done]

Files:

  • scripts/ralph — Sandbox.content_hash() method
  • tests/test_ralph.py — TestSandboxContentHash, TestSandboxEnsureImage, and any other tests asserting tag format

Implement:

  1. In Sandbox.content_hash(), change [:12] to [:8]

Test:

  • Update TestSandboxContentHash assertions for 8-char hashes
  • Update any TestSandboxEnsureImage assertions that check tag length
  • Search for any other tests asserting 12-char hashes and update them

Verify: Run pytest tests/test_ralph.py -v. Fix any failures and re-run until all pass.

Review: Ensure no hardcoded 12-char hash assumptions remain anywhere.

Address feedback: Fix all review findings. Re-run tests. Re-review if changes were substantial.

Step 2: Add dependencies file parsing [done]

Files:

  • scripts/ralph — New static method Sandbox.parse_dependencies(content)
  • tests/test_ralph.py — New TestSandboxParseDependencies class

Implement:

  1. Add Sandbox.parse_dependencies(content) static method that:
    • Splits content into lines
    • Strips inline # comments (everything from # to end of line)
    • Strips whitespace
    • Skips empty lines
    • Returns a list of package names

Test:

  • Basic package list parsing
  • Comment-only lines skipped
  • Inline comments stripped (pkg # comment → pkg)
  • Blank lines skipped
  • Whitespace handling (leading/trailing)
  • Empty content returns empty list

Verify: Run pytest tests/test_ralph.py::TestSandboxParseDependencies -v.

Review: Edge cases in comment parsing.

Address feedback: Fix all review findings. Re-run tests. Re-review if changes were substantial.

Step 3: Add project Dockerfile generation [done]

Files:

  • scripts/ralph — New static method Sandbox.generate_project_dockerfile(packages)
  • tests/test_ralph.py — New TestSandboxGenerateProjectDockerfile class

Implement:

  1. Add Sandbox.generate_project_dockerfile(packages) static method that:
    • Takes a list of package names
    • Returns a Dockerfile string with ARG BASE_IMAGE, FROM ${BASE_IMAGE}, USER root, apt-get update && install, USER agent
    • Joins packages with spaces in a single apt-get install line

Test:

  • Single package generates correct Dockerfile
  • Multiple packages joined correctly
  • Output contains ARG BASE_IMAGE and FROM ${BASE_IMAGE}
  • Output switches to USER root then back to USER agent
  • Output includes --no-install-recommends and rm -rf /var/lib/apt/lists/*

Verify: Run pytest tests/test_ralph.py::TestSandboxGenerateProjectDockerfile -v.

Review: Dockerfile best practices (layer ordering, cleanup).

Address feedback: Fix all review findings. Re-run tests. Re-review if changes were substantial.

Step 4: Add project image detection and building [done]

Files:

  • scripts/ralph — New methods: Sandbox.find_project_config(project_dir), Sandbox.project_image_tag(agent, project_name, base_tag, config_content), Sandbox.ensure_project_image(agent, base_tag, project_dir, force_rebuild)
  • tests/test_ralph.py — New test classes for each method

Implement:

  1. Sandbox.find_project_config(project_dir) — static method that checks for .agent-loop/Dockerfile.sandbox, then .agent-loop/dependencies. Returns a tuple (type, path) where type is "dockerfile" or "dependencies"; returns None if neither file exists.
  2. Sandbox.project_image_tag(agent, project_name, base_tag, config_content) — static method that computes content hash from base_tag + config_content and returns agent-loop-sandbox-{agent}-{project_name}:v{hash}.
  3. Sandbox.ensure_project_image(agent, base_tag, project_dir, force_rebuild=False):
    • Calls find_project_config(project_dir) — if None, returns base_tag
    • Reads the config file content
    • If type is "dependencies", parses packages and generates Dockerfile content
    • If type is "dockerfile", reads the file content directly
    • Computes project tag via project_image_tag()
    • If tag exists locally and not force_rebuild, return cached tag
    • Otherwise: for "dockerfile", build with .agent-loop/ as build context and -f Dockerfile.sandbox; for "dependencies", write generated Dockerfile to a temp file and build with temp dir as context
    • Always pass --build-arg BASE_IMAGE=<base_tag>
    • Return the project tag
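
The temp-file path for the "dependencies" case might look like the sketch below. The function name build_generated_image and the injectable run parameter are assumptions made for this example (the parameter makes the function easy to test without Docker); the real logic sits inside ensure_project_image().

```python
import os
import subprocess
import tempfile


def build_generated_image(dockerfile_text, base_tag, project_tag, run=subprocess.run):
    """Write a generated Dockerfile to a temp dir, build it, then clean up."""
    # TemporaryDirectory removes the directory on exit, even if the build fails.
    with tempfile.TemporaryDirectory(prefix="agent-loop-build-") as tmp:
        dockerfile = os.path.join(tmp, "Dockerfile")
        with open(dockerfile, "w", encoding="utf-8") as fh:
            fh.write(dockerfile_text)
        run(
            [
                "docker", "build",
                "--build-arg", f"BASE_IMAGE={base_tag}",
                "-t", project_tag,
                "-f", dockerfile,
                tmp,  # the temp dir doubles as an (otherwise empty) build context
            ],
            check=True,  # raise CalledProcessError on a failed build
        )
    return project_tag
```

Using the context manager addresses the "temp file cleanup" review point directly: no explicit finally block is needed, and a failed build does not leave files behind.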

Test:

  • find_project_config: prefers Dockerfile.sandbox over dependencies, returns None when neither exists, returns correct type and path for each
  • project_image_tag: includes project name and agent in tag, hash is 8 chars, different content produces different hash, same content produces same hash
  • ensure_project_image: returns base tag when no project config, builds when tag missing, skips build when tag exists, force_rebuild forces build, passes --build-arg BASE_IMAGE correctly, uses .agent-loop/ as build context for Dockerfile.sandbox, uses -f Dockerfile.sandbox for custom Dockerfiles

Verify: Run pytest tests/test_ralph.py -v -k "project".

Review: Temp file cleanup, build context correctness, content hash determinism.

Address feedback: Fix all review findings. Re-run tests. Re-review if changes were substantial.

Step 5: Wire project images into ensure_sandbox and process_issue [done]

Files:

  • scripts/ralph — Sandbox.ensure_sandbox(), process_issue(), main()
  • tests/test_ralph.py — Update TestSandboxEnsureSandbox, TestProcessIssueSandbox, TestMainSandboxFlags

Implement:

  1. Update Sandbox.ensure_sandbox() signature to accept project_dir=None and force_rebuild=False
  2. Inside ensure_sandbox(): after ensure_image(agent) returns base_tag, call ensure_project_image(agent, base_tag, project_dir, force_rebuild) to get final_tag. Use final_tag for sandbox creation.
  3. In process_issue(), resolve repo_root via git.output("rev-parse", "--show-toplevel") and pass it to ensure_sandbox() as project_dir. Pass force_rebuild through from caller.
  4. Remove the early sandbox.ensure_image(agent, force_rebuild=rebuild) call from main() (line 1721) since image building now happens inside ensure_sandbox() which has the project context.
  5. Pass force_rebuild=rebuild through process_issue() and poll_loop() to ensure_sandbox().

Test:

  • TestSandboxEnsureSandbox: verify ensure_project_image is called with correct args, final tag used for sandbox creation
  • TestProcessIssueSandbox: verify repo_root resolved and passed through
  • TestMainSandboxFlags: verify --rebuild still forces rebuild of both layers

Verify: Run pytest tests/test_ralph.py -v. Fix any failures.

Review: Ensure no code path bypasses project image resolution. Verify --rebuild cascades correctly through both layers.

Address feedback: Fix all review findings. Re-run tests. Re-review if changes were substantial.

Step 6: Enhance selftest for project images [done]

Files:

  • scripts/ralph — selftest() function
  • tests/test_ralph.py — TestSelftest class

Implement:

  1. After the existing "build image" check in selftest(), detect if the current directory has .agent-loop/ config via Sandbox.find_project_config(os.getcwd())
  2. If project config exists, add a check that builds the project image: call ensure_project_image(agent, base_tag, os.getcwd())
  3. Report as "build project image" with the project tag on success

Test:

  • selftest reports project image check when .agent-loop/dependencies exists in cwd
  • selftest reports project image check when .agent-loop/Dockerfile.sandbox exists in cwd
  • selftest skips project image check when no .agent-loop/ directory in cwd
  • selftest reports failure when project image build fails

Verify: Run pytest tests/test_ralph.py::TestSelftest -v.

Review: Ensure selftest cleanup handles project images correctly.

Address feedback: Fix all review findings. Re-run tests. Re-review if changes were substantial.

Step 7: Run all checks [done]

Implement:

  1. Run the full test suite: pytest tests/test_ralph.py -v
  2. Run python3 -c "import py_compile; py_compile.compile('scripts/ralph', doraise=True)" to verify syntax
  3. Fix any failures

Verify: All checks pass clean.

Step 8: Create commit

Implement:

  1. Stage all changes and create a commit with a descriptive message summarizing the feature.

Verify: git log -1 shows the commit.


Conventions

  • Language: Python 3 (stdlib only, no third-party dependencies)
  • Tests: pytest with unittest.mock.patch, tmp_path fixture for filesystem isolation
  • Error messages: Prefix with ralph:
  • Exit codes: 0=success, 1=runtime error
  • Imports: Use from conftest import import_script in tests to import the extensionless ralph script
  • Docker: Content-addressed image tags, subprocess.run for all Docker CLI calls

Labels: spec (Ralph spec for automated execution), status:done (Completed)