agentic-bash: Embedded Go Sandbox Library for AI Agents

What We're Building

A pure Go embeddable library that gives AI agents a stateful, isolated execution environment (filesystem, shell, and CLI) without requiring Docker, root access, or external daemons. Think just-bash, but in Go and designed to be imported directly into any AI agent.

Reference inspiration: vercel-labs/just-bash

Architecture Overview

┌─────────────────────────────────────────────────────────┐
│                    CONSUMER (AI Agent)                  │
│  session := sandbox.New(opts)                           │
│  result  := session.Run("pip install requests && ...")  │
└───────────────────────────┬─────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────┐
│                      Session Layer                      │
│  Persistent state: Env vars, CWD, shell functions,      │
│  command history, installed packages                    │
└──────┬────────────────────┬────────────────────┬────────┘
       │                    │                    │
┌──────▼──────┐   ┌─────────▼────────┐  ┌───────▼──────┐
│  Executor   │   │   Filesystem     │  │  Isolation   │
│             │   │                  │  │  Strategy    │
│ • NativeExec│   │ • MemoryFS       │  │              │
│   (os/exec) │   │ • LayeredFS      │  │ • Noop       │
│             │   │   (base+overlay) │  │ • Namespace  │
│ • ShellExec │   │ • ChangeTracker  │  │ • Landlock   │
│   (mvdan/sh)│   │ • Snapshot/Rst   │  │              │
└──────┬──────┘   └─────────┬────────┘  └───────┬──────┘
       │                    │                    │
┌──────▼────────────────────▼────────────────────▼──────┐
│                  Resource Controller                   │
│    CPU limits • Memory limits • Timeout • I/O caps     │
└────────────────────────────┬───────────────────────────┘
                             │
┌────────────────────────────▼───────────────────────────┐
│              Package Manager Abstraction               │
│       Pre-baked base layers • Lazy install • Cache     │
└────────────────────────────────────────────────────────┘

Repository Layout

agentic-bash/
├── go.mod
├── go.sum
├── plan.md
├── main.go                        # Example / CLI demo
│
├── sandbox/
│   ├── sandbox.go                 # Public API: New(), Run(), Reset(), Close()
│   ├── session.go                 # Session: persistent state across calls
│   ├── options.go                 # Options, ResourceLimits, NetworkPolicy
│   ├── result.go                  # ExecutionResult: stdout, stderr, exit, fs diff
│   └── pool.go                    # SandboxPool: pre-warmed reusable sandboxes
│
├── fs/
│   ├── fs.go                      # SandboxFS interface
│   ├── memory.go                  # Pure Go in-memory FS (afero-backed)
│   ├── layered.go                 # Read-only base + writable overlay
│   ├── tracker.go                 # Tracks file creates/writes/deletes per run
│   └── snapshot.go                # Point-in-time snapshot + restore
│
├── executor/
│   ├── executor.go                # Executor interface
│   ├── native.go                  # os/exec based (real binaries on real FS)
│   └── shell.go                   # mvdan.cc/sh in-process pure Go shell
│
├── isolation/
│   ├── strategy.go                # IsolationStrategy interface
│   ├── noop.go                    # No-op (dev/macOS)
│   ├── namespace.go               # Linux namespaces (CLONE_NEWNS/PID/NET)
│   └── landlock.go                # Landlock LSM (Linux 5.13+, no root)
│
├── packages/
│   ├── manager.go                 # PackageManager interface
│   ├── base.go                    # Base layer: pre-baked tar of common tools
│   ├── apt.go                     # apt-get shimmed to overlay FS
│   ├── pip.go                     # pip shimmed to overlay FS
│   └── manifest.go                # Tracks what is installed in the sandbox
│
├── network/
│   ├── policy.go                  # NetworkPolicy: allow-all, deny-all, allowlist
│   └── namespace.go               # CLONE_NEWNET + veth setup (Linux)
│
└── internal/
    ├── cgroups/
    │   └── cgroups.go             # cgroupv2 CPU/memory enforcement (Linux)
    └── seccomp/
        └── filter.go              # syscall allowlist via go-seccomp-bpf (Linux)

Core Types

// sandbox/options.go

type IsolationLevel int
const (
    IsolationNone       IsolationLevel = iota // no-op, dev/macOS
    IsolationNamespace                        // Linux namespaces
    IsolationLandlock                         // Landlock LSM (no root)
    IsolationAuto                             // pick best available at runtime
)

type NetworkMode int
const (
    NetworkAllow     NetworkMode = iota // full host network access
    NetworkDeny                         // no external network (loopback only)
    NetworkAllowlist                    // egress to specific domains/CIDRs only
)

type NetworkPolicy struct {
    Mode      NetworkMode
    Allowlist []string // domains or CIDR ranges
    DNSServer string   // custom resolver for filtering
}

type ResourceLimits struct {
    Timeout       time.Duration // wall-clock timeout per Run()
    MaxMemoryMB   int           // memory.max in cgroupv2 (Linux only)
    MaxCPUPercent float64       // cpu.max quota (Linux only)
    MaxOutputMB   int           // stdout+stderr combined cap
    MaxFileSizeMB int           // largest single file write allowed
}

type Options struct {
    Isolation    IsolationLevel
    Limits       ResourceLimits
    Network      NetworkPolicy
    Env          map[string]string // initial environment
    WorkDir      string            // initial working directory inside sandbox
    BaseImageDir string            // path to pre-baked tool archive (optional)

    // Hooks
    OnCommand   func(cmd string)
    OnResult    func(r ExecutionResult)
    OnViolation func(v PolicyViolation)
}

// sandbox/result.go

type ExecutionResult struct {
    Stdout   string
    Stderr   string
    ExitCode int
    Duration time.Duration
    Error    error

    // Filesystem changes during this Run()
    FilesCreated  []string
    FilesModified []string
    FilesDeleted  []string

    // Resource usage (Linux only)
    CPUTime      time.Duration
    MemoryPeakMB int
}

type PolicyViolation struct {
    Type    string // "network", "filesystem", "syscall"
    Detail  string
    Blocked bool
}

// sandbox/session.go

type ShellState struct {
    Env          map[string]string
    Cwd          string
    Functions    map[string]string // shell function definitions
    History      []string          // command history
    Installed    []string          // package manifest
    ExportedVars map[string]bool   // which env vars are exported
}

Phase-by-Phase Implementation Plan

Phase 1 — Core Foundation

Goal: working sandbox.Run("echo hello") with timeout and result capture

Tasks:

Initialize go.mod with module path github.com/<org>/agentic-bash
Define all core types in sandbox/options.go and sandbox/result.go
Implement NativeExecutor in executor/native.go:
- Use os/exec.CommandContext with context.WithTimeout
- Capture stdout/stderr into separate bytes.Buffer
- Set SysProcAttr.Pdeathsig = syscall.SIGKILL to kill children when parent dies (Linux only; the field does not exist on macOS, so this line needs a build tag)
- Set SysProcAttr.Setpgid = true and kill the entire process group on timeout
Implement Sandbox struct in sandbox/sandbox.go:
- New(opts Options) *Sandbox
- Run(cmd string) ExecutionResult
- Close() error
Implement Session in sandbox/session.go:
- Holds ShellState
- Run() merges session env into each command's environment
- Updates Cwd after each successful cd-equivalent
Write unit tests:
- Timeout fires and process is killed
- Exit codes are captured correctly
- Env vars set in one run are visible in the next

Key dependencies: stdlib only (os/exec, context, bytes, syscall)
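
The timeout and process-group handling above can be sketched with the standard library alone. This is a minimal illustration, not the final NativeExecutor: `runResult` and `runNative` are hypothetical names, `/bin/sh` is assumed to exist on the host, and Pdeathsig is left out because it is Linux-only.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// runResult is a trimmed-down, hypothetical stand-in for ExecutionResult.
type runResult struct {
	Stdout   string
	Stderr   string
	ExitCode int
	TimedOut bool
	Err      error
}

// runNative runs script under /bin/sh in its own process group and kills the
// whole group if the timeout expires first.
func runNative(timeout time.Duration, script string) runResult {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	cmd := exec.Command("/bin/sh", "-c", script)
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true} // child leads a new group
	var stdout, stderr bytes.Buffer
	cmd.Stdout, cmd.Stderr = &stdout, &stderr

	if err := cmd.Start(); err != nil {
		return runResult{ExitCode: -1, Err: err}
	}
	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	var res runResult
	select {
	case err := <-done:
		var exitErr *exec.ExitError
		if err != nil && !errors.As(err, &exitErr) {
			res.Err = err // wait or I/O failure, not a plain non-zero exit
		}
	case <-ctx.Done():
		// A negative PID targets the process group created by Setpgid.
		_ = syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
		<-done
		res.TimedOut = true
		res.Err = ctx.Err()
	}
	res.Stdout, res.Stderr = stdout.String(), stderr.String()
	res.ExitCode = cmd.ProcessState.ExitCode()
	return res
}

func main() {
	r := runNative(2*time.Second, "echo hello; exit 3")
	fmt.Printf("stdout=%q exit=%d\n", r.Stdout, r.ExitCode)

	r = runNative(200*time.Millisecond, "sleep 5")
	fmt.Printf("timedOut=%v err=%v\n", r.TimedOut, r.Err)
}
```

A non-zero exit is reported through ExitCode rather than Err here, so callers can tell a failing command apart from a failure to run it; whether the real API makes the same choice is open.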

Phase 2 — Pure Go In-Process Shell

Goal: execute shell scripts without requiring /bin/bash on the host

Tasks:

Add mvdan.cc/sh/v3 to go.mod
Implement ShellExecutor in executor/shell.go:
- Parse commands via syntax.NewParser().Parse()
- Execute via interp.NewRunner() configured with:
  - interp.Env(expand.ListEnviron(envSlice...)) from ShellState.Env
  - interp.Dir(state.Cwd)
  - interp.StdIO(stdin, stdout, stderr)
Wire custom interp.OpenHandler:
- Intercepts all file open/create calls
- Routes them through the sandbox SandboxFS (Phase 3)
- Returns fs.ErrPermission for paths outside sandbox root
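
The containment rule an OpenHandler would enforce can be sketched as a pure path check. This is an assumption-laden illustration: `resolveInSandbox` and `isDenied` are hypothetical helpers, paths are treated as slash-separated, and the check is lexical only (it does not resolve symlinks).

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"path"
	"strings"
)

// resolveInSandbox turns name (absolute, or relative to cwd) into a cleaned
// absolute path and rejects anything that does not lie under root.
func resolveInSandbox(root, cwd, name string) (string, error) {
	root = path.Clean(root)
	p := name
	if !path.IsAbs(p) {
		p = path.Join(cwd, p)
	}
	p = path.Clean(p) // collapses "." and ".." segments
	inside := root == "/" || p == root || strings.HasPrefix(p, root+"/")
	if !inside {
		return "", &fs.PathError{Op: "open", Path: name, Err: fs.ErrPermission}
	}
	return p, nil
}

// isDenied reports whether err wraps fs.ErrPermission.
func isDenied(err error) bool { return errors.Is(err, fs.ErrPermission) }

func main() {
	for _, name := range []string{"out.txt", "../../etc/passwd", "/sandbox-evil/x"} {
		p, err := resolveInSandbox("/sandbox", "/sandbox/work", name)
		fmt.Printf("%-20s -> %q denied=%v\n", name, p, isDenied(err))
	}
}
```

Comparing against `root + "/"` rather than `root` alone is what keeps a sibling directory such as `/sandbox-evil` from passing the prefix test.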
Wire custom interp.ExecHandler:
- Intercepts all external command invocations
- Routes package manager commands (apt-get, pip) to shims (Phase 6)
- Falls through to real exec.LookPath for everything else
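
The routing decision inside an ExecHandler reduces to classifying argv. A minimal sketch, assuming a hypothetical `classify` helper; treating `python3 -m pip` like `python -m pip` is an added assumption, not something the plan states.

```go
package main

import (
	"fmt"
	"path"
)

// classify decides which handler should receive a command:
// "apt" and "pip" go to the package shims, everything else runs natively.
func classify(argv []string) string {
	if len(argv) == 0 {
		return "native"
	}
	switch path.Base(argv[0]) { // tolerate /usr/bin/apt-get style invocations
	case "apt", "apt-get":
		return "apt"
	case "pip", "pip3":
		return "pip"
	case "python", "python3":
		if len(argv) >= 3 && argv[1] == "-m" && argv[2] == "pip" {
			return "pip"
		}
	}
	return "native"
}

func main() {
	cmds := [][]string{
		{"/usr/bin/apt-get", "install", "jq"},
		{"python", "-m", "pip", "install", "requests"},
		{"python", "script.py"},
		{"ls", "-la"},
	}
	for _, argv := range cmds {
		fmt.Printf("%v -> %s\n", argv, classify(argv))
	}
}
```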
Sync ShellState back after each run:
- Capture updated env via runner.Vars
- Capture updated cwd via runner.Dir
- Capture defined functions via runner.Funcs
Write tests:
- Pipes: echo foo | tr a-z A-Z
- Redirections: echo hello > /tmp/out.txt && cat /tmp/out.txt
- Variables: X=1; X=$((X+1)); echo $X
- Loops: for i in 1 2 3; do echo $i; done
- set -e causes abort on first error
- Functions defined in one run are callable in the next

Key dependency: mvdan.cc/sh/v3

Phase 3 — Layered Filesystem

Goal: each sandbox has an isolated, copy-on-write filesystem; host FS is untouched

Tasks:

Define SandboxFS interface in fs/fs.go:

type SandboxFS interface {
    Open(name string) (fs.File, error)
    Create(name string) (fs.File, error)
    Stat(name string) (fs.FileInfo, error)
    ReadDir(name string) ([]fs.DirEntry, error)
    MkdirAll(path string, perm fs.FileMode) error
    Remove(name string) error
    Rename(oldpath, newpath string) error
    WriteFile(name string, data []byte, perm fs.FileMode) error
    ReadFile(name string) ([]byte, error)
}

Implement MemoryFS in fs/memory.go:
- Backed by afero.MemMapFs
- Wraps all calls to enforce path containment within sandbox root
Implement LayeredFS in fs/layered.go:
- Lower layer: afero.NewReadOnlyFs wrapped around an afero.BasePathFs pointing at pre-baked tool dir (BasePathFs alone does not block writes)
- Upper layer: writable MemoryFS
- Read: check upper first, fall through to lower on miss
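- Write/Create/Remove: always go to upper layer only; removing a file that exists only in the lower layer needs a whiteout marker in the upper layer so later reads do not fall through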
- MkdirAll: replicated in upper even if lower already has dir
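
The read-fallthrough and write-to-upper rules can be sketched without afero. This is a simplified stand-in, not the planned LayeredFS: the upper layer is a plain map, the lower layer is any `fs.FS`, and the `whiteout` set is an assumed mechanism for hiding lower-layer files after a delete.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"testing/fstest"
)

type layeredFS struct {
	lower    fs.FS             // read-only base layer
	upper    map[string][]byte // writable overlay
	whiteout map[string]bool   // names deleted in the overlay
}

func newLayered(lower fs.FS) *layeredFS {
	return &layeredFS{lower: lower, upper: map[string][]byte{}, whiteout: map[string]bool{}}
}

// ReadFile checks the upper layer first and falls through to lower on a miss.
func (l *layeredFS) ReadFile(name string) ([]byte, error) {
	if data, ok := l.upper[name]; ok {
		return append([]byte(nil), data...), nil
	}
	if l.whiteout[name] {
		return nil, &fs.PathError{Op: "open", Path: name, Err: fs.ErrNotExist}
	}
	return fs.ReadFile(l.lower, name)
}

// WriteFile only ever touches the upper layer.
func (l *layeredFS) WriteFile(name string, data []byte) {
	l.upper[name] = append([]byte(nil), data...)
	delete(l.whiteout, name)
}

// Remove drops any upper copy and records a whiteout so the lower copy stays hidden.
func (l *layeredFS) Remove(name string) error {
	if _, err := l.ReadFile(name); err != nil {
		return err
	}
	delete(l.upper, name)
	l.whiteout[name] = true
	return nil
}

// demoLower builds a tiny in-memory base layer for the example.
func demoLower() fs.FS {
	return fstest.MapFS{"etc/motd": {Data: []byte("base\n")}}
}

func isNotExist(err error) bool { return errors.Is(err, fs.ErrNotExist) }

func main() {
	l := newLayered(demoLower())
	b, _ := l.ReadFile("etc/motd")
	fmt.Printf("before write: %q\n", b)
	l.WriteFile("etc/motd", []byte("patched\n"))
	b, _ = l.ReadFile("etc/motd")
	fmt.Printf("after write:  %q\n", b)
	_ = l.Remove("etc/motd")
	_, err := l.ReadFile("etc/motd")
	fmt.Println("after remove, not found:", isNotExist(err))
}
```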
Implement ChangeTracker in fs/tracker.go:
- Wraps any SandboxFS
- Records FilesCreated, FilesModified, FilesDeleted per Run() interval
- Reset between runs; results merged into ExecutionResult
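
One way to keep the per-run change lists consistent is to track net effects, so a file created and removed within the same Run() reports nothing. The sketch below uses a map as a stand-in for the wrapped SandboxFS; `tracker`, `changeSet`, and `Flush` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

type changeSet struct{ Created, Modified, Deleted []string }

type tracker struct {
	files    map[string][]byte // stand-in for the wrapped SandboxFS
	created  map[string]bool
	modified map[string]bool
	deleted  map[string]bool
}

func newTracker() *tracker {
	t := &tracker{files: map[string][]byte{}}
	t.reset()
	return t
}

func (t *tracker) reset() {
	t.created, t.modified, t.deleted = map[string]bool{}, map[string]bool{}, map[string]bool{}
}

func (t *tracker) WriteFile(name string, data []byte) {
	_, existed := t.files[name]
	t.files[name] = data
	switch {
	case t.deleted[name]: // deleted and rewritten in one run: net effect is a modification
		delete(t.deleted, name)
		t.modified[name] = true
	case !existed:
		t.created[name] = true
	case !t.created[name]: // rewriting a file created this run keeps it "created"
		t.modified[name] = true
	}
}

func (t *tracker) Remove(name string) {
	if _, ok := t.files[name]; !ok {
		return
	}
	delete(t.files, name)
	if t.created[name] { // created and removed in one run: no net change
		delete(t.created, name)
		return
	}
	delete(t.modified, name)
	t.deleted[name] = true
}

// Flush returns the changes since the last Flush and starts a new interval.
func (t *tracker) Flush() changeSet {
	cs := changeSet{Created: keys(t.created), Modified: keys(t.modified), Deleted: keys(t.deleted)}
	t.reset()
	return cs
}

func keys(m map[string]bool) []string {
	out := make([]string, 0, len(m))
	for k := range m {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

func main() {
	t := newTracker()
	t.WriteFile("a.txt", []byte("1"))
	t.WriteFile("tmp.txt", []byte("x"))
	t.Remove("tmp.txt")
	fmt.Printf("run 1: %+v\n", t.Flush())
	t.WriteFile("a.txt", []byte("2"))
	fmt.Printf("run 2: %+v\n", t.Flush())
}
```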
Implement Snapshot / Restore in fs/snapshot.go:
- Snapshot() ([]byte, error): serialize upper layer to a tar archive in memory
- Restore(data []byte) error: wipe upper layer and re-populate from tar
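
The tar round trip behind Snapshot and Restore can be sketched with `archive/tar`. As a simplification, the upper layer is modelled here as a `map[string][]byte` of regular files; directories, modes, and timestamps are ignored, and `snapshot`, `restore`, and `sameFiles` are illustrative names.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"sort"
)

// snapshot serializes files into an in-memory tar archive.
// Names are sorted so the same input always yields the same bytes.
func snapshot(files map[string][]byte) ([]byte, error) {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)

	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, name := range names {
		data := files[name]
		hdr := &tar.Header{Name: name, Typeflag: tar.TypeReg, Mode: 0o644, Size: int64(len(data))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(data); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// restore rebuilds the file map from an archive produced by snapshot.
func restore(archive []byte) (map[string][]byte, error) {
	files := map[string][]byte{}
	tr := tar.NewReader(bytes.NewReader(archive))
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return files, nil
		}
		if err != nil {
			return nil, err
		}
		if hdr.Typeflag != tar.TypeReg {
			continue // this sketch only stores regular files
		}
		data, err := io.ReadAll(tr)
		if err != nil {
			return nil, err
		}
		files[hdr.Name] = data
	}
}

// sameFiles reports whether two file maps hold identical names and contents.
func sameFiles(a, b map[string][]byte) bool {
	if len(a) != len(b) {
		return false
	}
	for name, data := range a {
		other, ok := b[name]
		if !ok || !bytes.Equal(data, other) {
			return false
		}
	}
	return true
}

func main() {
	in := map[string][]byte{"work/out.txt": []byte("hello\n"), "work/empty": {}}
	arc, err := snapshot(in)
	if err != nil {
		panic(err)
	}
	out, err := restore(arc)
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip identical:", sameFiles(in, out))
}
```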
Wire ShellExecutor.OpenHandler to route all file I/O through LayeredFS
Write tests:
- Writes go to upper layer, lower layer unchanged
- Reads fall through to lower when upper has no entry
- State persists across Run() calls within a session
- Snapshot() then Restore() reproduces identical filesystem state
- File writes outside sandbox root are rejected with ErrPermission

Key dependency: github.com/spf13/afero

Phase 4 — Isolation Backends

Goal: pluggable OS-level isolation with graceful degradation

Tasks:

Define IsolationStrategy interface in isolation/strategy.go:

type IsolationStrategy interface {
    Name() string
    Available() bool             // runtime capability probe
    Wrap(cmd *exec.Cmd) error    // mutate cmd's SysProcAttr before exec
    Apply() error                // in-process restrictions (Landlock path)
}

Implement NoopStrategy in isolation/noop.go:
- Available() always returns true
- Wrap() and Apply() are no-ops
- Used on macOS and in tests
Implement NamespaceStrategy in isolation/namespace.go (Linux only, build tag linux):
- Wrap() sets:
```
cmd.SysProcAttr.Cloneflags = syscall.CLONE_NEWNS |
                              syscall.CLONE_NEWPID |
                              syscall.CLONE_NEWUSER
cmd.SysProcAttr.UidMappings = []syscall.SysProcIDMap{{...}}
cmd.SysProcAttr.GidMappings = []syscall.SysProcIDMap{{...}}
```
- CLONE_NEWNS: mount namespace — sandbox cannot see or affect host mounts
- CLONE_NEWPID: PID namespace — sandbox processes cannot signal host processes
- CLONE_NEWUSER: user namespace — maps sandbox root to unprivileged host UID (no sudo needed)
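- Available() probes by starting a short-lived child process with CLONE_NEWUSER in Cloneflags; calling unix.Unshare(syscall.CLONE_NEWUSER) directly would fail with EINVAL because the Go runtime is multithreaded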
Implement LandlockStrategy in isolation/landlock.go (Linux 5.13+, build tag linux):
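- Apply() must run in the child before the target binary is exec'd. Go cannot run arbitrary code between fork and exec, so a likely approach is a small re-exec helper (the host binary re-invoking itself) that applies the ruleset and then calls syscall.Exec on the real command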
- Uses go-landlock to restrict allowed read/write paths to sandbox tmpdir only
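- Available() probes the Landlock ABI version reported by the kernel (level >= 1) rather than parsing the kernel release string, since Landlock can be compiled out or disabled at boot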
Implement auto-selector in isolation/strategy.go:
- BestAvailable() IsolationStrategy probes in order: Landlock → Namespace → Noop
- Called at sandbox.New() when IsolationAuto is specified
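
The auto-selector is a first-match scan over an ordered candidate list. A minimal sketch with a reduced interface; `probed` is a hypothetical test double whose `ok` field stands in for a real runtime capability probe.

```go
package main

import "fmt"

// strategy is a reduced form of IsolationStrategy, enough for selection.
type strategy interface {
	Name() string
	Available() bool
}

// probed is a stand-in whose availability is fixed up front.
type probed struct {
	name string
	ok   bool
}

func (p probed) Name() string    { return p.name }
func (p probed) Available() bool { return p.ok }

type noop struct{}

func (noop) Name() string    { return "noop" }
func (noop) Available() bool { return true }

// bestAvailable returns the first available candidate in priority order,
// falling back to noop so callers never receive nil.
func bestAvailable(candidates ...strategy) strategy {
	for _, c := range candidates {
		if c.Available() {
			return c
		}
	}
	return noop{}
}

func main() {
	linux := bestAvailable(probed{"landlock", false}, probed{"namespace", true}, noop{})
	mac := bestAvailable(probed{"landlock", false}, probed{"namespace", false})
	fmt.Println(linux.Name(), mac.Name())
}
```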
Write tests:
- NamespaceStrategy child cannot write to host paths
- LandlockStrategy rejects access to paths outside sandbox root
- Auto-selector falls back to Noop on macOS without panicking

Key dependency: github.com/shoenig/go-landlock

Phase 5 — Resource Control

Goal: enforce CPU, memory, and I/O limits; prevent sandbox from DoS-ing the host

Tasks:

Process group kill on timeout (all platforms):
- Already seeded in Phase 1 via SysProcAttr.Setpgid = true
- On context cancellation: syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
- Ensures all children in the process group are killed, not just the shell
Output size cap (all platforms):
- Wrap cmd.Stdout and cmd.Stderr in a size-capped io.Writer sharing one maxBytes budget (io.LimitedReader only wraps readers, so it cannot be assigned to cmd.Stdout)
- When limit reached, kill the process and set ExecutionResult.Error
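
A writer-side cap can be sketched as a shared byte budget, which matches the combined stdout+stderr limit in ResourceLimits. The names `outputBudget`, `budgetWriter`, and `errOutputCap` are hypothetical; `onExceed` is where the real executor would kill the process group.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"sync"
)

var errOutputCap = errors.New("output cap exceeded")

// outputBudget is a byte allowance shared by every writer derived from it.
type outputBudget struct {
	mu        sync.Mutex
	remaining int64
	tripped   bool
	onExceed  func() // called at most once, e.g. to kill the process group
}

func (b *outputBudget) Writer(dst io.Writer) io.Writer { return &budgetWriter{b: b, dst: dst} }

type budgetWriter struct {
	b   *outputBudget
	dst io.Writer
}

func (w *budgetWriter) Write(p []byte) (int, error) {
	b := w.b
	b.mu.Lock()
	defer b.mu.Unlock()
	if int64(len(p)) <= b.remaining {
		b.remaining -= int64(len(p))
		return w.dst.Write(p)
	}
	// Pass through what still fits, then report the cap.
	n, _ := w.dst.Write(p[:b.remaining])
	b.remaining = 0
	if !b.tripped {
		b.tripped = true
		if b.onExceed != nil {
			b.onExceed()
		}
	}
	return n, errOutputCap
}

// demoCap pushes 11 bytes through two writers that share a 5-byte budget.
func demoCap() (stdout, stderr string, kills int, err error) {
	var out, errBuf bytes.Buffer
	budget := &outputBudget{remaining: 5, onExceed: func() { kills++ }}
	w1, w2 := budget.Writer(&out), budget.Writer(&errBuf)
	_, _ = w1.Write([]byte("abc"))
	_, err = w2.Write([]byte("defg"))
	_, _ = w1.Write([]byte("late"))
	return out.String(), errBuf.String(), kills, err
}

func main() {
	stdout, stderr, kills, err := demoCap()
	fmt.Printf("stdout=%q stderr=%q kills=%d err=%v\n", stdout, stderr, kills, err)
}
```

Returning an error from Write also stops the copy goroutine that os/exec runs for non-file writers, so output stops flowing even before the kill lands.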
cgroupv2 memory limit (Linux, build tag linux) in internal/cgroups/cgroups.go:
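- At sandbox New(): create /sys/fs/cgroup/agentic-bash/<uuid>/ (needs root or a delegated cgroup subtree)
- Write MaxMemoryMB * 1024 * 1024 to memory.max
- After cmd.Start(): write PID to cgroup.procs; the child can run briefly before it is attached, which Go 1.20+ avoids with SysProcAttr.UseCgroupFD and CgroupFD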
- At sandbox Close(): rmdir the cgroup
- Available() checks for /sys/fs/cgroup/cgroup.controllers
cgroupv2 CPU quota (Linux):
- Write "<quota> <period>" to cpu.max (e.g., "50000 100000" for 50% of one CPU)
cgroupv2 I/O cap (Linux):
- Write "<major>:<minor> rbps=<n> wbps=<n>" to io.max
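
The values written to memory.max and cpu.max are simple string encodings of the limits. A sketch of the formatting, assuming a 100000 microsecond period and treating a non-positive limit as "unlimited" (the `max` keyword); both choices and the helper names are assumptions. The demo writes into a temp directory standing in for the cgroup directory.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"path/filepath"
	"strconv"
)

// memoryMax renders a MaxMemoryMB value for memory.max (bytes, or "max").
func memoryMax(mb int) string {
	if mb <= 0 {
		return "max"
	}
	return strconv.FormatInt(int64(mb)*1024*1024, 10)
}

// cpuMax renders "<quota> <period>" for cpu.max; 100 percent equals one full CPU.
func cpuMax(percent float64, periodUS int) string {
	if percent <= 0 {
		return fmt.Sprintf("max %d", periodUS)
	}
	quota := int(math.Round(percent / 100 * float64(periodUS)))
	return fmt.Sprintf("%d %d", quota, periodUS)
}

// writeLimits writes both control files into dir.
func writeLimits(dir string, memMB int, cpuPercent float64) error {
	files := map[string]string{
		"memory.max": memoryMax(memMB),
		"cpu.max":    cpuMax(cpuPercent, 100000),
	}
	for name, val := range files {
		if err := os.WriteFile(filepath.Join(dir, name), []byte(val), 0o644); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "cgroup-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	if err := writeLimits(dir, 256, 50); err != nil {
		panic(err)
	}
	for _, name := range []string{"memory.max", "cpu.max"} {
		b, _ := os.ReadFile(filepath.Join(dir, name))
		fmt.Printf("%s = %s\n", name, b)
	}
}
```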
Seccomp syscall allowlist (Linux, optional hardening) in internal/seccomp/filter.go:
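- Use github.com/elastic/go-seccomp-bpf
- Notation for the list below: unmarked syscalls are allowed; syscalls tagged [DENY] are explicitly rejected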
- Allow list: read, write, open, openat, close, stat, fstat, lstat, mmap, mprotect, munmap, brk, rt_sigaction, rt_sigprocmask, ioctl, pread64, pwrite64, readv, writev, access, pipe, select, sched_yield, mremap, msync, mincore, madvise, shmget, shmat, shmctl, dup, dup2, pause, nanosleep, getitimer, alarm, setitimer, getpid, sendfile, socket, connect, accept, sendto, recvfrom, sendmsg, recvmsg, shutdown, bind, listen, getsockname, getpeername, socketpair, setsockopt, getsockopt, clone, fork, vfork, execve, exit, wait4, kill, uname, fcntl, flock, fsync, fdatasync, truncate, ftruncate, getdents, getcwd, chdir, fchdir, rename, mkdir, rmdir, creat, link, unlink, symlink, readlink, chmod, fchmod, chown, fchown, lchown, umask, gettimeofday, getrlimit, getrusage, sysinfo, times, ptrace [DENY], getuid, syslog [DENY], getgid, setuid [DENY], setgid [DENY], geteuid, getegid, setpgid, getppid, getpgrp, setsid, setreuid [DENY], setregid [DENY], getgroups, setgroups [DENY], setresuid [DENY], setresgid [DENY], getresuid, getresgid, getpgid, setfsuid [DENY], setfsgid [DENY], getsid, capget, capset [DENY], rt_sigpending, rt_sigtimedwait, rt_sigqueueinfo, rt_sigsuspend, sigaltstack, utime, mknod, uselib [DENY], personality [DENY], ustat [DENY], statfs, fstatfs, sysfs [DENY], getpriority, setpriority, sched_setparam, sched_getparam, sched_setscheduler, sched_getscheduler, sched_get_priority_max, sched_get_priority_min, sched_rr_get_interval, mlock, munlock, mlockall, munlockall, vhangup [DENY], modify_ldt [DENY], pivot_root [DENY], _sysctl [DENY], prctl, arch_prctl, adjtimex [DENY], setrlimit, chroot [DENY], sync, acct [DENY], settimeofday [DENY], mount [DENY], umount2 [DENY], swapon [DENY], swapoff [DENY], reboot [DENY], sethostname [DENY], setdomainname [DENY], iopl [DENY], ioperm [DENY], create_module [DENY], init_module [DENY], delete_module [DENY], get_kernel_syms [DENY], query_module [DENY], quotactl [DENY], nfsservctl [DENY], getpmsg [DENY], putpmsg [DENY], afs_syscall [DENY], 
tuxcall [DENY], security [DENY], gettid, readahead, setxattr [DENY], lsetxattr [DENY], fsetxattr [DENY], getxattr, lgetxattr, fgetxattr, listxattr, llistxattr, flistxattr, removexattr [DENY], lremovexattr [DENY], fremovexattr [DENY], tkill, time, futex, sched_setaffinity, sched_getaffinity, set_thread_area, io_setup [DENY], io_destroy [DENY], io_getevents [DENY], io_submit [DENY], io_cancel [DENY], get_thread_area, lookup_dcookie [DENY], epoll_create, epoll_ctl_old [DENY], epoll_wait_old [DENY], remap_file_pages [DENY], getdents64, set_tid_address, restart_syscall, semtimedop, fadvise64, timer_create, timer_settime, timer_gettime, timer_getoverrun, timer_delete, clock_settime [DENY], clock_gettime, clock_getres, clock_nanosleep, exit_group, epoll_wait, epoll_ctl, tgkill, utimes, vserver [DENY], mbind [DENY], set_mempolicy [DENY], get_mempolicy [DENY], mq_open, mq_unlink, mq_timedsend, mq_timedreceive, mq_notify, mq_getsetattr, kexec_load [DENY], waitid, add_key [DENY], request_key [DENY], keyctl [DENY], ioprio_set, ioprio_get, inotify_init, inotify_add_watch, inotify_rm_watch, migrate_pages [DENY], openat, mkdirat, mknodat, fchownat, futimesat, newfstatat, unlinkat, renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat, pselect6, ppoll, unshare [DENY], set_robust_list, get_robust_list, splice, tee, sync_file_range, vmsplice, move_pages [DENY], utimensat, epoll_pwait, signalfd, timerfd_create, eventfd, fallocate, timerfd_settime, timerfd_gettime, accept4, signalfd4, eventfd2, epoll_create1, dup3, pipe2, inotify_init1, preadv, pwritev, rt_tgsigqueueinfo, perf_event_open [DENY], recvmmsg, fanotify_init [DENY], fanotify_mark [DENY], prlimit64, name_to_handle_at [DENY], open_by_handle_at [DENY], clock_adjtime [DENY], syncfs, sendmmsg, setns [DENY], getcpu, process_vm_readv [DENY], process_vm_writev [DENY], kcmp [DENY], finit_module [DENY], sched_setattr, sched_getattr, renameat2, seccomp [DENY], getrandom, memfd_create, kexec_file_load [DENY], bpf [DENY], 
execveat, userfaultfd [DENY], membarrier, mlock2, copy_file_range, preadv2, pwritev2, pkey_mprotect [DENY], pkey_alloc [DENY], pkey_free [DENY], statx, io_pgetevents, rseq, pidfd_send_signal [DENY], io_uring_setup [DENY], io_uring_enter [DENY], io_uring_register [DENY], open_tree [DENY], move_mount [DENY], fsopen [DENY], fsconfig [DENY], fsmount [DENY], fspick [DENY], pidfd_open, clone3, close_range, openat2, pidfd_getfd [DENY], faccessat2, process_madvise [DENY], epoll_pwait2, mount_setattr [DENY], quotactl_fd [DENY], landlock_create_ruleset [DENY], landlock_add_rule [DENY], landlock_restrict_self [DENY]
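- Applied in the child before the target command is exec'd; in Go this likely means a small re-exec helper, because arbitrary code cannot run between fork and exec (execve itself must stay allowed, as it is in the list above)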
Write tests:
- Process killed when memory cgroup limit is hit
- CPU quota causes measurable throttling
- Output truncated and process killed when output cap exceeded
- Seccomp blocks mount syscall, allows read/write

Key dependency: github.com/elastic/go-seccomp-bpf

Phase 6 — Package Manager Shims

Goal: apt-get install, pip install work transparently; changes land in sandbox overlay

Tasks:

Base layer pre-population in packages/base.go:
- Embed a minimal tar archive via //go:embed base.tar.gz
- Archive contains: bash, curl, wget, git, python3, pip3, gcc, make, jq, common coreutils
- On New(): extract into a tmpdir as the LayeredFS lower layer
- Cache extracted base dir globally (keyed by archive checksum); reuse across sandboxes

Define PackageManager interface in packages/manager.go:

type PackageManager interface {
    Install(ctx context.Context, pkg string) error
    Uninstall(ctx context.Context, pkg string) error
    IsInstalled(pkg string) bool
    Installed() []PackageInfo
}

type PackageInfo struct {
    Name    string
    Version string
    Manager string // "apt", "pip", "npm", etc.
}

Implement AptShim in packages/apt.go:
- Intercepted via ShellExecutor.ExecHandler when command is apt-get or apt
- Downloads .deb packages from Debian mirrors into a per-sandbox cache dir
- Extracts .deb with ar + tar into <overlay_root>/usr/
- Updates manifest
- Honors DEBIAN_FRONTEND=noninteractive; suppresses interactive prompts
Implement PipShim in packages/pip.go:
- Intercepted when command is pip, pip3, or python -m pip
- Translates pip install <pkg> to: pip install --target=<overlay_root>/lib/python3/site-packages <pkg>
- Updates manifest; records version via importlib.metadata
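
The pip argument rewrite can be sketched as a pure function over argv. `rewritePip` and `rewriteLine` are hypothetical helpers; leaving a caller-supplied `--target` or `-t` untouched is an added assumption.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// rewritePip injects --target=<overlayRoot>/lib/python3/site-packages into
// "pip install" invocations and returns every other command unchanged.
func rewritePip(argv []string, overlayRoot string) []string {
	i := 0 // index of the pip subcommand
	switch {
	case len(argv) >= 2 && (argv[0] == "pip" || argv[0] == "pip3"):
		i = 1
	case len(argv) >= 4 && argv[0] == "python" && argv[1] == "-m" && argv[2] == "pip":
		i = 3
	default:
		return argv
	}
	if argv[i] != "install" {
		return argv
	}
	for _, a := range argv[i+1:] {
		if a == "--target" || a == "-t" || strings.HasPrefix(a, "--target=") {
			return argv // caller already chose a target
		}
	}
	target := "--target=" + path.Join(overlayRoot, "lib/python3/site-packages")
	out := append([]string{}, argv[:i+1]...)
	out = append(out, target)
	return append(out, argv[i+1:]...)
}

// rewriteLine is a convenience wrapper for whitespace-separated command lines.
func rewriteLine(line, overlayRoot string) string {
	return strings.Join(rewritePip(strings.Fields(line), overlayRoot), " ")
}

func main() {
	for _, line := range []string{
		"pip install requests",
		"python -m pip install flask",
		"pip list",
	} {
		fmt.Println(rewriteLine(line, "/overlay"))
	}
}
```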
Implement shared download cache in packages/base.go:
- Cache dir: ~/.cache/agentic-bash/packages/
- Keyed by <manager>/<package>@<version> hash
- Shared across all sandboxes on the same host; concurrent access via file lock
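
The cache key can be derived by hashing the `<manager>/<package>@<version>` string. SHA-256 and the directory layout below are assumptions; the plan only says the key is a hash of that triple.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// cacheKey hashes "<manager>/<package>@<version>" into a fixed-length hex name.
func cacheKey(manager, pkg, version string) string {
	sum := sha256.Sum256([]byte(manager + "/" + pkg + "@" + version))
	return hex.EncodeToString(sum[:])
}

// cachePath places the entry under <cacheDir>/<manager>/<key>.
func cachePath(cacheDir, manager, pkg, version string) string {
	return filepath.Join(cacheDir, manager, cacheKey(manager, pkg, version))
}

func main() {
	fmt.Println(cachePath("/tmp/agentic-bash-cache", "pip", "requests", "2.31.0"))
}
```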
Implement PackageManifest in packages/manifest.go:
- Serializable []PackageInfo list
- Persisted in sandbox ShellState
- Included in Snapshot output (Phase 3)
Write tests:
- pip install requests → import requests works in next Run()
- Installed package is visible in sandbox FS overlay, not in host FS
- Reinstall is a no-op (cached); cache hit measured by timing
- Manifest correctly reflects installed packages after snapshot/restore

Phase 7 — Network Control

Goal: control outbound network access per sandbox; deny is the recommended setting for untrusted agents (the zero-value NetworkMode is allow)

Tasks:

Define NetworkPolicy in network/policy.go (see Core Types above)
Implement NetworkAllow (default):
- No special configuration; child inherits host network stack
Implement NetworkDeny (Linux) in network/namespace.go:
- Add syscall.CLONE_NEWNET to NamespaceStrategy.Cloneflags
- Child gets isolated network namespace with only lo (loopback) interface, which starts down and must be brought up before loopback traffic works
- DNS lookups fail; TCP connections to external IPs fail
- Available() requires NamespaceStrategy.Available()
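Implement NetworkAllowlist (Linux):
- Create net namespace with veth pair connecting to host bridge
- Use github.com/vishvananda/netlink to create the veth pair (veth0 in sandbox ns, veth1 in host ns) and assign addresses and routes
- Add iptables OUTPUT rules in sandbox namespace: ACCEPT for allowlisted CIDRs/ports, DROP all else (netlink does not program iptables; this needs the iptables/nft tooling or a separate library)
- Configure NAT on host side for sandbox traffic (needs CAP_NET_ADMIN on the host, unlike the rest of the stack)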
- DNS: route port 53 through filtering resolver that checks domain allowlist
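
The allowlist check that a filtering resolver or rule generator needs can be sketched with `net/netip`. Matching subdomains of a listed domain is an assumption the plan does not spell out, and `allowed` is a hypothetical helper; bare IP entries without a prefix length are not handled here.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// allowed reports whether host (a domain name or an IP literal) matches any
// allowlist entry. CIDR entries match IP hosts; domain entries match the
// domain itself and its subdomains.
func allowed(host string, allowlist []string) bool {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	addr, addrErr := netip.ParseAddr(host)
	for _, entry := range allowlist {
		if prefix, err := netip.ParsePrefix(entry); err == nil {
			if addrErr == nil && prefix.Contains(addr) {
				return true
			}
			continue
		}
		domain := strings.ToLower(strings.TrimSuffix(entry, "."))
		if addrErr != nil && (host == domain || strings.HasSuffix(host, "."+domain)) {
			return true
		}
	}
	return false
}

func main() {
	list := []string{"github.com", "pypi.org", "10.0.0.0/8"}
	for _, h := range []string{"pypi.org", "files.pypi.org", "evilpypi.org", "10.1.2.3", "192.168.1.1"} {
		fmt.Printf("%-16s allowed=%v\n", h, allowed(h, list))
	}
}
```

Requiring a leading dot before the domain suffix is what stops `evilpypi.org` from matching `pypi.org`.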
macOS fallback: NetworkDeny and NetworkAllowlist log a warning and degrade to NetworkAllow (network namespaces not available on macOS)
Write tests:
- Deny mode: curl https://example.com exits non-zero
- Deny mode: curl http://localhost:8080 succeeds against a server started inside the sandbox (loopback allowed)
- Allowlist mode: request to allowlisted domain succeeds, non-listed domain fails
- Allow mode: full outbound access works

Key dependency: github.com/vishvananda/netlink

Phase 8 — SandboxPool & Production API

Goal: production-ready API for AI agents — fast startup, concurrent use, observability

Tasks:

Implement SandboxPool in sandbox/pool.go:

type Pool struct {
    opts    Options
    pool    chan *Sandbox
    minSize int
    maxSize int
    idleTTL time.Duration
}

func NewPool(opts Options, minSize, maxSize int) *Pool
func (p *Pool) Acquire(ctx context.Context) (*Sandbox, error)
func (p *Pool) Release(s *Sandbox)
func (p *Pool) Close() error

Background goroutine pre-warms minSize sandboxes at startup (base layer unpacked, ready to use)
Acquire() returns a warm sandbox instantly if available; creates new one if pool empty (up to maxSize)
Release() calls s.Reset() then returns sandbox to channel; discards if pool full
Idle sandboxes older than idleTTL are drained and closed
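
The Acquire/Release behaviour can be sketched with two channels: one holding idle sandboxes and one counting live ones against maxSize. This is a simplified model, not the planned Pool: `sandbox` is a hypothetical stand-in, pre-warming is synchronous, minSize <= maxSize is assumed, and idleTTL eviction is omitted.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// sandbox is a stand-in for *Sandbox with just enough state to observe Reset.
type sandbox struct {
	id    int64
	dirty bool
}

func (s *sandbox) Reset() { s.dirty = false }

type pool struct {
	idle   chan *sandbox // warm sandboxes ready for reuse
	slots  chan struct{} // one token per live sandbox, capacity maxSize
	nextID atomic.Int64
}

func newPool(minSize, maxSize int) *pool {
	p := &pool{idle: make(chan *sandbox, maxSize), slots: make(chan struct{}, maxSize)}
	for i := 0; i < minSize; i++ { // pre-warm
		p.slots <- struct{}{}
		p.idle <- p.create()
	}
	return p
}

func (p *pool) create() *sandbox { return &sandbox{id: p.nextID.Add(1)} }

func (p *pool) Acquire(ctx context.Context) (*sandbox, error) {
	select {
	case s := <-p.idle: // fast path: a warm sandbox is waiting
		return s, nil
	default:
	}
	select {
	case s := <-p.idle:
		return s, nil
	case p.slots <- struct{}{}: // room below maxSize: create on demand
		return p.create(), nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

func (p *pool) Release(s *sandbox) {
	s.Reset()
	select {
	case p.idle <- s:
	default: // idle queue full: discard and free the slot
		<-p.slots
	}
}

// demoPool exercises reuse after Release and exhaustion at maxSize.
func demoPool() (reused, clean bool, exhausted error) {
	p := newPool(1, 2)
	ctx := context.Background()
	a, _ := p.Acquire(ctx) // warm sandbox
	_, _ = p.Acquire(ctx)  // created on demand, pool is now at maxSize
	short, cancel := context.WithTimeout(ctx, 50*time.Millisecond)
	defer cancel()
	_, exhausted = p.Acquire(short)
	a.dirty = true
	p.Release(a)
	c, _ := p.Acquire(ctx)
	return c == a, !c.dirty, exhausted
}

func main() {
	reused, clean, err := demoPool()
	fmt.Printf("reused=%v clean=%v exhausted=%v\n", reused, clean, err)
}
```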

Implement streaming Run variant:
```
func (s *Sandbox) RunStream(ctx context.Context, cmd string, stdout, stderr io.Writer) (int, error)
```
- Writes stdout/stderr in real time as process produces output
- Returns exit code when process completes
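
For the native executor, streaming mostly falls out of os/exec: assigning caller-supplied writers to cmd.Stdout and cmd.Stderr makes the package copy output as it arrives. A minimal sketch, assuming `/bin/sh` on the host; `runStream` is a free function here rather than a method on Sandbox.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"io"
	"os"
	"os/exec"
)

// runStream runs script and forwards its output to the given writers while
// the process is still running. It returns the exit code on completion.
func runStream(ctx context.Context, script string, stdout, stderr io.Writer) (int, error) {
	cmd := exec.CommandContext(ctx, "/bin/sh", "-c", script)
	cmd.Stdout, cmd.Stderr = stdout, stderr
	err := cmd.Run()
	if ctx.Err() != nil {
		return -1, ctx.Err() // cancelled or timed out
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode(), nil
	}
	if err != nil {
		return -1, err
	}
	return 0, nil
}

// demoStream captures stdout in a buffer to make the result easy to inspect.
func demoStream() (string, int) {
	var buf bytes.Buffer
	code, _ := runStream(context.Background(), "echo one; echo two; exit 4", &buf, io.Discard)
	return buf.String(), code
}

func main() {
	code, err := runStream(context.Background(), "echo streaming; exit 2", os.Stdout, os.Stderr)
	fmt.Println("exit code:", code, "err:", err)
}
```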

Implement file transfer API in sandbox/sandbox.go:

func (s *Sandbox) WriteFile(path string, data []byte) error
func (s *Sandbox) ReadFile(path string) ([]byte, error)
func (s *Sandbox) ListFiles(dir string) ([]FileInfo, error)
func (s *Sandbox) UploadTar(r io.Reader) error    // batch file injection
func (s *Sandbox) DownloadTar(w io.Writer) error  // batch file extraction

Implement event hooks in sandbox/sandbox.go:
- OnCommand(cmd string): called before each Run() starts
- OnResult(r ExecutionResult): called after each Run() completes
- OnViolation(v PolicyViolation): called when a policy rule is triggered (may or may not block)
Implement Reset() in sandbox/sandbox.go:
- Wipe upper FS layer (replace with fresh MemoryFS)
- Reset ShellState to initial values from Options.Env and Options.WorkDir
- Keep base layer in place (no re-unpack needed)
- Fast: O(1) allocation, not proportional to base layer size
OpenTelemetry integration (optional build tag otel):
- Wrap Run() with tracer.Start(ctx, "sandbox.Run")
- Add span attributes: exit_code, duration_ms, command_hash, memory_peak_mb, cpu_time_ms
- Export metrics: sandbox.run.duration, sandbox.run.count, sandbox.pool.size, sandbox.pool.wait_time
Write tests:
- Pool pre-warms to minSize before first Acquire()
- Concurrent Acquire() from 50 goroutines all succeed within timeout
- Release() + Acquire() reuses sandbox with clean state
- Idle sandboxes are discarded after idleTTL
- RunStream() delivers output incrementally (verified with chunked producer)

Phase 9 — CLI Demo & Integration Tests

Goal: standalone binary demonstrating all capabilities; doubles as integration test harness

Tasks:

Implement CLI in main.go using github.com/spf13/cobra:

agentic-bash shell                   # interactive REPL session
agentic-bash run <script.sh>         # run a script file in sandbox
agentic-bash run --cmd "echo hello"  # run inline command
agentic-bash snapshot --out <file>   # snapshot sandbox state to file
agentic-bash restore --in <file>     # restore from snapshot and attach shell

Flags for run subcommand:
- --timeout 30s
- --memory 256m
- --cpu 50 (percent)
- --network deny|allow|allowlist
- --allowlist "github.com,pypi.org"
- --isolation auto|namespace|landlock|none
- --env KEY=VALUE (repeatable)
- --workdir /workspace
- --output-cap 10m
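
Values such as `--memory 256m` and `--output-cap 10m` need a small size parser. The sketch below assumes binary multiples (k = 1024) and lowercase or uppercase single-letter suffixes; the plan does not specify either, and `parseSize` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseSize converts strings like "512", "64k", "256m", or "2g" into bytes.
func parseSize(s string) (int64, error) {
	s = strings.ToLower(strings.TrimSpace(s))
	if s == "" {
		return 0, errors.New("empty size")
	}
	mult := int64(1)
	switch s[len(s)-1] {
	case 'k':
		mult = 1 << 10
	case 'm':
		mult = 1 << 20
	case 'g':
		mult = 1 << 30
	}
	if mult != 1 {
		s = s[:len(s)-1] // strip the unit suffix
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid size %q", s)
	}
	return n * mult, nil
}

func main() {
	for _, in := range []string{"256m", "10m", "64k", "bogus"} {
		n, err := parseSize(in)
		fmt.Printf("%-6s -> %d (err: %v)\n", in, n, err)
	}
}
```

The `--timeout 30s` flag needs no custom code, since `time.ParseDuration` already accepts that form.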
Interactive REPL (shell subcommand):
- Uses github.com/chzyer/readline for line editing + history
- Persistent Session across REPL entries
- Shows exit code and duration after each command
- %reset meta-command wipes sandbox state
- %snapshot <file> and %restore <file> meta-commands
Integration tests (in integration/ directory, run with go test -tags integration):
- Full pipeline: install package → use it → snapshot → restore → verify state
- Concurrent sessions: 20 parallel sandboxes each installing different packages
- Network deny: verify no outbound connections possible
- Resource limits: verify OOM kill and timeout kill work end-to-end
- macOS: verify graceful degradation (no panic when namespace isolation unavailable)

Key dependencies: github.com/spf13/cobra, github.com/chzyer/readline

Dependency List

Package	Purpose	Required
`mvdan.cc/sh/v3`	Pure Go bash interpreter (no `/bin/bash` required)	Yes
`github.com/spf13/afero`	Filesystem abstraction + in-memory FS	Yes
`github.com/spf13/cobra`	CLI framework	Yes (CLI only)
`github.com/chzyer/readline`	REPL line editing + history	Yes (CLI only)
`github.com/shoenig/go-landlock`	Unprivileged path-based isolation (Linux 5.13+)	Optional
`github.com/elastic/go-seccomp-bpf`	Syscall allowlist filtering (Linux)	Optional
`github.com/vishvananda/netlink`	Network namespace + veth/route setup (Linux); iptables rules need separate tooling	Optional
`go.opentelemetry.io/otel`	Distributed tracing + metrics	Optional

Zero CGO required for the core path. All optional Linux features compile cleanly on macOS via build tags and degrade to no-ops at runtime.

Platform Support Matrix

Feature	Linux	macOS	Notes
Shell execution	✅	✅	Pure Go shell via `mvdan.cc/sh` — no host bash required
In-memory filesystem isolation	✅	✅	Afero `MemMapFs` — fully cross-platform
Layered FS (base + overlay)	✅	✅	Pure Go implementation
Timeout + output caps	✅	✅	`context.WithTimeout` + size-capped `io.Writer`
Process group kill	✅	✅	`Setpgid` + kill signal to `-pgid`
File transfer API	✅	✅	Operates on virtual FS layer
Package install shims (pip/apt)	✅	✅	Overlay FS receives install artifacts
Snapshot / Restore	✅	✅	tar-based serialization of upper FS layer
SandboxPool	✅	✅	Goroutine-based, cross-platform
PID namespace isolation	✅	❌	`CLONE_NEWPID` — Linux only
Mount namespace isolation	✅	❌	`CLONE_NEWNS` — Linux only
User namespace (no root)	✅	❌	`CLONE_NEWUSER` — Linux only
Memory limit enforcement	✅	❌	cgroupv2 `memory.max` — Linux only
CPU quota enforcement	✅	❌	cgroupv2 `cpu.max` — Linux only
I/O bandwidth cap	✅	❌	cgroupv2 `io.max` — Linux only
Landlock path restrictions	✅	❌	Linux 5.13+ only, no root needed
Seccomp syscall filter	✅	❌	Linux only
Network namespace (deny)	✅	❌	`CLONE_NEWNET` — Linux only
Network allowlist	✅	❌	netlink + iptables — Linux only

macOS provides filesystem + shell isolation with process-level (not OS-level) guarantees. Sufficient for trusted agent code running on developer machines.

Linux provides the full hardened stack: namespaces + cgroups + landlock + seccomp + network isolation. Suitable for production multi-agent deployments.

Key Design Decisions & Rationale

1. Pure Go shell (`mvdan.cc/sh`) over `os/exec("bash")`

Agents don't need bash installed on the host. The shell is deterministic, hookable at the Go level, and its OpenHandler/ExecHandler interfaces are what enable transparent filesystem virtualization and package manager interception. Without this, FS virtualization would require chroot (needs root) or a user-mode filesystem (FUSE) — both operationally heavy.

2. Layered FS with Afero overlay, not chroot or FUSE

chroot requires root. FUSE has high per-syscall overhead and needs kernel modules. Layered Afero in pure Go needs no privileges and adds no kernel round-trips, and the ShellExecutor.OpenHandler makes it transparent to shell scripts. Trade-off: only shell-initiated file I/O goes through the virtual FS; native binaries invoked via NativeExecutor see the real host filesystem (mitigated by NamespaceStrategy on Linux).

3. Strategy pattern for isolation

Different callers have different threat models and operating environments. Forcing root or VM on dev machines kills adoption. Strategy pattern lets the library work out of the box on macOS (Noop), get Linux namespace isolation in CI, and get full hardening in production — same API, zero code changes.

4. Persistent session state across `Run()` calls

This is the defining characteristic borrowed from just-bash. AI agents issue sequences of related commands: install a tool, configure it, run it, inspect output. Each Run() must see env vars and cwd changes from the previous one. Most container-based approaches lose this state (new subprocess per call). The ShellState struct solves this explicitly.

5. SandboxPool with pre-warming

AI agents often run many parallel, short-lived tasks. Base layer extraction (decompressing the tool archive) takes 100-500ms. The pool does this work once at startup and amortizes it across all agent invocations. Release() + Reset() is O(1) (fresh MemMapFs allocation), not O(base layer size).

6. No daemon, no Docker, no root

The entire library is a Go import. No sidecar process, no Unix socket, no container runtime, no privilege escalation. Operators just compile and run. This is the key differentiation from E2B, Modal, and similar managed services.

What's Intentionally Out of Scope

Feature	Reason
Firecracker / microVM support	Needed only for untrusted third-party code; adds 125ms+ cold start and requires KVM access (`/dev/kvm`). Out of scope for an embedded library; a separate `firecracker` backend could be added as a plugin later.
Windows support	Linux namespace APIs don't exist on Windows. Would require Hyper-V isolation strategy — a significant separate effort.
Full OCI / Docker compatibility	We are not building a container runtime. We're building a sandboxed execution environment that is lighter and embeddable.
Remote execution / agent-as-a-service	The library runs in-process. Adding a network RPC layer (gRPC/HTTP) is a thin wrapper that callers can build; it's not part of the core library.
GUI / browser-based terminal	Out of scope; a WebSocket bridge to `RunStream()` is trivial for callers to implement.

Implementation Order & Dependencies Between Phases

Phase 1 (Core + NativeExecutor)
    └── Phase 2 (ShellExecutor)        ← needs Phase 1 types
        └── Phase 3 (Layered FS)       ← wired into ShellExecutor hooks
            ├── Phase 4 (Isolation)    ← wraps NativeExecutor cmd
            ├── Phase 5 (Resources)    ← wraps both executors
            └── Phase 6 (Packages)     ← wired into ShellExecutor ExecHandler + Phase 3 FS
                └── Phase 7 (Network)  ← integrated into Phase 4 isolation strategy
                    └── Phase 8 (Pool + API)  ← orchestrates all above
                        └── Phase 9 (CLI + Integration Tests)

Phases 4, 5, and 6 can be developed in parallel once Phase 3 is complete. Phase 7 can be started alongside them, but its integration depends on the Phase 4 isolation strategy.

Testing Strategy

Test Type	Location	What's Covered
Unit tests	`*_test.go` alongside each package	Individual functions, edge cases, error paths
Integration tests	`integration/` with `-tags integration`	Full sandbox lifecycle, package install, network policy
Platform tests	CI matrix: `ubuntu-latest`, `macos-latest`	Graceful degradation on macOS; full isolation on Linux
Fuzz tests	`fs/`, `executor/`	Malformed commands, path traversal attempts, unicode
Benchmark tests	`sandbox/`, `pool/`	`Run()` latency, pool acquire time, concurrent throughput

Success Criteria

`sandbox.New(opts).Run("echo hello")` works on macOS and Linux with zero configuration
The command sequence `pip install requests; python3 -c "import requests; print(requests.get('https://httpbin.org/get').status_code)"` executes end-to-end inside the sandbox
Sandbox filesystem writes are not visible on the host filesystem
A `Run()` that exceeds `ResourceLimits.Timeout` is killed within 100ms of the deadline
On Linux, a process that exceeds `MaxMemoryMB` is killed by the OOM killer before affecting the host
`SandboxPool.Acquire()` returns a warm sandbox in < 5ms (after the pool is pre-warmed)
The library compiles and all unit tests pass on macOS without any build errors or panics
The library has no CGO dependencies in the core path

Gap Analysis vs vercel-labs/just-bash

Evaluated against vercel-labs/just-bash (TypeScript virtual bash environment for AI agents) on 2026-03-20.

1. Features in just-bash not planned in agentic-bash

Feature	just-bash	agentic-bash	Notes
Custom commands API	`defineCommand()`, `customCommands`, `registerCommand()` — inject arbitrary built-ins	No plugin/extension API	Useful for AI agents that need tool-specific commands
Restrict available built-ins	`commands` option to allowlist which built-ins are callable	No equivalent	Security hardening for untrusted scripts
AST transform plugin API	`registerTransformPlugin()`, `transform()` — expose parsed AST	No AST exposure	Useful for analysis, fuzzing, and instrumentation
Browser / WASM support	Runs in-browser via `browser.ts` entrypoint	Linux/macOS native only	Not a stated goal; out of scope for Go
AI SDK tool integration	`bash-tool` wrapper for direct AI SDK use	No first-class AI tool adapter	Low-effort wrapper to add in Phase 9
Portable sandbox API	Vercel Sandbox-compatible interface for swapping backends	No portability layer	Allows upgrading to VM isolation transparently
Virtual process info	`processInfo` option spoofs `$$`, `$UID`, `$HOSTNAME`	Real host PIDs/UIDs visible	Information disclosure risk for untrusted scripts
Per-exec `stdin`	`stdin` per `exec()` call	Not exposed per-call	Minor gap; easy to add to `RunOptions`
Binary data handling	Handles images/compressed files via latin1 encoding	Binary stdout behavior undocumented	Relevant for `cat` on binary files
Threat model document	Comprehensive `THREAT_MODEL.md` with 65+ attack vectors and mitigations	None	Should be written after Phase 9

2. Planned in agentic-bash but not yet implemented

Phase	Feature	File	Status
Phase 8	`SandboxPool`	`sandbox/pool.go`	Missing
Phase 8	`RunStream()` streaming output	`sandbox/sandbox.go`	Missing
Phase 8	`UploadTar` / `DownloadTar` batch file API	`sandbox/sandbox.go`	Missing
Phase 8	`Reset()`	`sandbox/sandbox.go`	Missing
Phase 8	OpenTelemetry integration	`sandbox/sandbox.go`	Missing
Phase 9	Cobra CLI (`run`, `shell`, `snapshot`, `restore`)	`main.go`	Missing (`main.go` is TUI-only)
Phase 9	Integration test suite	`integration/`	Missing
Phase 5	Seccomp BPF syscall filter	`internal/seccomp/filter.go`	Missing

3. Fine-grained in-process execution limits

just-bash enforces configurable limits inside the interpreter, per exec() call:

Max recursion / call depth
Max total command count per execution
Max loop iterations
Max heredoc size

agentic-bash relies on a wall-clock timeout plus cgroup v2 memory/CPU limits (Linux-only). No per-call command or loop-iteration counts are enforced at the interpreter level. As a result, a tight infinite loop on macOS (where cgroups are unavailable) runs unthrottled, consuming CPU until the wall-clock timeout fires.

Recommendation: Add an ExecutionLimits sub-struct to ResourceLimits and wire it into the mvdan.cc/sh runner via a custom interp.ExecHandler counter.

4. Architectural note on session persistence

The plan states that persistent shell state across Run() calls is "borrowed from just-bash." This is incorrect — just-bash actually resets env vars, cwd, and functions on each exec() call (only the filesystem persists across calls). agentic-bash's persistent session model is its own design decision and is a genuine differentiator. The rationale in the Key Design Decisions section should be updated to reflect this accurately.
