🧠 Agent Sandbox Runtime ⚡

The "Self-Correcting" Architecture That Actually Works

"It's like giving your AI agent a private gym where it trains until it beats the task." 🏋️‍♂️✨

🚀 Quick Start • 📖 Documentation • ⚔️ Battle Benchmarks

🎐 What Is This?

Most AI agents are like eager interns: they write code, hand it to you, and pray it works. When it breaks, you have to fix it. (ノಠ益ಠ)ノ彡┻━┻

Agent Sandbox Runtime is different. It's a secure, self-correcting runtime that treats code generation like a loop, not a one-off:

Generate code (using extensive Swarm Intelligence 🐝)
Execute inside a locked-down Docker container 🔒
Explode? 💥 Catch the error, analyze the stack trace.
Fix it. 🛠️ Rewrite the code.
Repeat until it works or hits the retry limit.

The result? Code that actually runs. (⌐■_■)

🌊 Flow & Architecture

We call it the Reflexion Loop. It's the secret sauce that bumps success rates from ~60% to 92%.

graph LR
    A[User Task] --> B(Generate)
    B --> C{Sandbox Execution}
    C -->|✅ Success| D[Return Result]
    C -->|❌ Failure| E[Critique & Fix]
    E --> B
    style C fill:#ff9,stroke:#333,stroke-width:2px
    style E fill:#f9f,stroke:#333,stroke-width:2px

🧠 Swarm Intelligence `[Activated]`

It's not just one LLM. It's a council of specialized agents working in a peer-to-peer structure:

🎩 The Architect - Plans the structure.
💻 The Coder - Writes the raw Python.
🧐 The Critic - Hunts for logic bugs.
🛡️ The Security - Ensures no shenanigans (rm -rf /).

🔩 System Core & Capabilities

Under the hood, this isn't just a wrapper. It's a full-blown runtime environment.

🛡️ The Safety Contract (Sandboxing)

Every line of code runs inside an ephemeral Docker container.

No Network Access: Code cannot call home or download malware. [OFFLINE]
Resource Limits: Capped at 512MB RAM / 0.5 CPU. No fork bombs. [CAPPED]
Timeouts: Hard cut-off at 5 seconds. No infinite loops. [STRICT]
Ephemeral: Container dies immediately after execution. No persistence. [CLEAN]

🔌 Provider Agnostic Layer

Switch intelligence providers instantly via .env. logic remains the same.

GROQ (Llama 3 70B) - Recommended for speed (750ms)
OPENAI (GPT-4o) - Best for complex logic
ANTHROPIC (Claude 3.5 Sonnet) - Best for code quality
OLLAMA (DeepSeek Coder / Qwen) - 100% Local & Private

💾 Memory & State (LangGraph)

Uses graph-based state management to persist the conversation context and learning history during the reflection loop.

Checkpointing: Resumes from last failed state.
Reflection History: Remembers why previous 2 attempts failed.
Structured Output: Enforced JSON schema for all internal communication.

🎨 Visual Showcase

The Awakening (Swarm Init) 🌌	Code Alchemy (Generation) ⚗️

The Solution 📜	Victory (Result) 🏆

🎬 Witness the Magic

See the agent build a full snake game from scratch in under 30 seconds.

⚔️ Benchmarks & Performance

We put this runtime up against the giants. Here is the tale of the tape:

Contender	Success Rate	Speed	Self-Healing?	Wallet Damage
Agent Sandbox 🦁	92%	~743ms ⚡	YES	Free
GPT-4 Code Interpreter	87%	~3.2s	Yes	$$$
Devin	85%	~45s	Yes	$$$$$
Standard LLM API	~40-60%	Variable	NO `(T_T)`	$$

Validated on 12 complex algorithmic challenges ranging from Fibonacci sequences to custom data structure implementations.

🚀 Quick Start

Get up and running faster than you can say "Segmentational Fault".

Option 1: The "I have Docker" Way (Recommended) 🐳

docker run -e GROQ_API_KEY=your_key ghcr.io/ixchio/agent-sandbox-runtime

Option 2: The "Hacker" Way (Local) 💻

# 1. Clone the Scroll
git clone https://github.com/ixchio/agent-sandbox-runtime.git
cd agent-sandbox-runtime

# 2. Summon Dependencies
pip install -e .

# 3. Configure Your Mana (API Keys)
cp .env.example .env
# (Add your key: GROQ_API_KEY, OPENAI_API_KEY, etc.)

# 4. Cast Spell
agent-sandbox run "Calculate the first 10 prime numbers"

⚙️ Power Ups (Configuration)

Adjust your runtime environment via .env or environment variables.

Variable	Description	Default
`LLM_PROVIDER`	Choose your champion: `groq`, `openai`, `anthropic`, `ollama`	`groq`
`MAX_REFLEXION_ATTEMPTS`	How many times to try fixing bugs before giving up?	`3`
`SANDBOX_TIMEOUT_SECONDS`	Max execution time (prevent infinite loops)	`5.0`

🤝 Join the Guild (Contributing)

We are building the future of agentic coding. Want to help? Check out CONTRIBUTING.md for the rules of engagement.

We love PRs! (ﾉ◕ヮ◕)ﾉ*:･ﾟ✧

Built with 💜 by the Open Source Community

Report Bug 🐛 • Request Feature 💡

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
.github/workflows		.github/workflows
docker/sandbox		docker/sandbox
docs		docs
scripts		scripts
src/agent_sandbox		src/agent_sandbox
tests		tests
.env.example		.env.example
.gitignore		.gitignore
BENCHMARKS.md		BENCHMARKS.md
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml
fly.toml		fly.toml
monster_demo.html		monster_demo.html
pyproject.toml		pyproject.toml
railway.json		railway.json
render.yaml		render.yaml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🧠 Agent Sandbox Runtime ⚡

The "Self-Correcting" Architecture That Actually Works

🎐 What Is This?

🌊 Flow & Architecture

🧠 Swarm Intelligence `[Activated]`

🔩 System Core & Capabilities

🛡️ The Safety Contract (Sandboxing)

🔌 Provider Agnostic Layer

💾 Memory & State (LangGraph)

🎨 Visual Showcase

🎬 Witness the Magic

⚔️ Benchmarks & Performance

🚀 Quick Start

Option 1: The "I have Docker" Way (Recommended) 🐳

Option 2: The "Hacker" Way (Local) 💻

⚙️ Power Ups (Configuration)

🤝 Join the Guild (Contributing)

About

Uh oh!

Releases 1

Packages

Languages

License

ixchio/agent-sandbox-runtime

Folders and files

Latest commit

History

Repository files navigation

🧠 Agent Sandbox Runtime ⚡

The "Self-Correcting" Architecture That Actually Works

🎐 What Is This?

🌊 Flow & Architecture

🧠 Swarm Intelligence [Activated]

🔩 System Core & Capabilities

🛡️ The Safety Contract (Sandboxing)

🔌 Provider Agnostic Layer

💾 Memory & State (LangGraph)

🎨 Visual Showcase

🎬 Witness the Magic

⚔️ Benchmarks & Performance

🚀 Quick Start

Option 1: The "I have Docker" Way (Recommended) 🐳

Option 2: The "Hacker" Way (Local) 💻

⚙️ Power Ups (Configuration)

🤝 Join the Guild (Contributing)

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

🧠 Swarm Intelligence `[Activated]`

Packages