"It's like giving your AI agent a private gym where it trains until it beats the task." ๐๏ธโโ๏ธโจ
๐ Quick Start ย โขย ๐ Documentation ย โขย โ๏ธ Battle Benchmarks
Most AI agents are like eager interns: they write code, hand it to you, and pray it works. When it breaks, you have to fix it. (ใเฒ ็เฒ )ใๅฝกโปโโป
Agent Sandbox Runtime is different. It's a secure, self-correcting runtime that treats code generation like a loop, not a one-off:
- Generate code (using extensive Swarm Intelligence ๐)
- Execute inside a locked-down Docker container ๐
- Explode? ๐ฅ Catch the error, analyze the stack trace.
- Fix it. ๐ ๏ธ Rewrite the code.
- Repeat until it works or hits the retry limit.
The result? Code that actually runs. (โโ _โ )
We call it the Reflexion Loop. It's the secret sauce that bumps success rates from ~60% to 92%.
graph LR
A[User Task] --> B(Generate)
B --> C{Sandbox Execution}
C -->|โ
Success| D[Return Result]
C -->|โ Failure| E[Critique & Fix]
E --> B
style C fill:#ff9,stroke:#333,stroke-width:2px
style E fill:#f9f,stroke:#333,stroke-width:2px
It's not just one LLM. It's a council of specialized agents working in a peer-to-peer structure:
- ๐ฉ The Architect - Plans the structure.
- ๐ป The Coder - Writes the raw Python.
- ๐ง The Critic - Hunts for logic bugs.
- ๐ก๏ธ The Security - Ensures no shenanigans (rm -rf /).
Under the hood, this isn't just a wrapper. It's a full-blown runtime environment.
Every line of code runs inside an ephemeral Docker container.
- No Network Access: Code cannot call home or download malware.
[OFFLINE] - Resource Limits: Capped at 512MB RAM / 0.5 CPU. No fork bombs.
[CAPPED] - Timeouts: Hard cut-off at 5 seconds. No infinite loops.
[STRICT] - Ephemeral: Container dies immediately after execution. No persistence.
[CLEAN]
Switch intelligence providers instantly via .env. logic remains the same.
GROQ(Llama 3 70B) - Recommended for speed (750ms)OPENAI(GPT-4o) - Best for complex logicANTHROPIC(Claude 3.5 Sonnet) - Best for code qualityOLLAMA(DeepSeek Coder / Qwen) - 100% Local & Private
Uses graph-based state management to persist the conversation context and learning history during the reflection loop.
- Checkpointing: Resumes from last failed state.
- Reflection History: Remembers why previous 2 attempts failed.
- Structured Output: Enforced JSON schema for all internal communication.
| The Awakening (Swarm Init) ๐ | Code Alchemy (Generation) โ๏ธ |
|---|---|
![]() |
![]() |
| The Solution ๐ | Victory (Result) ๐ |
|---|---|
![]() |
![]() |
See the agent build a full snake game from scratch in under 30 seconds.
We put this runtime up against the giants. Here is the tale of the tape:
| Contender | Success Rate | Speed | Self-Healing? | Wallet Damage |
|---|---|---|---|---|
| Agent Sandbox ๐ฆ | 92% | ~743ms โก | YES | Free |
| GPT-4 Code Interpreter | 87% | ~3.2s | Yes | $$$ |
| Devin | 85% | ~45s | Yes | $$$$$ |
| Standard LLM API | ~40-60% | Variable | NO (T_T) |
$$ |
Validated on 12 complex algorithmic challenges ranging from Fibonacci sequences to custom data structure implementations.
Get up and running faster than you can say "Segmentational Fault".
docker run -e GROQ_API_KEY=your_key ghcr.io/ixchio/agent-sandbox-runtime# 1. Clone the Scroll
git clone https://github.com/ixchio/agent-sandbox-runtime.git
cd agent-sandbox-runtime
# 2. Summon Dependencies
pip install -e .
# 3. Configure Your Mana (API Keys)
cp .env.example .env
# (Add your key: GROQ_API_KEY, OPENAI_API_KEY, etc.)
# 4. Cast Spell
agent-sandbox run "Calculate the first 10 prime numbers"Adjust your runtime environment via .env or environment variables.
| Variable | Description | Default |
|---|---|---|
LLM_PROVIDER |
Choose your champion: groq, openai, anthropic, ollama |
groq |
MAX_REFLEXION_ATTEMPTS |
How many times to try fixing bugs before giving up? | 3 |
SANDBOX_TIMEOUT_SECONDS |
Max execution time (prevent infinite loops) | 5.0 |
We are building the future of agentic coding. Want to help? Check out CONTRIBUTING.md for the rules of engagement.
We love PRs! (๏พโใฎโ)๏พ*:๏ฝฅ๏พโง
Built with ๐ by the Open Source Community



