A flexible agent runtime with sandboxed execution, vLLM integration, and extensible tool system.
Agent-first design where tools (IRC, code execution, web, files) are integrated into a unified agent runtime, rather than building tool-specific bots.
- HTTP API Server: OpenAI-compatible REST API for external integration (port 8080)
- Agent Runtime: Core loop with context management and tool orchestration
- vLLM Integration: GLM-4.5-Air-AWQ-4bit and Qwen3-Next-80B-A3B-Instruct-AWQ-4bit via vLLM Docker containers (port 8000)
- Tool System: Pluggable tools with clean interfaces
- Context Swapping: Different personas/contexts per domain (IRC ambassador, coder, etc.)
- Session Management: Persistent conversation storage with multi-context support
- Harness System: Structured game/task environments (chess, CTF, coding challenges)
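A minimal sketch of how the context-swapping and session-management bullets above could fit together. `CONTEXTS`, `SessionStore`, and the on-disk JSON layout are illustrative assumptions, not the project's actual implementation:

```python
# Illustrative sketch: each domain gets its own persona (system prompt),
# and conversations are persisted per (session_id, context) pair.
# All names here are assumptions for illustration only.
import json
from pathlib import Path

CONTEXTS = {
    "irc-ambassador": "You are a friendly IRC ambassador.",
    "coder": "You are a focused coding assistant.",
}

class SessionStore:
    def __init__(self, root: str = "sessions"):
        self.root = Path(root)
        self.root.mkdir(exist_ok=True)

    def path(self, session_id: str, context: str) -> Path:
        return self.root / f"{session_id}.{context}.json"

    def load(self, session_id: str, context: str) -> list[dict]:
        p = self.path(session_id, context)
        if p.exists():
            return json.loads(p.read_text())
        # A new session starts with the context's persona as system prompt.
        return [{"role": "system", "content": CONTEXTS[context]}]

    def save(self, session_id: str, context: str, messages: list[dict]) -> None:
        self.path(session_id, context).write_text(json.dumps(messages))
```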
- Tool calling: The agent runtime follows the OpenAI tool/function call spec. Assistant messages include `tool_calls`, tool results are fed back as `role: "tool"` messages with a `tool_call_id`, and the loop continues until there are no more tool calls or the iteration cap is reached.
- Prompt sizing: A lightweight estimator trims the oldest turns before sending to vLLM, keeping headroom for completions (default prompt budget ~6k tokens, completion cap ~2k). vLLM still enforces the true `--max-model-len`.
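The prompt-sizing behavior described above can be sketched as follows. The chars/4 heuristic and the function names (`estimate_tokens`, `trim_history`) are assumptions for illustration, not the project's actual estimator:

```python
# Sketch of oldest-first history trimming against a rough token budget.

def estimate_tokens(message: dict) -> int:
    """Very rough token estimate: ~4 characters per token (assumed heuristic)."""
    return max(1, len(message.get("content") or "") // 4)

def trim_history(messages: list[dict], budget: int = 6000) -> list[dict]:
    """Drop the oldest non-system turns until the estimate fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    total = sum(estimate_tokens(m) for m in system + turns)
    while turns and total > budget:
        total -= estimate_tokens(turns.pop(0))  # discard the oldest turn
    return system + turns
```

Keeping the system prompt while dropping old turns preserves the persona even in long conversations; vLLM's own `--max-model-len` remains the hard backstop.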
terrarium-agent/
├── agent/ # Core agent runtime
├── llm/ # vLLM client and prompt management
├── tools/ # Tool implementations (IRC, shell, python, files)
├── config/ # Configuration and context definitions
├── main.py # Entry point
└── requirements.txt
Integrates with terrarium-irc for reading/sending IRC messages and accessing chat history.
Execute shell commands in sandboxed environment.
Execute Python code with resource limits.
Read/write files with access controls.
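One way to run Python code with a wall-clock timeout and best-effort CPU/memory limits, in the spirit of the tool descriptions above. `run_python` and the specific limits are illustrative assumptions, not the project's actual sandbox:

```python
# Sketch: execute untrusted Python in a subprocess with a timeout and,
# on POSIX, rlimits applied in the child before exec.
import subprocess
import sys

def _limit_resources():
    try:
        import resource
        resource.setrlimit(resource.RLIMIT_CPU, (5, 5))        # 5s of CPU time
        resource.setrlimit(resource.RLIMIT_AS, (2**30, 2**30))  # 1 GiB address space
    except Exception:
        pass  # best-effort: limits unavailable on this platform

def run_python(code: str, timeout: float = 10.0) -> str:
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
        preexec_fn=_limit_resources if sys.platform != "win32" else None,
    )
    return proc.stdout if proc.returncode == 0 else proc.stderr
```

A real sandbox would add filesystem and network isolation (e.g. a container); rlimits alone only bound resource use.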
- Docker + nvidia-container-toolkit (for vLLM)
- NVIDIA GPU: GB10 (Blackwell) or compatible with driver 580+
- Models: Place downloaded models under `models/`:
  - GLM-4.5-Air-AWQ-4bit
  - (optional) Qwen3-Next-80B-A3B-Instruct-AWQ-4bit
# 1. Check model is downloaded
./check_model.sh
# 2. Install Python client dependencies
pip install -r requirements.txt
# 3a. Start GLM (8k context, tool+reasoning parsers)
./start_vllm_docker.sh --num-agents 1 --max-model-len 8192
# (override --gpu-mem if you want a larger/smaller KV pool)
# 3b. Start Qwen3 (long-context)
./start_vllm_docker_qwen3.sh --num-agents 1 --max-model-len 32768 --enforce-eager
# Defaults: dtype bf16, tool parser hermes (reasoning parser unset).
# Increase --max-model-len or --num-agents if you need more parallel contexts;
# reduce --gpu-mem to lower VRAM.
# 4. Choose how to run the agent:
# Option A: HTTP API Server (recommended for external integration)
source venv/bin/activate
python server.py # Starts on http://localhost:8080
# Option B: Interactive chat with persistent sessions
source venv/bin/activate
python chat.py
# Option C: Full agent runtime with tools and harnesses
source venv/bin/activate
python main.py
vLLM scripts and sizing:
- `start_vllm_docker.sh` (GLM) and `start_vllm_docker_qwen3.sh` (Qwen3) accept `--max-model-len` and `--num-agents`; if you omit `--gpu-mem`, the scripts auto-size the KV pool based on those. Lower `--gpu-mem` to reduce VRAM; higher `--max-model-len` trades concurrency for longer prompts.
- Qwen3 defaults: bf16, tool parser `hermes`, `--enforce-eager` on to avoid torch.compile issues in this image.
Documentation:
- QUICKSTART.md - Detailed setup instructions
- INTEGRATION.md - How to integrate external apps (IRC, web, games)
- AGENT_API.md - HTTP API specification
- DOCKER_SETUP.md - Docker setup and troubleshooting
- Copy `systemd/terrarium-agent.service` to `terrarium-agent.service.local` (kept out of git) and update `User`, `Group`, `WorkingDirectory`, and `ExecStart` to match your host.
- Install it with `sudo cp terrarium-agent.service.local /etc/systemd/system/terrarium-agent.service` and reload with `sudo systemctl daemon-reload`.
- Enable/start via `sudo systemctl enable --now terrarium-agent` and tail logs using `sudo journalctl -u terrarium-agent -f`.
- Set secrets or overrides in `/etc/terrarium-agent.env` (optional) and restart the service whenever the environment changes.
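As a hedged illustration, an edited `terrarium-agent.service.local` might look like the following. The user, paths, and venv location are placeholders for your host, not values shipped with the project:

```ini
[Unit]
Description=Terrarium Agent HTTP API server
After=network-online.target

[Service]
User=youruser
Group=youruser
WorkingDirectory=/opt/terrarium-agent
EnvironmentFile=-/etc/terrarium-agent.env
ExecStart=/opt/terrarium-agent/venv/bin/python server.py
Restart=on-failure

[Install]
WantedBy=multi-user.target
```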
See config/ directory for:
- `agent.yaml` - Main agent configuration
- `tools.yaml` - Tool-specific settings
- `contexts/` - Context definitions for different domains
🚧 Early Development - Core architecture being built
- vLLM client integration
- Base tool interface
- IRC tool (wrapping terrarium-irc)
- Agent runtime loop
- Context management
- Sandbox implementation
Terrarium Agent provides an HTTP API for integration with external applications.
For IRC Integration:
- See INTEGRATION.md for complete guide
- Start agent server:
  - Start agent server: `python server.py`
- Make HTTP requests from terrarium-irc to `http://localhost:8080/v1/chat/completions`
- Client manages conversation history (stateless server)
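A minimal client sketch for the stateless pattern above: the caller keeps the full message history and resends it each turn. The `model` value is an assumption; the endpoint is the one documented above:

```python
# Stateless-server client sketch: history lives on the client side.
import json
from urllib import request

API_URL = "http://localhost:8080/v1/chat/completions"  # from the docs above

def build_request(history: list[dict], user_msg: str) -> dict:
    """Append the new user turn and build an OpenAI-style request body."""
    history.append({"role": "user", "content": user_msg})
    return {"model": "terrarium-agent", "messages": history}  # model name assumed

def chat(history: list[dict], user_msg: str) -> str:
    body = json.dumps(build_request(history, user_msg)).encode()
    req = request.Request(API_URL, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        reply = json.load(resp)["choices"][0]["message"]
    history.append(reply)  # keep the assistant turn for the next call
    return reply["content"]
```

Because the server keeps no state, retrying or resuming a conversation is just a matter of replaying the stored history.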
Other Use Cases:
- Web chat applications
- Game environments (harnesses)
- Custom tools and bots
- terrarium-irc - IRC bot (integrates via HTTP API)