126 changes: 68 additions & 58 deletions README.md
@@ -1,42 +1,39 @@
# Perstack

An open-source harness for micro-agents.

<p align="center">
<a href="https://github.com/perstack-ai/perstack/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-blue" alt="License"></a>
<a href="https://www.npmjs.com/package/perstack"><img src="https://img.shields.io/npm/dm/perstack" alt="npm downloads"></a>
<a href="https://www.npmjs.com/package/perstack"><img src="https://img.shields.io/npm/v/perstack" alt="npm version"></a>
<a href="https://hub.docker.com/r/perstack/perstack"><img src="https://img.shields.io/docker/v/perstack/perstack?label=docker" alt="Docker image version"></a>
<a href="https://www.npmjs.com/package/perstack"><img src="https://img.shields.io/npm/dm/perstack" alt="npm downloads"></a>
<a href="https://github.com/perstack-ai/perstack/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-blue" alt="License"></a>
</p>

<p align="center">
<a href="https://perstack.ai"><strong>Documentation</strong></a> ·
<a href="https://perstack.ai/docs/getting-started/"><strong>Getting Started</strong></a> ·
<a href="https://discord.gg/perstack"><strong>Discord</strong></a>
<a href="https://discord.gg/2xZzrxC9"><strong>Discord</strong></a>
</p>

Agentic software in production hits two walls. Build from scratch, and most of the code is plumbing — not prompts. Throw a frontier model at a monolithic agent, and it works but doesn't scale commercially.

Perstack takes a third approach: purpose-specific micro-agents on value-tier models, running in sandboxed containers. Narrow tasks don't need Opus-level reasoning. Haiku is enough. And single-purpose prompts are easy to write.
Perstack is infrastructure for **practical agentic AI** that aims to:
- **Do big things with small models**: If a smaller model can do the same job, there's no reason to use a bigger one.
- **Focus on what you know best**: Building agentic software that people use doesn't require an AI science degree; what matters is the knowledge to solve their problems.
- **Keep it simple and reliable**: The biggest mistake is cramming AI into an overly complex harness and ending up with an uncontrollable agent.

Every agent runs inside its own Docker container with an isolated filesystem and network. The host system is never exposed. This makes Perstack safe to run in production — agents can execute shell commands and write files without risk to the host.
## Getting Started

Agent definitions live in a `perstack.toml` file. `perstack` executes them on an event-sourced runtime inside a sandboxed container.
To get started, use the `create-expert` Expert, which helps you focus on the core and build your first agentic AI:

```bash
npx perstack start create-expert "Form a team named ai-gaming to build a Bun-based CLI indie game playable on Bash for AI."

docker run --rm -e ANTHROPIC_API_KEY \
-v ./perstack.toml:/workspace/perstack.toml:ro \
-v ./workspace:/workspace/output \
perstack/perstack start ai-gaming --model "haiku-4-5" "Make a Wizardry-like dungeon crawler."
# Ask `create-expert` to form a team named `ai-gaming`
docker run --pull always --rm -it \
-e ANTHROPIC_API_KEY \
-v ./ai-gaming:/workspace \
perstack/perstack start create-expert \
"Form a team named ai-gaming to build a Bun-based CLI cutting-edge indie game playable on Bash."
```

The official [`perstack/perstack`](https://hub.docker.com/r/perstack/perstack) image ships pre-compiled standalone binaries — no Node.js or npm needed. File writes, shell commands, and all agent activity are confined to the container.

A game built with these commands: [demo-dungeon-crawler](https://github.com/perstack-ai/demo-dungeon-crawler). Built entirely on Claude 4.5 Haiku.

## How it works

`create-expert` is a published Expert on the [Perstack registry](https://perstack.ai/). It generates a `perstack.toml` that defines a team of micro-agents. No local config file is needed — the CLI resolves `create-expert` from the registry automatically. Each agent has a single responsibility and its own context window. Complex tasks are broken down and delegated to specialists.
`create-expert` is a built-in Expert. It generates a `perstack.toml` that defines a team of micro-agents. Each agent has a single responsibility and its own context window. Complex tasks are broken down and delegated to specialists.

```toml
[experts."ai-gaming"]
@@ -57,57 +54,74 @@ description = "Tests the game and reports bugs"
instruction = "Play-test the game, find bugs, and verify fixes."
```

Run with Docker — the config is mounted read-only, and output stays inside the container:
To let your agents work on an actual task, you can use the `perstack start` command to run them interactively:

```bash
docker run --rm -e ANTHROPIC_API_KEY \
-v ./perstack.toml:/workspace/perstack.toml:ro \
-v ./output:/workspace/output \
perstack/perstack run ai-gaming "Make a Wizardry-like dungeon crawler."
# Let the `ai-gaming` team build a Wizardry-like dungeon crawler
docker run --pull always --rm -it \
-e ANTHROPIC_API_KEY \
-v ./ai-gaming:/workspace \
perstack/perstack start ai-gaming \
--model "haiku-4-5" \
"Make a Wizardry-like dungeon crawler. Make it replayable, so players can dive in, die, and find a way to beat it."
```

`perstack start` streams activity in real time. `perstack run` does the same headless with JSON output, for integration into your own applications.
Here is an example of a game built with these commands: [demo-dungeon-crawler](https://github.com/perstack-ai/demo-dungeon-crawler). It was built entirely on Claude 4.5 Haiku.

## Prerequisites

## Sandboxed by default
- Docker
- An LLM provider API key (see [Providers](#providers))

Agents have access to shell execution, file I/O, and MCP tools. Running them directly on the host is a security risk. Docker containers solve this:
### Providing API keys

- **Filesystem isolation** — agents read and write inside the container. Mount only what you need with `-v`.
- **Network control** — restrict outbound access with `--network=none` or custom Docker networks.
- **Resource limits** — cap CPU and memory with `--cpus` and `--memory`.
- **Disposable environments** — `--rm` ensures nothing persists after execution.
There are two ways to provide API keys:

**1. Pass host environment variables with `-e`**

Export the key on the host and forward it to the container:

```bash
# Maximum isolation: no network, resource-limited, read-only config
docker run --rm \
--network=none \
--cpus=1 --memory=2g \
export ANTHROPIC_API_KEY=sk-ant-...
docker run --rm -it \
-e ANTHROPIC_API_KEY \
-v ./perstack.toml:/workspace/perstack.toml:ro \
perstack/perstack run my-expert "Analyze this codebase."
-v ./workspace:/workspace \
perstack/perstack start my-expert "query"
```

For production deployments, see [Isolation by Design](https://perstack.ai/docs/operating-experts/isolation-by-design/).
**2. Store keys in a `.env` file in the workspace**

## Runtime
Create a `.env` file in the workspace directory. Perstack loads `.env` and `.env.local` by default:

The runtime (`@perstack/runtime`) is event-sourced. Every execution produces a hierarchy of Jobs, Runs, and Checkpoints.
```bash
# ./workspace/.env
ANTHROPIC_API_KEY=sk-ant-...
```

- **Resume** from any checkpoint if an execution is interrupted.
- **Replay** executions across different models to compare behavior.
- **Inspect** execution history with `perstack log`.
```bash
docker run --rm -it \
-v ./workspace:/workspace \
perstack/perstack start my-expert "query"
```

Each agent runs in isolation with its own context window and tools. Agents communicate through shared workspace files, not shared conversation history.
You can also specify custom `.env` file paths with `--env-path`:

```bash
perstack start my-expert "query" --env-path .env.production
```

## Providers

Anthropic, OpenAI, Azure OpenAI, Google, Vertex AI, DeepSeek, Ollama, Bedrock.
## How it works

## Tools
Perstack organizes the complexity of micro-agent harness design into a simple stack model:

Agents access the outside world through MCP (Model Context Protocol). `@perstack/base` provides built-in tools: file read/write/edit, shell execution, image and PDF reading.
- **Definition**: `perstack.toml`, Experts, skills, providers
- **Context**: Context windows, workspace, delegations, inference budgets
- **Runtime**: Event-sourcing, checkpoints, skill management
- **Infrastructure**: Container isolation, workspace boundaries, env vars, secrets
- **Interface**: `perstack` CLI, JSON events via `@perstack/runtime`

Custom MCP servers can be added per agent in `perstack.toml`.
For details, see [Understanding Perstack](https://perstack.ai/docs/understanding-perstack/concept/).

## Deployment

@@ -123,13 +137,9 @@ docker build -t my-expert .
docker run --rm -e ANTHROPIC_API_KEY my-expert "query"
```

The image is multi-arch (`linux/amd64`, `linux/arm64`) and weighs ~74MB. Pin to a specific version for reproducible builds:

```dockerfile
FROM perstack/perstack:0.0.94
```
The image is Ubuntu-based and multi-arch (`linux/amd64`, `linux/arm64`), and weighs about 74 MB.

The runtime can also be imported directly as a TypeScript library for serverless environments (Cloudflare Workers, Vercel, etc.):
The runtime can also be imported directly as a TypeScript library for serverless environments (Cloudflare Workers, Vercel, etc.) or integrated into your own applications:

```typescript
import { run } from "@perstack/runtime"
@@ -163,7 +173,7 @@ Pre-1.0. The runtime is stable and used in production, but the API surface may c

## Community

- [Discord](https://discord.gg/perstack)
- [Discord](https://discord.gg/2xZzrxC9)
- [GitHub Issues](https://github.com/perstack-ai/perstack/issues)
- [@FL4T_LiN3 on X](https://x.com/FL4T_LiN3)

16 changes: 8 additions & 8 deletions docs/guides/going-to-production.md
@@ -1,16 +1,16 @@
---
title: "Going to Production"
description: "Deploy your agent safely and reliably. Sandbox execution in containers with full observability."
description: "Deploy your Experts safely and reliably in containers with full observability."
tags: ["deployment", "container", "production"]
sidebar:
order: 5
---

Your agent works in development. Now you want to deploy it for real users.
Your Expert works in development. Now you want to deploy it for real users.

**The concern**: AI agents have larger attack surfaces than typical applications. They make decisions, use tools, and interact with external systems. How do you deploy them safely?

**Perstack's approach**: Sandbox the environment, not the agent's capabilities. The runtime is designed to run in isolated containers with full observability.
**Perstack's approach**: The runtime is designed to run in isolated containers with full observability. Instead of restricting what the Expert can do, you contain the impact at the infrastructure level.

## The deployment model

@@ -134,22 +134,22 @@ You get full execution traces without any instrumentation code.
> [!NOTE]
> Events are also written to `workspace/perstack/` as checkpoints. You can replay any execution for debugging or auditing.

## Why sandbox-first?
## Isolation model

Traditional security approach: Restrict what the agent *can do* — limit tools, filter outputs, add guardrails inside the agent.

**Problem**: This creates an arms race. The agent tries to be helpful; the restrictions try to prevent misuse. Complex, brittle, never complete.

**Sandbox-first approach**: Let the agent do its job. Contain the *impact* at the infrastructure level.
**Perstack's approach**: Let the Expert do its job. Contain the *impact* at the infrastructure level.

| Aspect | Traditional | Sandbox-first |
| Aspect | Traditional | Container isolation |
| --------------- | ----------------------- | ---------------------------- |
| Tool access | Restricted, filtered | Full access within sandbox |
| Tool access | Restricted, filtered | Full access within container |
| Output handling | Content filtering | Events to stdout, you decide |
| Failure mode | Agent fights guardrails | Container terminates |
| Audit | Logs + hope | Complete event stream |

The agent operates freely within its sandbox. Your infrastructure controls what the sandbox can affect.
The Expert operates freely within its container. Your infrastructure controls what the container can affect.
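
When the container is launched from application code, the limits that define "what the container can affect" can be assembled in one place. The following is a minimal sketch under stated assumptions, not part of Perstack: `buildDockerArgs` and `IsolationOptions` are hypothetical names, and the function only assembles standard `docker run` flags (`--rm`, `--network`, `--cpus`, `--memory`, `-e`, `-v`) around a `perstack/perstack run <expert> "<query>"` invocation.

```typescript
// Hypothetical helper, not part of Perstack: it only builds an argument list
// for the standard `docker run` command.
interface IsolationOptions {
  expert: string;      // Expert name defined in perstack.toml
  query: string;       // task handed to the Expert
  configPath: string;  // host path to perstack.toml, mounted read-only
  network?: string;    // Docker network name; "none" disables networking
  cpus?: number;       // CPU limit passed to --cpus
  memory?: string;     // memory limit passed to --memory, e.g. "2g"
  envKeys?: string[];  // names of host environment variables to forward
}

function buildDockerArgs(opts: IsolationOptions): string[] {
  // --rm discards the container after the run, so nothing persists.
  const args = ["run", "--rm"];
  if (opts.network !== undefined) args.push(`--network=${opts.network}`);
  if (opts.cpus !== undefined) args.push(`--cpus=${opts.cpus}`);
  if (opts.memory !== undefined) args.push(`--memory=${opts.memory}`);
  // `-e NAME` forwards the host value without writing the secret into the command.
  for (const key of opts.envKeys ?? []) args.push("-e", key);
  // The ":ro" suffix mounts the config read-only inside the container.
  args.push("-v", `${opts.configPath}:/workspace/perstack.toml:ro`);
  args.push("perstack/perstack", "run", opts.expert, opts.query);
  return args;
}

const exampleArgs = buildDockerArgs({
  expert: "my-expert",
  query: "Analyze this codebase.",
  configPath: "./perstack.toml",
  cpus: 1,
  memory: "2g",
  envKeys: ["ANTHROPIC_API_KEY"],
});
console.log(["docker", ...exampleArgs].join(" "));
```

The resulting array can be passed to Node's `child_process.spawn("docker", args)`. Passing the arguments as an array rather than a single shell string keeps the query from being interpreted by a shell.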

## Production checklist

2 changes: 1 addition & 1 deletion docs/understanding-perstack/concept.md
@@ -83,7 +83,7 @@ Observability means agent behavior is fully transparent and inspectable.
- **Internal state visibility**: state machines emit visible events
- **Deterministic history**: checkpoints make runs reproducible

This isn't just for debugging. Observability is a [prerequisite for sandbox-first security](./sandbox-integration.md#observability-as-a-prerequisite) — you verify behavior after the fact, not before.
This isn't just for debugging. Observability is a [prerequisite for sandbox integration](./sandbox-integration.md#observability-as-a-prerequisite) — you verify behavior after the fact, not before.

### Reusability

8 changes: 4 additions & 4 deletions docs/understanding-perstack/sandbox-integration.md
@@ -4,7 +4,7 @@ sidebar:
order: 5
---

## Why sandbox-first?
## Why sandbox integration?

AI agents differ fundamentally from traditional software. The same input can produce different outputs. Model updates can change behavior. Hallucinations can trigger destructive actions without any attacker involved.

@@ -15,7 +15,7 @@ This creates a security challenge with two possible approaches:
| Restrict the agent | Limit tools, actions, or decisions — defeats the purpose |
| Sandbox the environment | Full capability inside isolated boundaries |

Perstack takes the sandbox-first approach. The runtime doesn't enforce its own security layer — it's designed to run inside infrastructure that provides isolation.
Perstack's runtime is designed to run inside infrastructure that provides isolation — it delegates security boundaries to the platform rather than enforcing its own.

### The boundary layer model

@@ -32,9 +32,9 @@ Agent security can be understood as four boundary layers:

Traditional software security relies on input validation and output filtering — detect threats before they execute. For AI agents, this approach hits a fundamental limit: distinguishing legitimate instructions from malicious ones requires interpreting intent, which is close to an undecidable problem.

Sandbox-first inverts this model. Instead of trying to prevent all bad outcomes upfront, you let the agent operate freely within boundaries and verify behavior after the fact. This only works if every action is visible and auditable.
Sandboxing inverts this model. Instead of trying to prevent all bad outcomes upfront, you let the agent operate freely within boundaries and verify behavior after the fact. This only works if every action is visible and auditable.

This is why **Observability** is one of Perstack's three core principles (alongside Isolation and Reusability). It's not a nice-to-have — it's a prerequisite for the sandbox-first approach to work. Full event history, deterministic checkpoints, and transparent execution make post-hoc verification possible.
This is why **Observability** is one of Perstack's three core principles (alongside Isolation and Reusability). It's not a nice-to-have — it's a prerequisite for sandbox integration to work. Full event history, deterministic checkpoints, and transparent execution make post-hoc verification possible.
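
To make post-hoc verification concrete, here is a minimal sketch of an audit pass over a recorded event history. The event shape (`ToolCallEvent`), the tool names, and the `auditHistory` function are assumptions made up for illustration; Perstack's actual event schema may look quite different.

```typescript
// Hypothetical event shape for illustration only; not Perstack's real schema.
interface ToolCallEvent {
  step: number;   // position of the event in the run
  tool: string;   // name of the tool the agent invoked
  path?: string;  // file path touched by the call, if any
}

interface Finding {
  step: number;
  reason: string;
}

// Post-hoc check: nothing is blocked while the agent runs. The recorded
// history is scanned afterwards and anything outside expectations is flagged.
function auditHistory(
  events: ToolCallEvent[],
  allowedTools: Set<string>,
  workspaceRoot: string,
): Finding[] {
  const findings: Finding[] = [];
  for (const event of events) {
    if (!allowedTools.has(event.tool)) {
      findings.push({ step: event.step, reason: `unexpected tool: ${event.tool}` });
    }
    if (event.path !== undefined && !event.path.startsWith(workspaceRoot + "/")) {
      findings.push({ step: event.step, reason: `path outside workspace: ${event.path}` });
    }
  }
  return findings;
}

// Made-up history: the second call writes outside the workspace, the third uses an unlisted tool.
const history: ToolCallEvent[] = [
  { step: 1, tool: "readFile", path: "/workspace/src/main.ts" },
  { step: 2, tool: "writeFile", path: "/etc/hosts" },
  { step: 3, tool: "sendEmail" },
];

const findings = auditHistory(history, new Set(["readFile", "writeFile"]), "/workspace");
for (const f of findings) console.log(`step ${f.step}: ${f.reason}`);
```

A check like this is only as good as the history it reads, which is why complete event capture matters more here than any individual rule.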

### The messaging problem
