diff --git a/README.md b/README.md
index 0a8a70a9..133a76f3 100644
--- a/README.md
+++ b/README.md
@@ -1,42 +1,39 @@
 # Perstack
+An open-source harness for micro-agents.
+

+License
+npm downloads npm version Docker image version
-npm downloads
-License

 Documentation · Getting Started ·
-Discord
+Discord

-Agentic software in production hits two walls. Build from scratch, and most of the code is plumbing — not prompts. Throw a frontier model at a monolithic agent, and it works but doesn't scale commercially.
-
-Perstack takes a third approach: purpose-specific micro-agents on value-tier models, running in sandboxed containers. Narrow tasks don't need Opus-level reasoning. Haiku is enough. And single-purpose prompts are easy to write.
+Perstack is infrastructure for **practical agentic AI** that aims to:
+- **Do big things with small models**: If a smaller model can do the same job, there's no reason to use a bigger one.
+- **Focus on what you know best**: Building agentic software that people use doesn't require an AI science degree — the knowledge to solve your users' problems is what matters.
+- **Keep it simple and reliable**: The biggest mistake is cramming AI into an overly complex harness and ending up with an uncontrollable agent.

-Every agent runs inside its own Docker container with an isolated filesystem and network. The host system is never exposed. This makes Perstack safe to run in production — agents can execute shell commands and write files without risk to the host.
+## Getting Started

-Agent definitions live in a `perstack.toml` file. `perstack` executes them on an event-sourced runtime inside a sandboxed container.
+To get started, use the `create-expert` Expert, which helps you focus on the core and build your first agentic AI:

 ```bash
-npx perstack start create-expert "Form a team named ai-gaming to build a Bun-based CLI indie game playable on Bash for AI."
-
-docker run --rm -e ANTHROPIC_API_KEY \
-  -v ./perstack.toml:/workspace/perstack.toml:ro \
-  -v ./workspace:/workspace/output \
-  perstack/perstack start ai-gaming --model "haiku-4-5" "Make a Wizardry-like dungeon crawler."
+# Ask `create-expert` to form a team named `ai-gaming`
+docker run --pull always --rm -it \
+  -e ANTHROPIC_API_KEY \
+  -v ./ai-gaming:/workspace \
+  perstack/perstack start create-expert \
+  "Form a team named ai-gaming to build a cutting-edge Bun-based CLI indie game playable on Bash."
 ```

-The official [`perstack/perstack`](https://hub.docker.com/r/perstack/perstack) image ships pre-compiled standalone binaries — no Node.js or npm needed. File writes, shell commands, and all agent activity are confined to the container.
-
-A game built with these commands: [demo-dungeon-crawler](https://github.com/perstack-ai/demo-dungeon-crawler). Built entirely on Claude 4.5 Haiku.
-
-## How it works
-
-`create-expert` is a published Expert on the [Perstack registry](https://perstack.ai/). It generates a `perstack.toml` that defines a team of micro-agents. No local config file is needed — the CLI resolves `create-expert` from the registry automatically. Each agent has a single responsibility and its own context window. Complex tasks are broken down and delegated to specialists.
+`create-expert` is a built-in Expert. It generates a `perstack.toml` that defines a team of micro-agents. Each agent has a single responsibility and its own context window. Complex tasks are broken down and delegated to specialists.

 ```toml
 [experts."ai-gaming"]
@@ -57,57 +54,74 @@ description = "Tests the game and reports bugs"
 instruction = "Play-test the game, find bugs, and verify fixes."
 ```

-Run with Docker — the config is mounted read-only, and output stays inside the container:
+To let your agents work on an actual task, use the `perstack start` command to run them interactively:

 ```bash
-docker run --rm -e ANTHROPIC_API_KEY \
-  -v ./perstack.toml:/workspace/perstack.toml:ro \
-  -v ./output:/workspace/output \
-  perstack/perstack run ai-gaming "Make a Wizardry-like dungeon crawler."
+# Let the `ai-gaming` team build a Wizardry-like dungeon crawler
+docker run --pull always --rm -it \
+  -e ANTHROPIC_API_KEY \
+  -v ./ai-gaming:/workspace \
+  perstack/perstack start ai-gaming \
+  --model "haiku-4-5" \
+  "Make a Wizardry-like dungeon crawler. Make it replayable, so players can dive in, die, and find a way to beat it."
 ```

-`perstack start` streams activity in real time. `perstack run` does the same headless with JSON output, for integration into your own applications.
+Here is an example of a game built with these commands: [demo-dungeon-crawler](https://github.com/perstack-ai/demo-dungeon-crawler). It was built entirely on Claude 4.5 Haiku.
+
+## Prerequisites

-## Sandboxed by default
+- Docker
+- An LLM provider API key (see [Providers](#providers))

-Agents have access to shell execution, file I/O, and MCP tools. Running them directly on the host is a security risk. Docker containers solve this:
+### Giving API keys

-- **Filesystem isolation** — agents read and write inside the container. Mount only what you need with `-v`.
-- **Network control** — restrict outbound access with `--network=none` or custom Docker networks.
-- **Resource limits** — cap CPU and memory with `--cpus` and `--memory`.
-- **Disposable environments** — `--rm` ensures nothing persists after execution.
+There are two ways to provide API keys:
+
+**1. Pass host environment variables with `-e`**
+
+Export the key on the host and forward it to the container:

 ```bash
-# Maximum isolation: no network, resource-limited, read-only config
-docker run --rm \
-  --network=none \
-  --cpus=1 --memory=2g \
+export ANTHROPIC_API_KEY=sk-ant-...
+docker run --rm -it \
   -e ANTHROPIC_API_KEY \
-  -v ./perstack.toml:/workspace/perstack.toml:ro \
-  perstack/perstack run my-expert "Analyze this codebase."
+  -v ./workspace:/workspace \
+  perstack/perstack start my-expert "query"
 ```

-For production deployments, see [Isolation by Design](https://perstack.ai/docs/operating-experts/isolation-by-design/).
+**2. Store keys in a `.env` file in the workspace**

-## Runtime
+Create a `.env` file in the workspace directory. Perstack loads `.env` and `.env.local` by default:

-The runtime (`@perstack/runtime`) is event-sourced. Every execution produces a hierarchy of Jobs, Runs, and Checkpoints.
+```bash
+# ./workspace/.env
+ANTHROPIC_API_KEY=sk-ant-...
+```

-- **Resume** from any checkpoint if an execution is interrupted.
-- **Replay** executions across different models to compare behavior.
-- **Inspect** execution history with `perstack log`.
+```bash
+docker run --rm -it \
+  -v ./workspace:/workspace \
+  perstack/perstack start my-expert "query"
+```

-Each agent runs in isolation with its own context window and tools. Agents communicate through shared workspace files, not shared conversation history.
+You can also specify custom `.env` file paths with `--env-path`:
+
+```bash
+perstack start my-expert "query" --env-path .env.production
+```

-## Providers
-Anthropic, OpenAI, Azure OpenAI, Google, Vertex AI, DeepSeek, Ollama, Bedrock.
+## How it works

-## Tools
+Perstack organizes the complexity of micro-agent harness design into a simple stack model:

-Agents access the outside world through MCP (Model Context Protocol). `@perstack/base` provides built-in tools: file read/write/edit, shell execution, image and PDF reading.
+- **Definition**: `perstack.toml`, Experts, skills, providers
+- **Context**: Context windows, workspace, delegations, inference budgets
+- **Runtime**: Event sourcing, checkpoints, skill management
+- **Infrastructure**: Container isolation, workspace boundaries, env vars, secrets
+- **Interface**: The `perstack` CLI, JSON events via `@perstack/runtime`

-Custom MCP servers can be added per agent in `perstack.toml`.
+For details, see [Understanding Perstack](https://perstack.ai/docs/understanding-perstack/concept/).

 ## Deployment

@@ -123,13 +137,9 @@ docker build -t my-expert .
 docker run --rm -e ANTHROPIC_API_KEY my-expert "query"
 ```

-The image is multi-arch (`linux/amd64`, `linux/arm64`) and weighs ~74MB. Pin to a specific version for reproducible builds:
-
-```dockerfile
-FROM perstack/perstack:0.0.94
-```
+The image is Ubuntu-based, multi-arch (`linux/amd64`, `linux/arm64`), and weighs ~74MB.

-The runtime can also be imported directly as a TypeScript library for serverless environments (Cloudflare Workers, Vercel, etc.):
+The runtime can also be imported directly as a TypeScript library for serverless environments (Cloudflare Workers, Vercel, etc.) or integrated into your own applications:

 ```typescript
 import { run } from "@perstack/runtime"
@@ -163,7 +173,7 @@ Pre-1.0. The runtime is stable and used in production, but the API surface may c

 ## Community

-- [Discord](https://discord.gg/perstack)
+- [Discord](https://discord.gg/2xZzrxC9)
 - [GitHub Issues](https://github.com/perstack-ai/perstack/issues)
 - [@FL4T_LiN3 on X](https://x.com/FL4T_LiN3)
diff --git a/docs/guides/going-to-production.md b/docs/guides/going-to-production.md
index 051c218a..9350c100 100644
--- a/docs/guides/going-to-production.md
+++ b/docs/guides/going-to-production.md
@@ -1,16 +1,16 @@
 ---
 title: "Going to Production"
-description: "Deploy your agent safely and reliably. Sandbox execution in containers with full observability."
+description: "Deploy your Experts safely and reliably in containers with full observability."
 tags: ["deployment", "container", "production"]
 sidebar:
   order: 5
 ---

-Your agent works in development. Now you want to deploy it for real users.
+Your Expert works in development. Now you want to deploy it for real users.

 **The concern**: AI agents have larger attack surfaces than typical applications. They make decisions, use tools, and interact with external systems. How do you deploy them safely?

-**Perstack's approach**: Sandbox the environment, not the agent's capabilities. The runtime is designed to run in isolated containers with full observability.
+**Perstack's approach**: The runtime is designed to run in isolated containers with full observability. Instead of restricting what the Expert can do, you contain the impact at the infrastructure level.

 ## The deployment model

@@ -134,22 +134,22 @@ You get full execution traces without any instrumentation code.

 > [!NOTE]
 > Events are also written to `workspace/perstack/` as checkpoints. You can replay any execution for debugging or auditing.

-## Why sandbox-first?
+## Isolation model

 Traditional security approach: Restrict what the agent *can do* — limit tools, filter outputs, add guardrails inside the agent.

 **Problem**: This creates an arms race. The agent tries to be helpful; the restrictions try to prevent misuse. Complex, brittle, never complete.

-**Sandbox-first approach**: Let the agent do its job. Contain the *impact* at the infrastructure level.
+**Perstack's approach**: Let the Expert do its job. Contain the *impact* at the infrastructure level.

-| Aspect | Traditional | Sandbox-first |
+| Aspect          | Traditional             | Container isolation          |
 | --------------- | ----------------------- | ---------------------------- |
-| Tool access | Restricted, filtered | Full access within sandbox |
+| Tool access     | Restricted, filtered    | Full access within container |
 | Output handling | Content filtering       | Events to stdout, you decide |
 | Failure mode    | Agent fights guardrails | Container terminates         |
 | Audit           | Logs + hope             | Complete event stream        |

-The agent operates freely within its sandbox. Your infrastructure controls what the sandbox can affect.
+The Expert operates freely within its container. Your infrastructure controls what the container can affect.
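The "events to stdout, you decide" row above can be made concrete with a short consumer. A minimal TypeScript sketch of parsing newline-delimited JSON events from a headless run's stdout; the event shape (`type`, `id` fields) is an illustrative assumption, not Perstack's documented schema:

```typescript
// Hypothetical sketch: auditing a headless run by parsing its
// newline-delimited JSON event stream. The event field names below
// are assumptions for illustration, not the documented schema.
interface RuntimeEvent {
  type: string
  [key: string]: unknown
}

// Split stdout into lines, skip blanks, and parse each as one event.
function parseEventStream(stdout: string): RuntimeEvent[] {
  return stdout
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as RuntimeEvent)
}

// Simulated stdout from a containerized run
const sample = [
  '{"type":"checkpoint","id":"cp-1"}',
  '{"type":"toolCall","tool":"shell"}',
  '{"type":"checkpoint","id":"cp-2"}',
].join("\n")

const events = parseEventStream(sample)
const checkpoints = events.filter((e) => e.type === "checkpoint")
console.log(checkpoints.length) // prints 2
```

Because the stream is plain JSON lines on stdout, the same loop works whether the container is run locally or under an orchestrator that captures logs.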
 ## Production checklist

diff --git a/docs/understanding-perstack/concept.md b/docs/understanding-perstack/concept.md
index dfbe6246..4e3eecf8 100644
--- a/docs/understanding-perstack/concept.md
+++ b/docs/understanding-perstack/concept.md
@@ -83,7 +83,7 @@ Observability means agent behavior is fully transparent and inspectable.

 - **Internal state visibility**: state machines emit visible events
 - **Deterministic history**: checkpoints make runs reproducible

-This isn't just for debugging. Observability is a [prerequisite for sandbox-first security](./sandbox-integration.md#observability-as-a-prerequisite) — you verify behavior after the fact, not before.
+This isn't just for debugging. Observability is a [prerequisite for sandbox integration](./sandbox-integration.md#observability-as-a-prerequisite) — you verify behavior after the fact, not before.

 ### Reusability

diff --git a/docs/understanding-perstack/sandbox-integration.md b/docs/understanding-perstack/sandbox-integration.md
index c779a4c0..5ac5309c 100644
--- a/docs/understanding-perstack/sandbox-integration.md
+++ b/docs/understanding-perstack/sandbox-integration.md
@@ -4,7 +4,7 @@
 sidebar:
   order: 5
 ---

-## Why sandbox-first?
+## Why sandbox integration?

 AI agents differ fundamentally from traditional software. The same input can produce different outputs. Model updates can change behavior. Hallucinations can trigger destructive actions without any attacker involved.

@@ -15,7 +15,7 @@ This creates a security challenge with two possible approaches:

 | Restrict the agent | Limit tools, actions, or decisions — defeats the purpose |
 | Sandbox the environment | Full capability inside isolated boundaries |

-Perstack takes the sandbox-first approach. The runtime doesn't enforce its own security layer — it's designed to run inside infrastructure that provides isolation.
+Perstack's runtime is designed to run inside infrastructure that provides isolation — it delegates security boundaries to the platform rather than enforcing its own.

 ### The boundary layer model

@@ -32,9 +32,9 @@ Agent security can be understood as four boundary layers:

 Traditional software security relies on input validation and output filtering — detect threats before they execute. For AI agents, this approach hits a fundamental limit: distinguishing legitimate instructions from malicious ones requires interpreting intent, which is close to an undecidable problem.

-Sandbox-first inverts this model. Instead of trying to prevent all bad outcomes upfront, you let the agent operate freely within boundaries and verify behavior after the fact. This only works if every action is visible and auditable.
+Sandboxing inverts this model. Instead of trying to prevent all bad outcomes upfront, you let the agent operate freely within boundaries and verify behavior after the fact. This only works if every action is visible and auditable.

-This is why **Observability** is one of Perstack's three core principles (alongside Isolation and Reusability). It's not a nice-to-have — it's a prerequisite for the sandbox-first approach to work. Full event history, deterministic checkpoints, and transparent execution make post-hoc verification possible.
+This is why **Observability** is one of Perstack's three core principles (alongside Isolation and Reusability). It's not a nice-to-have — it's a prerequisite for sandbox integration to work. Full event history, deterministic checkpoints, and transparent execution make post-hoc verification possible.

 ### The messaging problem
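Post-hoc verification over a full event history can be as simple as a filter. A TypeScript sketch of one such check, assuming a hypothetical `ActionEvent` shape (field names invented for illustration, not Perstack's actual event schema):

```typescript
// Illustrative post-hoc audit over a recorded event history.
// The ActionEvent shape is a hypothetical stand-in, not Perstack's schema.
interface ActionEvent {
  type: "fileWrite" | "shellExec"
  path?: string
}

// Flag any file write that landed outside the workspace boundary.
function auditWrites(history: ActionEvent[], workspaceRoot: string): ActionEvent[] {
  return history.filter(
    (e) => e.type === "fileWrite" && !!e.path && !e.path.startsWith(workspaceRoot + "/"),
  )
}

const history: ActionEvent[] = [
  { type: "fileWrite", path: "/workspace/output/game.ts" },
  { type: "shellExec" },
  { type: "fileWrite", path: "/etc/passwd" },
]

console.log(auditWrites(history, "/workspace").length) // prints 1
```

In practice the container itself prevents the out-of-bounds write; the audit exists to surface that the agent attempted it, which is exactly what the deterministic event history makes possible.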