diff --git a/docs/guides/rapid-prototyping.md b/docs/guides/rapid-prototyping.md
index e9767833..0b9188b0 100644
--- a/docs/guides/rapid-prototyping.md
+++ b/docs/guides/rapid-prototyping.md
@@ -1,146 +1,133 @@
 ---
-title: "Rapid Prototyping"
-description: "Validate your AI agent idea without writing code. Define in TOML, run immediately, iterate fast."
-tags: ["beginner", "quick-start"]
+title: "Prototyping for Agent-First Apps"
+description: "Start with the agent, not the app. Use create-expert to go from idea to working agent in seconds, then expand to tools and applications."
+tags: ["beginner", "quick-start", "create-expert"]
 order: 1
 ---

-# Rapid Prototyping
+# Prototyping for Agent-First Apps

-You have an idea for an AI agent. Maybe it's a customer support bot, a research assistant, or a code reviewer. You want to test whether the idea works — before investing in a full implementation.
+When you build an agent-powered app, the instinct is to start with the app — set up a project, install dependencies, write scaffolding. Then somewhere in the middle, you start figuring out what the agent should actually do.

-**The problem**: Most agent frameworks require you to write code, set up infrastructure, and wire everything together. By the time you can test your idea, you've already invested significant effort.
+This is backwards.

-**With Perstack**: Define your agent in a text file. Run it immediately. Iterate on the prompt until it works. No code required.
+**Agent-first means starting with the agent.** Get the brain working first. Once the agent behaves the way you want, expand outward: add tools, then build the shell around it. The agent is the product — everything else is infrastructure.

-## From idea to running agent
+This matters because the agent will keep evolving. Prompts change, capabilities expand, behavior gets refined. If the agent is tangled with your application code, every change risks breaking something unrelated.
Keep the brain separate from the body, and both can evolve on their own terms.

-Let's say you want to build a meeting summarizer that extracts action items from transcripts.
+This guide uses [Perstack](https://perstack.ai) — a toolkit for agent-first development. In Perstack, agents are called **Experts**: modular micro-agents defined in plain text (`perstack.toml`), executed by a runtime that handles model access, tool orchestration, and state management. Perstack supports multiple LLM providers including Anthropic, OpenAI, and Google. You define what the agent should do; the runtime makes it work.

-Create a file called `perstack.toml`:

> [!NOTE]
> **Prerequisites:** Node.js 22+ and an LLM API key.
> ```bash
> export ANTHROPIC_API_KEY=sk-ant-...
> ```
+
+## What an Expert looks like
+
+An Expert is defined in a `perstack.toml` file:

```toml
-[experts."meeting-summarizer"]
-description = "Summarizes meeting transcripts and extracts action items"
+[experts."reviewer"]
+description = "Reviews code for security issues"
instruction = """
-You are a meeting analyst. When given a meeting transcript:
-1. Identify the main topics discussed
-2. Extract all action items with owners and deadlines
-3. Note any decisions made
-4. Flag any unresolved issues
-
-Format output as markdown with clear sections.
+You are a security-focused code reviewer.
+Check for SQL injection, XSS, and authentication bypass.
+Explain each finding with a severity rating and a suggested fix.
"""
```

-Run it:
+That's the entire definition. No SDK, no boilerplate, no orchestration code. Run it immediately:

```bash
-npx perstack start meeting-summarizer "Here's the transcript from today's standup..."
+npx perstack start reviewer "Review this login handler"
```

-That's it. Your agent is running. Watch how it responds, then adjust the instruction until it behaves the way you want.
+`perstack start` opens a text-based interactive UI where you can watch the Expert reason and act in real time.
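Several Experts can live side by side in one `perstack.toml`, each defined with the same few lines. A minimal sketch of a second definition, assuming only the fields shown above — the `formatter` Expert and its wording are hypothetical:

```toml
# Hypothetical second Expert in the same perstack.toml (illustrative only)
[experts."formatter"]
description = "Formats review findings as a markdown report"
instruction = """
Given a list of review findings, render them as a markdown table
with columns for severity, location, and suggested fix.
"""
```

Each Expert is addressable by name, so `npx perstack start formatter` would run this one the same way.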
-> [!NOTE]
-> Use `perstack start` during prototyping — the interactive UI shows you exactly what the agent is thinking and doing.

## From idea to agent in one command

-## Write prompts, not code

Writing TOML by hand works, but there's a faster way. [`create-expert`](https://www.npmjs.com/package/create-expert) is a CLI that generates Expert definitions from natural language descriptions — it is itself an Expert that builds other Experts.

-Your goal isn't to write code. It's to get a valuable agent running.

```bash
npx create-expert "A code review assistant that checks for security vulnerabilities, suggests fixes, and explains the reasoning behind each finding"
```

-Every hour spent on boilerplate — setting up SDKs, configuring API clients, building orchestration logic — is an hour not spent on what actually matters: defining what your agent should do and testing whether it works.

`create-expert` takes your description, generates a `perstack.toml`, test-runs the Expert against sample inputs, and iterates on the definition until behavior stabilizes. You get a working Expert — no code, no setup.

-Anthropic's research on building effective agents makes this clear: the most capable agents aren't the most complex ones. They're the ones that give the model exactly what it needs and stay out of the way. Complex orchestration frameworks and multi-step pipelines are often unnecessary — the model itself is increasingly capable when given clear instructions.

The description doesn't need to be precise.
Start vague:

-This is why Perstack uses plain text definitions:

```bash
npx create-expert "Something that helps with onboarding new team members"
```

-| You focus on             | Perstack handles                 |
-| ------------------------ | -------------------------------- |
-| What the agent should do | Model access, tool orchestration |
-| Domain knowledge         | State management, checkpoints    |
-| Success criteria         | MCP server lifecycle             |

`create-expert` will interpret your intent, make decisions about scope and behavior, and produce a testable Expert. You can always refine from there.

-You write the instruction. The runtime does the rest. When you need more — tools, delegation, production deployment — you add it declaratively. No code changes, no infrastructure work.

## Iterate by talking

-The result: you spend your time on the hard problem (making the agent useful) instead of the solved problem (making it run).

`create-expert` reads the existing `perstack.toml` in your current directory. Run it again with a refinement instruction, and it modifies the definition in place:

-## Make your agent better

```bash
npx create-expert "Make it more concise. It's too verbose when explaining findings"
```

-Once your agent is running, the real work begins: making it good.

```bash
npx create-expert "Add a severity rating to each finding: critical, warning, or info"
```

-### Iterate fast

```bash
npx create-expert "Run 10 tests with different code samples and show me the results"
```

-The power of this workflow is rapid iteration:

Each iteration refines the definition. The Expert gets better, and you never open an editor.

-1. **Start minimal** — Write the smallest instruction that captures your intent
-2. **Test with real input** — Use actual data, not toy examples
-3. **Observe failures** — Watch where the agent gets confused
-4. **Refine** — Add constraints or clarifications
-5. **Repeat** — Until the agent handles your cases reliably

## Test with real scenarios

-Each cycle takes seconds.
-Change the TOML, run again, observe results.

Prototyping isn't just about getting the agent to run — it's about finding where it fails.

-### Add tools with MCP Skills

Write a test case that your agent should catch. For the code reviewer, create a file with a deliberate vulnerability:

-Your agent can reason, but sometimes it needs to act — search the web, query a database, call an API.

```bash
npx create-expert "Read the file test/vulnerable.py and review it. It contains a SQL injection — make sure the reviewer catches it and suggests a parameterized query fix"
```

-```toml
-[experts."researcher"]
-description = "Researches topics using web search"
-instruction = """
-Search the web to find accurate, up-to-date information.
-Always cite your sources.
-"""

If the reviewer misses it, you've found a gap in the instruction. Refine and test again:

-[experts."researcher".skills."web"]
-type = "mcpStdioSkill"
-command = "npx"
-packageName = "exa-mcp-server"
-requiredEnv = ["EXA_API_KEY"]
-```

```bash
npx create-expert "The reviewer missed the SQL injection in the raw query on line 12. Update the instruction to pay closer attention to string concatenation in SQL statements"
```

-The runtime handles package resolution, server lifecycle, and tool registration. You just declare what you need.

This is the feedback loop that matters: **write a scenario the agent should handle, test it, fix the instruction when it fails, repeat.** By the time you build the app around it, you already know what the agent can and can't do.

-See [Extending with Tools](./extending-with-tools.md) for more.

## Evaluate with others

-### Split responsibilities with delegation

At some point you need feedback beyond your own testing.
`perstack start` makes this easy — hand someone the `perstack.toml` and they can run the Expert themselves:

-When your instruction grows too long, split into multiple Experts:
-
-```toml
-[experts."support"]
-description = "Routes customer inquiries to specialists"
-instruction = "Understand what the customer needs, then delegate to the right specialist."
-delegates = ["product-expert", "billing-expert"]
-
-[experts."product-expert"]
-description = "Answers product questions: specs, inventory, compatibility"
-instruction = "Help customers find the right product."
-
-[experts."billing-expert"]
-description = "Handles billing: invoices, payments, subscriptions"
-instruction = "Help with invoice questions and payment methods."
-```

```bash
npx perstack start reviewer
```

-Each Expert stays focused. The coordinator decides who to call.

The interactive UI lets them try their own queries and see how the Expert responds. No app to deploy, no environment to configure beyond the API key.

-See [Taming Prompt Sprawl](./taming-prompt-sprawl.md) for the full pattern.

Every execution is recorded as checkpoints in the local `perstack/` directory. After a round of feedback, inspect what happened:

-### Write effective instructions

```bash
npx perstack log
npx perstack log --tools   # what tools were called
npx perstack log --errors  # what went wrong
```

-How you write your instruction matters. Aim for the right level of specificity — not too rigid, not too vague.

You can review specific runs, filter by step, or export as JSON for deeper analysis. See the [CLI Reference](../references/cli.md) for the full set of options.

-See [Best Practices](../making-experts/best-practices.md) for guidelines on writing effective instructions, structuring your Experts, and common pitfalls to avoid.

This gives you a lightweight evaluation workflow: distribute the TOML, collect usage, analyze the logs, refine the instruction.
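The inspection step can be wrapped in a small guard, since `perstack log` only has something to read once a run has written checkpoints. A sketch, assuming only the `perstack/` checkpoint directory and the `log` flags shown above:

```shell
# Inspect a round of feedback runs, but only if checkpoints exist yet
if [ -d perstack ]; then
  npx perstack log --errors   # what went wrong, first
  npx perstack log --tools    # then the tool calls behind those runs
else
  echo "no checkpoints yet - run the Expert first"
fi
```

Dropping this into a `review.sh` next to the TOML gives collaborators a one-step way to check their own runs.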
## When your prototype grows

-At some point, your prototype will need more:
-
-- **External tools**: The agent needs to search the web, query a database, or call an API → See [Extending with Tools](./extending-with-tools.md)
-- **Multiple responsibilities**: The prompt is getting long and the agent is getting confused → See [Taming Prompt Sprawl](./taming-prompt-sprawl.md)
-- **Production deployment**: The prototype works and you want to ship it → See [Going to Production](./going-to-production.md)
+At some point, your prototype will need more. The same `perstack.toml` scales — you're not throwing away work.

-The same TOML definition scales from prototype to production. You're not throwing away work — you're building on it.

- **The agent needs tools** — search the web, query a database, call an API → [Extending with Tools](./extending-with-tools.md)
- **The prompt is getting long** — split into multiple Experts that collaborate → [Taming Prompt Sprawl](./taming-prompt-sprawl.md)
- **The prototype works** — embed it into your application → [Adding AI to Your App](./adding-ai-to-your-app.md)

## What's next