8 changes: 4 additions & 4 deletions README.md
@@ -19,9 +19,8 @@ If you want to build practical agentic apps like Claude Code or OpenClaw, a harn

Perstack is a harness for agentic apps. It aims to:

- **Do big things with small models**: If a smaller model can do the job, there's no reason to use a bigger one.
- **Quality is a system property, not a model property**: Building agentic software people actually use doesn't require an AI science degree—just a solid understanding of the problems you're solving.
- **Keep your app simple and reliable**: The harness is inevitably complex—Perstack absorbs that complexity so your agentic app doesn't have to.
- **Quality is a system property, not a model property**: The harness provides hard signals — compiler output, test results, screenshot diffs — that let agents detect and fix their own mistakes. You provide the domain knowledge that defines what "correct" means. The combination sets both the floor and the ceiling.
- **Do big things with small models**: When the system handles verification, a focused agent on an affordable model outperforms a generalist on a frontier one.

## Getting Started

@@ -44,7 +43,7 @@ docker run --pull always --rm -it \
"Form a team named ai-gaming to build a Bun-based CLI indie game playable on Bash for AI."
```

`create-expert` is a built-in expert. It generates a `perstack.toml` that defines a team of micro-agents, runs them, evaluates the results, and iterates until the setup works. Each agent has a single responsibility and its own context window. Complex tasks are broken down and delegated to specialists.
`create-expert` is a built-in expert that embodies the hard signal approach. It generates a `perstack.toml`, runs the generated experts against a test query, then verifies the results using deterministic checks — compiler output, test pass/fail, structural validation — not LLM judgment. A dedicated verifier expert runs each check twice to confirm reproducibility. The cycle repeats until all signals pass. Each agent has a single responsibility and its own context window.

```toml
[experts."ai-gaming"]
@@ -199,6 +198,7 @@ Perstack is a harness for micro-agents — purpose-specific agents with a single
- **Cost-Effective**: Purpose-specific experts are designed to run on affordable models. A focused agent with the right domain knowledge on a cheap model outperforms a generalist on an expensive one.
- **Fast**: Smaller models generate faster. Fine-grained tasks broken into delegates run concurrently via parallel delegation.
- **Maintainable**: A monolithic system prompt is like refactoring without tests — every change risks breaking something. Single-responsibility experts are independently testable. Test each one, then compose them.
- **Verifiable**: When each agent has a single responsibility, its output is a discrete artifact that can be verified independently — by a compiler, a test suite, or a screenshot diff. This is what makes quality a system property: the system provides verification that does not depend on LLM judgment.

## Prerequisites

7 changes: 7 additions & 0 deletions docs/guides/going-to-production.md
@@ -131,6 +131,8 @@ docker run --rm ... my-expert "assistant" "query" | your-log-collector

You get full execution traces without any instrumentation code.

Production observability is not just logging — it is the infrastructure for [hard signal](../understanding-perstack/hard-signals.md) verification. The JSON event stream is deterministic: the same execution always produces the same events. An external verifier can process these events, compare outputs against baselines, and produce pass/fail signals without any LLM involvement. This is what separates production-grade agent monitoring from hope-based logging.
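
An external verifier over that stream can be sketched in a few lines of shell. The event field names and file layout below are illustrative assumptions, not Perstack's actual event schema:

```shell
# Sketch of baseline verification over a captured JSON event stream.
# baseline.jsonl is a known-good run; events.jsonl is the run under test.
# The event shape shown here is an assumption for illustration only.
cat > baseline.jsonl <<'EOF'
{"type":"result","output":"done"}
EOF
cat > events.jsonl <<'EOF'
{"type":"result","output":"done"}
EOF
# Because identical executions produce identical events, a byte-level diff
# of the two streams is itself a hard pass/fail signal — no LLM involved.
if diff baseline.jsonl events.jsonl >/dev/null; then
  verdict=PASS
else
  verdict=FAIL
fi
echo "$verdict"
```

In a real pipeline the two files would come from your log collector rather than heredocs; the comparison step stays the same.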

> [!NOTE]
> Events are also written to `workspace/perstack/` as checkpoints. You can replay any execution for debugging or auditing.
@@ -173,6 +175,11 @@ The Expert operates freely within its container. Your infrastructure controls wh
- [ ] Execution time limits (via container timeout)
- [ ] Workspace size limits

**Verification loop**:
- [ ] Expert outputs verified by [hard signals](../understanding-perstack/hard-signals.md) (tests, compilation, diffs)
- [ ] Verification independent of the LLM that generated the output
- [ ] Verification procedure is deterministic and reproducible

## Scaling patterns

**Job queue**: Push queries to a queue, workers pull and execute in containers.
3 changes: 3 additions & 0 deletions docs/guides/rapid-prototyping.md
@@ -98,6 +98,8 @@ npx perstack start create-expert "The reviewer missed the SQL injection in the r

This is the feedback loop that matters: **write a scenario the agent should handle, test it, fix the instruction when it fails, repeat.** By the time you build the app around it, you already know what the agent can and can't do.

This feedback loop is powerful but relies on your judgment — a [soft signal](../understanding-perstack/hard-signals.md). As the prototype matures, convert manual observations into hard signals: write an automated test that runs the Expert's output. If the Expert generates code, compile it and run its test suite. If it generates a document, validate it against a schema. The earlier you introduce hard signals, the earlier you stop oscillating and start converging.
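
A first hard signal can be as small as a script that checks the Expert's artifact for required structure. The path `out/report.json` and its fields below are hypothetical stand-ins:

```shell
# Hypothetical hard-signal test for a document-generating Expert: assert
# required fields exist instead of eyeballing the output. The path and
# field names are assumptions for illustration, not a Perstack convention.
mkdir -p out
printf '{"title":"Q3 report","status":"final"}\n' > out/report.json  # stand-in for the Expert's output
status=PASS
for field in title status; do
  grep -q "\"$field\"" out/report.json || status=FAIL
done
echo "$status"
```

A schema validator would be stricter than `grep`, but even this crude check converts "it looked right to me" into a reproducible pass/fail.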

## Evaluate with others

At some point you need feedback beyond your own testing. `perstack start` makes this easy — hand someone the `perstack.toml` and they can run the Expert themselves:
@@ -126,6 +128,7 @@ At some point, your prototype will need more. The same `perstack.toml` scales

- **The agent needs tools** — search the web, query a database, call an API → [Extending with Tools](./extending-with-tools.md)
- **The prompt is getting long** — split into multiple Experts that collaborate → [Taming Prompt Sprawl](./taming-prompt-sprawl.md)
- **Verification is manual** — build hard signal loops that run automatically → [Testing Experts](../making-experts/testing.md), [Hard Signals](../understanding-perstack/hard-signals.md)
- **The prototype works** — embed it into your application → [Adding AI to Your App](./adding-ai-to-your-app.md)

## What's next
5 changes: 5 additions & 0 deletions docs/guides/taming-prompt-sprawl.md
@@ -165,6 +165,10 @@ The lesson: when a task requires multiple areas of expertise, splitting into spe

With 50-100 lines of instruction instead of 500+, each Expert operates within its attention budget. The model can actually follow all your instructions — because there aren't that many to follow.

### Each Expert's output becomes independently verifiable

When you split a monolith into delegates, you gain something beyond attention management: each delegate's output becomes a discrete artifact that can be verified independently. A monolithic agent produces one giant output that can only be evaluated holistically — "does the overall result look right?" is a [soft signal](../understanding-perstack/hard-signals.md). A team of focused delegates produces individual outputs, each of which can be checked against ground truth by the coordinator or by external tooling — "did each component pass its specific test?" is a [hard signal](../understanding-perstack/hard-signals.md). This transforms verification from judgment-dependent to judgment-independent.
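
The per-component idea can be sketched as a loop over check scripts — one per delegate artifact. The component names and scripts below are illustrative only:

```shell
# Illustrative only: each delegate's artifact gets its own check script, so a
# failure is localized to one component instead of "the overall result".
mkdir -p checks
for c in parser renderer storage; do
  printf 'exit 0\n' > "checks/$c.sh"   # stand-in for a real per-component test
done
overall=PASS
for c in parser renderer storage; do
  if sh "checks/$c.sh"; then
    echo "$c: PASS"
  else
    echo "$c: FAIL"
    overall=FAIL
  fi
done
echo "overall: $overall"
```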

## How delegation works

When you define `delegates`, Perstack presents them as callable tools:
@@ -188,6 +192,7 @@ The coordinator reads the customer's message, decides which specialist to call,
- You can describe what each Expert does in one sentence
- Experts don't need to know about each other's internal logic
- You can test each Expert independently
- Each Expert's output can be verified by a process independent of the LLM (compiler, test suite, screenshot diff)

## What's next

1 change: 1 addition & 0 deletions docs/index.mdx
@@ -15,6 +15,7 @@ Perstack is a harness for micro-agents.

- [**Getting Started**](/docs/getting-started/walkthrough/) — create your first expert and walk through the core workflow
- [**Concepts**](/docs/understanding-perstack/concept/) — understand the architecture behind experts, runtime, isolation, and the boundary model
- [**Hard Signals**](/docs/understanding-perstack/hard-signals/) — the design philosophy behind output quality: why hard verification signals make agents reliable

### Build

57 changes: 42 additions & 15 deletions docs/making-experts/best-practices.md
@@ -7,7 +7,7 @@ sidebar:
These principles help you avoid common pitfalls in agent development: monoliths, complexity explosions, debugging nightmares, and fragile systems. Building a large agent head-on almost always fails.

> [!NOTE]
> The key insight: we tend to over-control, but LLMs work best when you trust their reasoning and define goals rather than procedures.
> The key insight: we tend to over-control, but LLMs work best when you trust their reasoning and define goals rather than procedures. These principles are grounded in the [Hard Signal Framework](../understanding-perstack/hard-signals.md) — the design philosophy behind Perstack's architecture.

## The Five Principles

@@ -105,31 +105,58 @@ Modular Experts unlock collaboration — between Experts, and between people. Th

## Keep It Verifiable

**Pitfall**: Instructions that only the author can understand.
**Pitfall**: Experts whose output can only be checked by another LLM.

If others can't verify what an Expert does, it's neither safe nor reusable.
If the only way to verify an Expert's output is to have another model (or the same model) review it, the verification loop is [soft](../understanding-perstack/hard-signals.md). The agent will oscillate — "looks good" one iteration, "has issues" the next — without converging.

**Bad** — A third party can't verify what this Expert actually does:
**Bad** — Verification relies solely on LLM judgment:
```toml
instruction = """
Handle expense reports appropriately.
Use your judgment for edge cases.
"""
[experts."code-generator"]
instruction = "Generate TypeScript code for the given task."
delegates = ["code-reviewer"]

[experts."code-reviewer"]
instruction = "Review the generated code for correctness and style."
```

**Good** — Anyone reading this knows exactly what to expect:
The reviewer uses the same kind of judgment as the generator. It can miss the same bugs, approve the same anti-patterns, and disagree with itself across runs. As the sole gate, this is a soft signal loop — the system oscillates.

**Good** — Hard signals as the final authority, with an optional soft gate for semantic checks:
```toml
[experts."builder"]
delegates = ["code-writer", "reviewer", "verifier"]

[experts."reviewer"]
description = "Checks whether the code reflects the requirements. Returns PASS or CONTINUE."
instruction = """
You are an expense report reviewer.
Read the requirements and the generated code.
Check whether each requirement is addressed. Flag omissions.
Do NOT evaluate code quality — that is the verifier's job.
"""

[experts."reviewer".skills."@perstack/base"]
type = "mcpStdioSkill"
command = "npx"
packageName = "@perstack/base"
pick = ["readTextFile", "attemptCompletion"]

Approval rules:
- Under $100: Auto-approve with receipt
- $100-$500: Approve if business purpose is clear
- Over $500: Flag for manager review
[experts."verifier"]
description = "Executes hard signal checks against the code. Returns PASS or CONTINUE with specific failures."
instruction = """
Run the verification commands. Compare actual output against expected.
Report pass/fail per check. Do NOT read the code and form opinions.
"""

[experts."verifier".skills."@perstack/base"]
type = "mcpStdioSkill"
command = "npx"
packageName = "@perstack/base"
pick = ["readTextFile", "exec", "attemptCompletion"]
```

If someone else can't read your Expert and predict its behavior, it's not verifiable.
The reviewer (soft gate) catches semantic misalignment early — "does the code address the requirements?" is a qualitative judgment that only an LLM can make. The verifier (hard gate) provides the final pass/fail — compiler errors, test failures, and structural checks that are deterministic and independent of LLM judgment. The reviewer has no `exec`; the verifier has `exec`. Neither replaces the other. See [combining soft and hard signals](../understanding-perstack/hard-signals.md#combining-soft-and-hard-signals) for the full pattern.

"Verifiable" means the Expert's output is ultimately checked by a process that does not depend on LLM judgment. Soft signals can supplement hard signals — catching semantic drift that no compiler can detect — but the **final gate must be hard**. When designing an Expert, ask: **what hard signal can verify this Expert's output?** If the only answer is "another LLM reads it," look for something harder — a compiler, a test suite, a schema validator, a screenshot diff.

---

96 changes: 96 additions & 0 deletions docs/making-experts/examples.md
@@ -6,12 +6,108 @@ sidebar:

Patterns for defining Experts. Each example highlights a specific skill type or integration approach.

- [Hard Signal Verification](#hard-signal-verification)
- [GitHub Issue Bot](#github-issue-bot)
- [Web Search](#web-search)
- [Custom MCP Server](#custom-mcp-server)
- [Interactive Wizard](#interactive-wizard)
- [Application Integration](#application-integration)

## Hard Signal Verification

**Pattern**: Combine a soft review gate with hard signal verification in the delegation tree. The reviewer catches semantic issues early; the verifier provides the final deterministic pass/fail. See [combining soft and hard signals](../understanding-perstack/hard-signals.md#combining-soft-and-hard-signals) for the full rationale.

```toml
# Delegation Tree
#
# app-builder — coordinator: build → review → verify cycle
# ├── @app-builder/build — writes code to workspace
# ├── @app-builder/review — checks requirements alignment (soft gate)
# └── @app-builder/verify — runs hard signal checks, reports PASS/CONTINUE

[experts."app-builder"]
description = "Builds a web application with verified output"
instruction = """
Coordinate the build-review-verify cycle:
1. Delegate to build with the user's requirements
2. Delegate to review with the requirements and build output
3. If review returns CONTINUE: delegate to build with review feedback, restart from 2
4. If review returns PASS: delegate to verify with the build result
5. If verify returns CONTINUE: delegate to build with failure feedback, restart from 2
6. If verify returns PASS: done
"""
delegates = ["@app-builder/build", "@app-builder/review", "@app-builder/verify"]

[experts."app-builder".skills."@perstack/base"]
type = "mcpStdioSkill"
command = "npx"
packageName = "@perstack/base"
pick = ["readTextFile", "attemptCompletion"]

[experts."@app-builder/build"]
description = "Writes application code to the workspace. Provide requirements or failure feedback to address."
instruction = """
Write working application code. Focus on correctness over style.
If failure feedback is provided, fix the specific issues.
"""

[experts."@app-builder/build".skills."@perstack/base"]
type = "mcpStdioSkill"
command = "npx"
packageName = "@perstack/base"
pick = ["readTextFile", "writeTextFile", "editTextFile", "exec", "attemptCompletion"]

[experts."@app-builder/review"]
description = """
Reviews the build output against requirements for completeness and alignment.
Provide: requirements and path to built code. Returns PASS or CONTINUE with specific gaps.
"""
instruction = """
Read the requirements and the generated code.
Check whether each requirement is addressed. Flag omissions or misinterpretations.
Do NOT evaluate code quality or run any checks — that is the verifier's job.
"""

[experts."@app-builder/review".skills."@perstack/base"]
type = "mcpStdioSkill"
command = "npx"
packageName = "@perstack/base"
pick = ["readTextFile", "attemptCompletion"]

[experts."@app-builder/verify"]
description = """
Runs hard signal checks against the build output.
Provide: what was built and where. Returns PASS or CONTINUE with specific failures.
"""
instruction = """
You are a verifier. Run commands and compare outputs. Do NOT read code and form opinions.

Checks:
- TypeScript compiles: `npx tsc --noEmit` → exit code 0
- Tests pass: `npm test` → exit code 0
- App starts: `timeout 5 node dist/index.js` → no crash within 5 seconds

Run each check twice. If results differ between runs, report CONTINUE — the signal is non-deterministic.

Report per check: command, expected, actual, PASS/FAIL.
"""

[experts."@app-builder/verify".skills."@perstack/base"]
type = "mcpStdioSkill"
command = "npx"
packageName = "@perstack/base"
pick = ["readTextFile", "exec", "attemptCompletion"]
```

Key design decisions:
- **Reviewer is read-only** — `pick` has no `exec`. It reads files and judges semantic alignment. This is a [soft signal](../understanding-perstack/hard-signals.md) — qualitative judgment that only an LLM can provide — but it catches requirement gaps before the expensive verify cycle.
- **Verifier has `exec`** — it runs commands and compares outputs. No LLM judgment involved. This is the [hard signal](../understanding-perstack/hard-signals.md) that provides the final pass/fail.
- **Review before verify** — soft gate catches semantic drift early. Hard gate catches runtime failures. Neither replaces the other.
- **Verifier is a direct child of the coordinator** — not nested under `build`. This guarantees [context separation](../understanding-perstack/hard-signals.md#2-context-separation).
- **Reproducibility check** — each command runs twice. If results differ, the signal is non-deterministic and cannot be trusted.
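
The run-twice rule from the verifier's instruction can be sketched directly. Here `check.sh` is a stand-in for a real command such as `npx tsc --noEmit`:

```shell
# Run the same hard check twice and trust the signal only when both runs
# agree. check.sh stands in for a real verification command.
cat > check.sh <<'EOF'
#!/bin/sh
exit 0
EOF
first=$(sh check.sh; echo $?)    # capture exit code of run 1
second=$(sh check.sh; echo $?)   # capture exit code of run 2
if [ "$first" != "$second" ]; then
  verdict="CONTINUE: non-deterministic signal"
elif [ "$first" = "0" ]; then
  verdict=PASS
else
  verdict="CONTINUE: check failed (exit $first)"
fi
echo "$verdict"
```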

For a production example of this pattern, see [`create-expert`](https://github.com/perstack-ai/perstack/blob/main/definitions/create-expert/perstack.toml) — Perstack's built-in Expert for generating Expert definitions, which uses `review-definition` (soft gate for plan alignment) followed by `test-expert` → `verify-test` (hard signal checks with reproducibility verification).

## GitHub Issue Bot

**Pattern**: Use `requiredEnv` to pass environment variables to tools like `gh` CLI.