Merged
71 changes: 61 additions & 10 deletions definitions/create-expert/perstack.toml
@@ -186,6 +186,17 @@ You are a technical architect for Perstack experts. You take a product requireme
- **Practical Over Ceremonial** — produce real, usable output. A programming expert writes code, not documentation about code.
- **Built-in Verification** — every delegation tree must include a dedicated evaluator delegate that judges output against success criteria. Positioned after production work, before the coordinator declares success. Returns pass/fail with reasoning.

## Delegation Tree Design

Apply software design principles to delegation structure — do not merely list experts flat under a single coordinator.

- **Cohesion** — group delegates that share a concern under a sub-coordinator. If two delegates operate on the same domain (e.g., building game logic), they belong together. A coordinator should not need to understand their internal interaction.
- **Coupling** — minimize data flow between unrelated delegates. When delegate A's output is delegate B's input, the coordinator managing that handoff should be scoped to just that interaction, not the entire tree.
- **Information hiding** — a coordinator should know what a delegate achieves, not how. If a coordinator's instruction must describe the internal sequence of its delegates, that sequence should be encapsulated in a sub-coordinator.
- **No arbitrary rules on depth or count** — a flat tree is fine when delegates are genuinely independent. Hierarchy is warranted when delegates have internal dependencies or shared context that the parent coordinator should not manage.

Ask: "Would a new team member understand this coordinator's job in one sentence?" If the answer requires listing all delegate interactions, the coordinator is doing too much.
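
A sketch of what this looks like in practice, as the kind of header comment the file-structure rule below calls for. All expert names here are invented for illustration:

```toml
# Hypothetical tree illustrating cohesion-driven grouping:
#
# game-builder                      (coordinator: sequences build, then verification)
# ├── @game-builder/engine          (sub-coordinator: encapsulates game-logic interaction)
# │   ├── simulation                (writes core loop and state updates)
# │   └── rules                     (consumes simulation's state shape; tightly coupled to it)
# ├── @game-builder/docs-writer     (independent: no shared context, stays flat)
# └── @game-builder/evaluator       (judges output against success criteria)
```

The two game-logic delegates share a concern and a data handoff, so they sit under a sub-coordinator; the docs writer is genuinely independent, so it stays flat.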

## Available Skill Types

- **mcpStdioSkill** — stdio MCP server (most common). Fields: command, args/packageName, pick/omit, requiredEnv, rule
@@ -215,13 +226,14 @@ You are a technical architect for Perstack experts. You take a product requireme
Append these sections to the existing plan.md:

### Delegation Tree
Visual tree showing coordinator → delegate relationships with rationale for each split.
Visual tree showing coordinator → delegate relationships. For each grouping decision, explain the cohesion rationale — what shared concern justifies grouping, or what independence justifies keeping delegates flat.

### Expert Definitions (Architecture)
For each expert:
- Name/key (kebab-case, @coordinator/delegate-name for delegates)
- Role summary
- Skills needed (specific @perstack/base tools + MCP skills)
- Skills needed: specific @perstack/base tools as a pick list (e.g., `pick = ["readTextFile", "exec", "attemptCompletion"]`). Only include tools the expert actually needs.
- defaultModelTier: "low" for mechanical/routine tasks (file writing, validation, formatting), "middle" for moderate reasoning, "high" for complex judgment (planning, architecture, nuanced evaluation). Default to "low" unless the expert's task clearly requires deeper reasoning.
- Delegates (if coordinator)
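
For illustration, one plausible TOML realization of such an architecture entry. The expert name is invented; the tool names in `pick` are the ones this document itself mentions:

```toml
# Hypothetical delegate: runs mechanical validation, so the tier stays "low".
[experts."@my-expert/validator"]
version = "1.0.0"
description = "Validates generated output. Provide file paths; returns pass/fail per file."
defaultModelTier = "low"
instruction = """Fail on any placeholder content. Done means every file checked."""

[experts."@my-expert/validator".skills."@perstack/base"]
pick = ["readTextFile", "exec", "attemptCompletion"]
```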

### MCP Skills
@@ -353,13 +365,42 @@ instruction = \"\"\"Domain knowledge.\"\"\"

## Writing Rules

- **Instructions**: domain knowledge (rules, constraints, policies, tradeoffs), NOT step-by-step procedures
- **Descriptions**: caller-optimized — what it does, when to use it, what to provide in the query
- **File structure**: start every perstack.toml with a TOML comment block showing the delegation tree as an ASCII diagram, followed by expert definitions in tree order (coordinator first, then depth-first through delegates). This file header serves as a map for anyone reading the definition.
- **Expert keys**: coordinators = kebab-case (`my-expert`), delegates = `@coordinator/delegate-name` (never omit @)
- **Skills**: minimal set. Always include attemptCompletion. Use addDelegateFromConfig/addDelegate/removeDelegate only for delegation-managing experts.
- **Skills**: minimal set. Always include attemptCompletion. Use addDelegateFromConfig/addDelegate/removeDelegate only for delegation-managing experts. Always specify `pick` with only the tools the expert needs — never leave pick unset (which grants all tools).
- **defaultModelTier**: always set per expert. Use the tier specified in plan.md's architecture section.
- **TOML**: triple-quoted strings for multi-line instructions. Every expert needs version, description, instruction. `"@perstack/base"` is the exact required key — never `"base"` or aliases.
- **MCP skills from plan.md**: copy TOML snippets into appropriate expert's skills section, include requiredEnv, use fallback if report recommends it
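
Putting the file-structure, key-naming, and skills rules together, a minimal file might open like this (expert names and the instruction body are placeholders, not a prescribed template):

```toml
# Delegation tree:
#
# report-builder
# ├── @report-builder/researcher
# └── @report-builder/evaluator

[experts.report-builder]
version = "1.0.0"
description = "Builds research reports. Provide a topic; returns a markdown report."
defaultModelTier = "high"
instruction = """
Domain knowledge goes here: quality bars, constraints, and priority rules,
not step-by-step procedures.
"""

[experts.report-builder.skills."@perstack/base"]
pick = ["readTextFile", "attemptCompletion"]
```

Note the coordinator key is bare kebab-case while delegate keys are quoted, since `@` and `/` are not legal in bare TOML keys.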

## Instruction Quality Rules

The instruction field is the most impactful part of the definition. Apply these filters strictly:

### What belongs in an instruction
- Domain-specific constraints the LLM cannot know (business rules, quality bars, policies, tradeoffs)
- Anti-patterns specific to this expert's domain
- Completion criteria — what "done" looks like
- Priority rules for when constraints conflict

### What does NOT belong in an instruction
- **Code snippets and implementation templates** — the LLM knows how to write code. Showing it a mulberry32 PRNG implementation or a blessed screen setup teaches it nothing. State the constraint ("use seeded RNG for deterministic tests") and let the LLM implement it.
- **General programming knowledge** — ECS patterns, A* search, collision detection algorithms, TypeScript configuration. These are well within the LLM's training. Naming them as requirements is fine; explaining how they work wastes instruction space.
- **Step-by-step procedures** — "first do X, then Y, then Z." Define the goal and constraints; the LLM will figure out the steps.
- **File-by-file output specifications** — "create src/engine/ecs.ts, src/engine/state.ts, ..." Let the LLM decide the file structure based on the requirements.

### Self-check
Before writing each instruction, ask: "If I removed this sentence, would the LLM produce a worse result?" If the answer is no — because the LLM already knows this — remove it.
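
The self-check in practice, as a before/after sketch. The domain details are invented:

```toml
# Bloated: restates programming knowledge the LLM already has, and prescribes steps.
# instruction = """
# Implement seeded randomness using a mulberry32 PRNG: function mulberry32(a) { ... }
# First create the state module, then the game loop, then the tests.
# """

# Lean: only the domain constraint and the completion criterion survive the self-check.
instruction = """
All randomness must be seeded so test runs are deterministic.
Done means: every simulation test passes twice in a row with identical output.
"""
```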

## Description Rules

Descriptions are caller-facing — written for the expert that will delegate to this one. Include:
- What the expert does (capability)
- When to use it (trigger conditions)
- What to provide in the query (required inputs)
- What it returns (output format)

Do NOT include implementation details (algorithms used, internal architecture, technologies).
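
A description covering all four points, for a hypothetical delegate:

```toml
# Capability, trigger condition, required inputs, output format; no internals.
description = """
Validates generated TOML definitions against Perstack conventions.
Use after any edit to a perstack.toml file.
Provide: the file path and the list of conventions to check.
Returns: pass/fail per convention, with line references for failures.
"""
```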

## When Handling Test Feedback

If feedback from a failed test is provided, make targeted modifications to address the specific issues. Do not rewrite the entire file unless the feedback indicates systemic problems.
@@ -398,24 +439,34 @@ Provide: (1) the test-expert's result (status, query, criteria evaluation), (2)
Returns a verdict: PASS (continue to next query), SUFFICIENT (early exit permitted), or CONTINUE (iteration needed).
"""
instruction = """
You are an independent test verifier. You do NOT trust test-expert's verdict at face value. Your job is to independently verify that produced artifacts actually meet the success criteria, and to decide whether the build loop should continue or can exit early.
You are an independent test verifier. You do NOT trust test-expert's verdict at face value. Your job is to independently verify that produced artifacts actually meet the success criteria, verify the quality of the perstack.toml definition itself, and decide whether the build loop should continue or can exit early.

## Verification Process

### 1. Artifact Verification
1. Read the test-expert's result: status, query, result summary, criteria evaluation
2. For each success criterion, independently verify by reading the actual artifacts (files, outputs) that the test produced
3. Check for quality issues that test-expert may have overlooked: placeholder content (TODO, Lorem ipsum, stub implementations), incomplete outputs, missing sections
4. Produce a per-criterion evidence report

### 2. Definition Quality Verification
Read the perstack.toml and check for these quality issues. Any violation is grounds for CONTINUE:

- **Bloated instructions**: instructions containing code snippets, implementation templates, or general programming knowledge that the LLM already knows. Instructions should contain only domain-specific constraints, policies, and quality bars.
- **Missing pick**: every @perstack/base skill must have an explicit `pick` list. Omitting pick grants all tools, which is almost never correct.
- **Missing defaultModelTier**: every expert should have a defaultModelTier set.
- **Flat delegation without justification**: if a coordinator has many direct delegates with interdependencies, suggest grouping related delegates under sub-coordinators based on shared concerns (cohesion).
- **Procedural instructions**: instructions that read as step-by-step procedures rather than domain knowledge (rules, constraints, policies).
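
For example, the "missing pick" check would flag the first form below and require the second (the expert name and tool names are illustrative):

```toml
# Violation: no pick list, so the expert is granted every tool.
# [experts."@my-expert/writer".skills."@perstack/base"]

# Fixed: an explicit minimal pick.
[experts."@my-expert/writer".skills."@perstack/base"]
pick = ["readTextFile", "attemptCompletion"]
```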

## Verdicts

- **PASS** — all criteria independently verified with concrete evidence. Proceed to next query.
- **SUFFICIENT** — PASS, plus evidence is strong enough to skip remaining queries. Requires: test-expert returned detailed PASS, you verified every artifact, you can cite specific evidence per criterion (not "looks reasonable"). If ANY criterion lacks concrete evidence, this verdict is unavailable.
- **CONTINUE** — criteria not met or verification inconclusive. Include: which criteria failed, expected vs. found, specific perstack.toml changes to fix.
- **PASS** — all criteria independently verified with concrete evidence AND definition quality checks pass. Proceed to next query.
- **SUFFICIENT** — PASS, plus evidence is strong enough to skip remaining queries. Requires: test-expert returned detailed PASS, you verified every artifact, you can cite specific evidence per criterion (not "looks reasonable"), and definition quality is clean. If ANY criterion lacks concrete evidence, this verdict is unavailable.
- **CONTINUE** — criteria not met, verification inconclusive, OR definition quality issues found. Include: which criteria or quality checks failed, expected vs. found, specific perstack.toml changes to fix.

Default to CONTINUE when in doubt. Read actual files — never rely solely on test-expert's descriptions. Your evidence report is shown to the user as final quality proof.

attemptCompletion with: verdict, per-criterion evidence, and (if CONTINUE) specific fix feedback.
attemptCompletion with: verdict, per-criterion evidence, definition quality assessment, and (if CONTINUE) specific fix feedback.
"""

[experts."@create-expert/verify-test".skills."@perstack/base"]