Merged
44 changes: 24 additions & 20 deletions definitions/create-expert/perstack.toml
@@ -57,12 +57,13 @@
# - Without this boundary, plan bloat leaks directly into instructions.
#
# 5. Verification Signal Design
# - Success checks and reject rules are both expressed as hard signals:
# a command with a deterministic expected result.
# - Each signal is classified as must (blocks completion) or should
# (reported but does not block). Must signals protect core usability;
# should signals cover polish and secondary quality.
# - Reject signals are not the inverse of success signals — they detect
# domain-specific anti-patterns that indicate fundamental failure.
# - Each signal specifies: what to run, what to expect, and where to
# restart if it fails.
# - Each signal specifies: what to run, what to expect, and priority
# (must/should).
#
# 6. Instruction Content = Domain Constraints Only
# - An instruction should contain ONLY what the LLM cannot derive on
@@ -88,7 +89,7 @@

[experts."create-expert"]
defaultModelTier = "high"
version = "1.0.19"
version = "1.0.20"
description = "Creates and modifies Perstack expert definitions in perstack.toml"
instruction = """
You are the coordinator for creating and modifying Perstack expert definitions. perstack.toml is the single source of truth — your job is to produce or modify it according to the user's request.
@@ -133,7 +134,7 @@ pick = ["readTextFile", "exec", "attemptCompletion"]

[experts."@create-expert/plan"]
defaultModelTier = "high"
version = "1.0.19"
version = "1.0.20"
description = """
Analyzes the user's request and produces plan.md: domain constraints, test query, verification signals, and role architecture.
Provide: (1) what the expert should do, (2) path to existing perstack.toml if one exists.
@@ -164,10 +165,12 @@ Constraints and rules unique to this expert, extracted from the user's request.
One comprehensive, realistic query that exercises the expert's full capability. Design the query so that its verification signals can cover all domain constraints from the Domain Knowledge section. Coverage comes from signal design depth, not from running multiple queries.

### Verification Signals
Hard signals for the test query — verification checks whose results do not depend on LLM judgment:
Hard signals for the test query — verification checks whose results do not depend on LLM judgment. Each signal specifies:
- The exact command to run (deterministic, repeatable)
- The expected result (specific output, presence/absence of content, numeric threshold)
- Why this checks ground truth, not a proxy
- Priority: **must** (failure blocks completion — the user cannot use the artifact) or **should** (failure is reported but does not block — the artifact is usable with known limitations)

Must signals protect core usability — can the user run the artifact and get the primary value? Should signals cover polish, testing, and secondary quality.

Include both positive signals (artifact works correctly) and reject signals (domain-specific anti-patterns are absent). Reject signals are not the inverse of positive signals — they detect fundamental failures derived from deeply understanding the domain.

@@ -193,9 +196,10 @@ Re-read plan.md and verify each rule. If any check fails, fix plan.md before att
1. **Section names exact match**: plan.md uses exactly these section names and no others — "Expert Purpose", "Domain Knowledge", "Use Cases", "Test Query", "Verification Signals", "Architecture". Extra sections confuse downstream experts.
2. **Single test query**: "Test Query" section contains exactly one query, not multiple.
3. **Every signal is a command**: each entry in "Verification Signals" specifies a concrete command to execute and its expected result. Entries that describe what to observe or what correct output "looks like" without a command are not signals — rewrite them.
4. **No soft language in signals**: signals contain no phrases like "verify that", "check that", "should be", "looks correct", "works properly". Each signal is: run X → expect Y.
5. **Domain constraint coverage**: every constraint in "Domain Knowledge" is exercised by at least one signal. List which signal covers which constraint.
6. **Architecture is names only**: "Architecture" section contains expert name, one-line purpose, and role (executor/verifier) per expert. No deliverables, no constraints, no implementation details.
4. **Every signal has a priority**: each signal is marked as **must** (blocks completion) or **should** (reported, does not block). At least one must signal exists. Must signals protect core usability — can the user run the artifact and get the primary value?
5. **No soft language in signals**: signals contain no phrases like "verify that", "check that", "should be", "looks correct", "works properly". Each signal is: run X → expect Y.
6. **Domain constraint coverage**: every constraint in "Domain Knowledge" is exercised by at least one signal. List which signal covers which constraint.
7. **Architecture is names only**: "Architecture" section contains expert name, one-line purpose, and role (executor/verifier) per expert. No deliverables, no constraints, no implementation details.

After writing plan.md, attemptCompletion with the file path.
"""
@@ -220,7 +224,7 @@ pick = [

[experts."@create-expert/build"]
defaultModelTier = "low"
version = "1.0.19"
version = "1.0.20"
description = """
Orchestrates the write → review → test → verify cycle for perstack.toml.
Provide: path to plan.md (containing requirements, architecture, test query, and verification signals).
@@ -281,7 +285,7 @@ pick = ["readTextFile", "exec", "todo", "attemptCompletion"]

[experts."@create-expert/write-definition"]
defaultModelTier = "low"
version = "1.0.19"
version = "1.0.20"
description = """
Writes or modifies a perstack.toml definition from plan.md requirements and architecture.
Provide: (1) path to plan.md, (2) optionally path to existing perstack.toml to preserve, (3) optionally feedback from a failed test to address.
@@ -384,7 +388,7 @@ pick = [

[experts."@create-expert/review-definition"]
defaultModelTier = "low"
version = "1.0.19"
version = "1.0.20"
description = """
Reviews perstack.toml against plan.md for domain knowledge alignment and instruction quality.
Provide: (1) path to plan.md, (2) path to perstack.toml.
@@ -433,7 +437,7 @@ pick = ["readTextFile", "todo", "attemptCompletion"]

[experts."@create-expert/verify-test"]
defaultModelTier = "low"
version = "1.0.19"
version = "1.0.20"
description = """
Executes hard signal checks against test-expert's results, verifies their reproducibility, and checks the definition structure.
Provide: (1) the test-expert's factual report (query, what was produced, errors), (2) the verification signals from plan.md, (3) path to perstack.toml.
@@ -477,12 +481,12 @@ Report each as PASS/FAIL with the command output as evidence.

## Verdicts

- **PASS** — all signals pass in Step 1, all signals reproduce in Step 2, all structural checks pass in Step 3.
- **CONTINUE** — any signal failed, any signal did not reproduce, or any structural check failed. Include: which check failed, expected vs actual, specific fix needed.
- **PASS** — all must signals pass and reproduce. Should signal results are reported but do not affect the verdict.
- **CONTINUE** — any must signal failed, any must signal did not reproduce, or any structural check failed. Include: which check failed, expected vs actual, specific fix needed.

Default to CONTINUE when any check lacks a clear PASS.
Should signal failures are included in the report as known limitations but never cause CONTINUE.

attemptCompletion with: verdict, per-signal results from Step 1, reproducibility results from Step 2, structural check results from Step 3, and (if CONTINUE) specific fix feedback.
attemptCompletion with: verdict, per-signal results (with must/should labels) from Step 1, reproducibility results from Step 2, structural check results from Step 3, should-signal failures as known limitations, and (if CONTINUE) specific fix feedback for must failures only.
"""

[experts."@create-expert/verify-test".skills."@perstack/base"]
@@ -498,7 +502,7 @@ pick = ["readTextFile", "exec", "todo", "attemptCompletion"]

[experts."@create-expert/test-expert"]
defaultModelTier = "low"
version = "1.0.19"
version = "1.0.20"
description = """
Executes a single test query against a Perstack expert definition and reports what happened.
Provide: (1) path to perstack.toml, (2) the test query to execute, (3) the coordinator expert name to test.
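The must/should rule introduced by this change (signals carry a priority, only must signals gate the verify-test verdict, should failures become known limitations) can be illustrated with a small sketch. This is a hypothetical Python model, not Perstack code: `Signal`, `run_signal`, `verdict`, treating structural checks as a single boolean, and interpreting "expected result" as "exit code 0 and stdout contains a given substring" are all assumptions made for illustration.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Signal:
    # Hypothetical model of one plan.md verification signal.
    name: str
    command: List[str]  # deterministic, repeatable command to run
    expected: str       # assumption: substring that stdout must contain
    priority: str       # "must" blocks completion; "should" is reported only


def run_signal(signal: Signal) -> bool:
    """Run the command once; pass if it exits 0 and stdout has the expected text."""
    proc = subprocess.run(signal.command, capture_output=True, text=True)
    return proc.returncode == 0 and signal.expected in proc.stdout


def verdict(signals: List[Signal], structural_ok: bool) -> Tuple[str, List[str]]:
    """Return (verdict, known_limitations) under the must/should rule."""
    if not any(s.priority == "must" for s in signals):
        raise ValueError("at least one must signal is required")
    limitations: List[str] = []
    must_ok = True
    for s in signals:
        if s.priority not in ("must", "should"):
            raise ValueError(f"{s.name}: priority must be 'must' or 'should'")
        # A signal counts only if it passes (Step 1) and passes again on a
        # re-run (Step 2, reproducibility); the re-run is skipped if Step 1 fails.
        ok = run_signal(s) and run_signal(s)
        if ok:
            continue
        if s.priority == "must":
            must_ok = False
        else:
            # Should failures are reported as known limitations, never CONTINUE.
            limitations.append(s.name)
    return ("PASS" if must_ok and structural_ok else "CONTINUE"), limitations


# Usage: one must signal that passes, one should signal that fails.
signals = [
    Signal("runs", [sys.executable, "-c", "print('ok')"], "ok", "must"),
    Signal("docs", [sys.executable, "-c", "print('todo')"], "done", "should"),
]
print(verdict(signals, structural_ok=True))  # ('PASS', ['docs'])
```

In this sketch a failing should signal still yields PASS and is only listed as a limitation, while any must failure, must non-reproduction, or structural failure yields CONTINUE, matching the updated Verdicts text.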