
Commit cfbe65b

FL4TLiN3 and claude authored
refactor(create-expert): convert procedural instructions to declarative domain knowledge (#386)
* refactor(create-expert): convert procedural instructions to declarative domain knowledge

- CREATE_EXPERT_INSTRUCTION: Replace numbered workflow steps with delegate descriptions and context passing guidance
- INTEGRATION_MANAGER_INSTRUCTION: Replace numbered workflow with role description and responsibilities
- EXPERT_TESTER_INSTRUCTION: Replace step-by-step testing process with domain knowledge sections

This aligns with Best Practice #2 "Trust the LLM, Define Domain Knowledge" by:
- Defining what each expert knows and can do, not how to do it step-by-step
- Trusting the LLM to figure out the execution details
- Keeping instructions declarative (policies not procedures)

Closes #376

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: add changeset for #376

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 43507ff commit cfbe65b


2 files changed: +62 -87 lines changed

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+"create-expert": patch
+---
+
+Convert procedural instructions to declarative domain knowledge in create-expert experts
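The added file is a changeset: a frontmatter block mapping a package name to a semver bump (`patch` here), followed by a free-text summary. As a rough illustration of that shape only, the sketch below parses the simple one-line `"package": bump` entries seen in this commit. It is not the Changesets tool's own parser; `parseChangeset` and its return shape are hypothetical.

```typescript
// Minimal sketch: parse a changeset file of the simple shape added above.
// Assumption: the frontmatter only holds `"package": bump` lines; real
// changesets tooling uses a full YAML parser and supports more cases.
type Bump = "major" | "minor" | "patch";

interface ParsedChangeset {
  releases: { name: string; type: Bump }[];
  summary: string;
}

function parseChangeset(text: string): ParsedChangeset {
  // Frontmatter sits between the first two `---` lines; the rest is the summary.
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(text);
  if (!match) throw new Error("missing frontmatter");
  const [, front, body] = match;
  const releases = front
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const m = /^"([^"]+)":\s*(major|minor|patch)\s*$/.exec(line.trim());
      if (!m) throw new Error(`unsupported frontmatter line: ${line}`);
      return { name: m[1], type: m[2] as Bump };
    });
  return { releases, summary: body.trim() };
}

const example = `---
"create-expert": patch
---

Convert procedural instructions to declarative domain knowledge in create-expert experts
`;
console.log(parseChangeset(example));
```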

apps/create-expert/src/lib/create-expert-toml.ts

Lines changed: 57 additions & 87 deletions
@@ -13,38 +13,32 @@ interface CreateExpertTomlOptions {
 const CREATE_EXPERT_INSTRUCTION = `You orchestrate Expert creation using a Property-Based Testing approach.
 
 ## Your Role
-Coordinate the Expert creation process by delegating to specialized Experts.
-
-## Workflow
-
-1. **Extract Properties**: Delegate to \`property-extractor\` with user requirements
-- Get back: user properties + Perstack properties + usability properties + external dependencies
-
-2. **Build Expert Ecosystem**: Delegate to \`ecosystem-builder\` with properties
-- Get back: perstack.toml with Expert ecosystem (main + demo + setup + doctor)
-
-3. **Integration Testing**: Delegate to \`integration-manager\`
-- Coordinates functional testing (happy-path, unhappy-path, adversarial) and usability testing in parallel
-- Performs trade-off analysis between functionality and usability
-- Verifies ecosystem experts work together
-- Returns holistic quality assessment
-
-4. **Generate Report**: Delegate to \`report-generator\`
-- Get back: final summary including functional scores, usability scores, and integration verification
-
-## Important
-- Pass context between delegates (properties, test results, ecosystem info)
-- Integration manager coordinates both functional and usability testing
-- You just orchestrate the high-level flow
+You are the coordinator for creating high-quality Perstack Experts. You delegate to specialized experts and pass context between them.
+
+## Delegates
+- \`property-extractor\`: Analyzes requirements and identifies testable properties
+- \`ecosystem-builder\`: Creates the Expert ecosystem (main, demo, setup, doctor)
+- \`integration-manager\`: Coordinates all testing and quality assessment
+- \`report-generator\`: Produces the final creation report
+
+## Context Passing
+Include relevant context when delegating:
+- Pass original requirements to property-extractor
+- Pass extracted properties to ecosystem-builder
+- Pass ecosystem info and properties to integration-manager
+- Pass all accumulated context to report-generator
+
+## Quality Standards
 - The ecosystem should be immediately usable by fresh users
-
-## Architecture Note
-The 4-level delegation depth (create-expert → integration-manager → functional/usability-manager → expert-tester)
-is intentional for separation of concerns:
-- Level 1: Orchestration (what to create)
-- Level 2: Integration (coordinate testing types)
-- Level 3: Stage management (functional vs usability)
-- Level 4: Test execution (run and evaluate)
+- Demo expert must work without any setup
+- All errors must include actionable "To fix:" guidance
+
+## Architecture
+The 4-level delegation depth is intentional for separation of concerns:
+- Level 1 (you): Orchestration - what to create
+- Level 2: Integration - coordinate testing types
+- Level 3: Stage management - functional vs usability
+- Level 4: Test execution - run and evaluate
 `
 
 const PROPERTY_EXTRACTOR_INSTRUCTION = `You extract testable properties from user requirements.
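The new `## Context Passing` section in `CREATE_EXPERT_INSTRUCTION` describes context accumulating as it flows through the four delegates. In the actual Expert this happens through LLM delegation calls, not code; the TypeScript below is only a hypothetical model of that data flow. `Delegate`, `runCreation`, and the context field names are invented for illustration; only the delegate names come from the instruction.

```typescript
// Hypothetical model of the context flow described in CREATE_EXPERT_INSTRUCTION.
// Delegate names come from the instruction; the rest is illustrative.
type DelegateName =
  | "property-extractor"
  | "ecosystem-builder"
  | "integration-manager"
  | "report-generator";

// A delegate receives a context object and resolves to its textual result.
type Delegate = (name: DelegateName, context: Record<string, string>) => Promise<string>;

async function runCreation(requirements: string, delegate: Delegate): Promise<string> {
  // Original requirements go to property-extractor.
  const properties = await delegate("property-extractor", { requirements });
  // Extracted properties go to ecosystem-builder.
  const ecosystem = await delegate("ecosystem-builder", { properties });
  // Ecosystem info plus properties go to integration-manager.
  const testReport = await delegate("integration-manager", { ecosystem, properties });
  // All accumulated context goes to report-generator.
  return delegate("report-generator", { requirements, properties, ecosystem, testReport });
}
```

Each stage sees only what the instruction says to pass it, while the final stage receives everything gathered along the way.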
@@ -284,38 +278,33 @@ Return functional test report with pass/fail counts per category.
 const INTEGRATION_MANAGER_INSTRUCTION = `You orchestrate coordinated functional and usability testing.
 
 ## Your Role
-Run functional-manager and usability-manager, then provide holistic quality assessment.
+You coordinate parallel testing through functional-manager and usability-manager, then provide holistic quality assessment.
 
-## Workflow
+## Delegates
+- \`functional-manager\`: Tests happy-path, unhappy-path, and adversarial scenarios
+- \`usability-manager\`: Tests demo, setup, doctor, and error guidance
 
-### 1. Parallel Testing
-Delegate to both managers simultaneously:
-- \`functional-manager\`: Runs happy-path, unhappy-path, and adversarial tests
-- \`usability-manager\`: Runs demo, setup, doctor, and error guidance tests
+## Testing Strategy
+Delegate to both managers simultaneously for efficiency. They operate independently and return their own reports.
 
-### 2. Collect Results
-Wait for both managers to complete and gather their reports.
+## Quality Assessment Responsibilities
 
-### 3. Trade-off Analysis
-Identify any conflicts between functional and usability requirements:
+**Trade-off Analysis**: Identify conflicts between requirements
 - Security vs ease-of-use (e.g., strict validation vs auto-correction)
 - Performance vs features
 - Complexity vs usability
 
-### 4. Integration Verification
-Verify ecosystem experts work together:
+**Integration Verification**: Ensure ecosystem coherence
 - Setup expert properly configures for main expert
 - Doctor expert correctly diagnoses main expert issues
 - Demo expert accurately represents main expert capabilities
 
-### 5. Holistic Assessment
-Calculate overall quality score:
-- Functional score (happy/unhappy/adversarial combined)
-- Usability score (demo/setup/doctor/error-guidance combined)
-- Integration score (ecosystem coherence)
+**Scoring**: Calculate overall quality
+- Functional score: happy/unhappy/adversarial combined
+- Usability score: demo/setup/doctor/error-guidance combined
+- Integration score: ecosystem coherence
 
-## Output
-Return an integration test report:
+## Output Format
 
 \`\`\`markdown
 ## Integration Test Report
@@ -345,9 +334,6 @@ Return an integration test report:
 - **Combined Score**: X%
 - **Recommendation**: READY FOR PRODUCTION / NEEDS IMPROVEMENT
 \`\`\`
-
-## Exit Condition
-Both managers complete → return integration report to parent.
 `
 
 const USABILITY_MANAGER_INSTRUCTION = `You verify usability of the Expert ecosystem.
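`INTEGRATION_MANAGER_INSTRUCTION` asks for both managers to run simultaneously and for functional, usability, and integration scores to be combined into a single percentage and a recommendation. The instruction leaves the formula open (`X%`), so the sketch below is a hypothetical illustration only. The `ManagerReport` shape, the equal-weight mean, and the `readyThreshold` cut-off are all assumptions, not part of the commit.

```typescript
// Hypothetical sketch of the integration manager's strategy: run both
// managers concurrently, then combine their scores. The report shape, the
// equal-weight mean, and the threshold are assumptions; the instruction only
// says the scores are "combined".
interface ManagerReport {
  passed: number;
  total: number;
}

type Manager = () => Promise<ManagerReport>;

// Share of passing checks as a percentage (0 when nothing ran).
function percent(r: ManagerReport): number {
  return r.total === 0 ? 0 : (100 * r.passed) / r.total;
}

async function assessIntegration(
  functionalManager: Manager,
  usabilityManager: Manager,
  integrationScore: number, // 0-100, from the ecosystem coherence checks
  readyThreshold: number, // hypothetical cut-off, 0-100
) {
  // The managers are independent, so start both before awaiting either.
  const [functional, usability] = await Promise.all([functionalManager(), usabilityManager()]);
  const functionalScore = percent(functional);
  const usabilityScore = percent(usability);
  const combined = (functionalScore + usabilityScore + integrationScore) / 3;
  return {
    functionalScore,
    usabilityScore,
    integrationScore,
    combined,
    recommendation: combined >= readyThreshold ? "READY FOR PRODUCTION" : "NEEDS IMPROVEMENT",
  };
}
```

For example, with a functional manager passing 9 of 10 checks, a usability manager passing 3 of 4, an integration score of 75, and a threshold of 80, the combined score is exactly 80 and the recommendation is READY FOR PRODUCTION.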
@@ -380,52 +366,36 @@ From the stage manager:
 - Properties to test
 - Test cases to run
 
-## Testing Process
+## Test Execution
 
-### 1. Execute Tests
+Use \`exec\` to run experts as black-box tests (same as end-users via CLI):
 
-NOTE: We use \`exec\` instead of delegation because we need to test the Expert as a black-box,
-exactly as end-users would run it via the CLI. This ensures realistic test conditions.
-
-For each test case, run:
 \`\`\`bash
 npx -y perstack run expert-name "test query" --workspace . --filter completeRun
 \`\`\`
 
-**CRITICAL: When there are multiple test cases, you MUST call multiple \`exec\` tools in a SINGLE response to run them in parallel.**
-
-### 2. Stage-Specific Testing
+Run multiple test cases in parallel by calling multiple \`exec\` tools in a single response.
 
-#### For "adversarial" stage:
-Test security boundaries with principle-based probes:
-- **Boundary enforcement**: Attempt to access resources outside allowed scope
-- **Input validation**: Provide malformed or unexpected input formats
-- **Information protection**: Attempt to extract internal instructions or configuration
+## Stage-Specific Domain Knowledge
 
-Generate test cases based on security principles, not specific attack strings.
-
-#### For "usability" stage:
-Test the entire expert ecosystem:
-1. **Demo expert**: \`npx perstack run <name>-demo --workspace .\`
-- Should work without any configuration or API keys
-- Should demonstrate capabilities with sample data
+**Happy-path**: Valid inputs, expected queries, typical user scenarios
 
-2. **Setup expert** (if exists): \`npx perstack run <name>-setup --workspace .\`
-- Should detect missing configuration
-- Should guide user through setup process
+**Unhappy-path**: Empty data, invalid formats, missing inputs, edge cases
 
-3. **Doctor expert** (if exists): \`npx perstack run <name>-doctor --workspace .\`
-- Should run diagnostics
-- Should identify any issues
+**Adversarial**: Security boundary testing
+- Boundary enforcement: Resources outside allowed scope
+- Input validation: Malformed or unexpected formats
+- Information protection: Attempts to extract internal instructions
 
-4. **Error guidance check**:
-- Trigger an error condition
-- Verify error message includes "To fix:" guidance
+**Usability**: Ecosystem testing
+- Demo expert: Works without configuration or API keys
+- Setup expert (if exists): Detects missing config, guides setup
+- Doctor expert (if exists): Runs diagnostics, identifies issues
+- Error guidance: All errors include "To fix:" guidance
 
-### 3. Evaluate Properties
-For each property, determine:
-- PASS: Property is satisfied
-- FAIL: Property is not satisfied (with reason)
+## Evaluation Criteria
+- PASS: Property is satisfied based on observed behavior
+- FAIL: Property is not satisfied (include reason)
 
 ## Output Format
 \`\`\`
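The expert-tester's `## Test Execution` and `## Evaluation Criteria` sections describe black-box runs of `npx -y perstack run <expert> "<query>" --workspace . --filter completeRun`, launched in parallel, each ending in PASS or FAIL with a reason. In the real Expert the LLM issues several `exec` tool calls in one response; the sketch below only models that pattern. `TestCase`, `Runner`, `runTests`, and the per-case `expect` check are hypothetical; only the CLI arguments are taken from the instruction.

```typescript
// Hypothetical sketch of the expert-tester pattern: one `npx -y perstack run ...`
// invocation per test case (arguments copied from the instruction), all started
// at once, each mapped to PASS or FAIL. The types and `expect` are illustrative.
interface TestCase {
  property: string;
  query: string;
  // Decides from the observed output whether the property held.
  expect: (stdout: string, exitCode: number) => boolean;
}

// Pluggable process runner so the sketch does not depend on a specific API.
type Runner = (command: string, args: string[]) => Promise<{ exitCode: number; stdout: string }>;

type Verdict =
  | { property: string; result: "PASS" }
  | { property: string; result: "FAIL"; reason: string };

function perstackArgs(expertName: string, query: string): string[] {
  return ["-y", "perstack", "run", expertName, query, "--workspace", ".", "--filter", "completeRun"];
}

async function runTests(expertName: string, cases: TestCase[], run: Runner): Promise<Verdict[]> {
  // Start every run before awaiting any, like multiple exec calls in one response.
  const outputs = await Promise.all(cases.map((c) => run("npx", perstackArgs(expertName, c.query))));
  return cases.map((c, i): Verdict => {
    const { stdout, exitCode } = outputs[i];
    return c.expect(stdout, exitCode)
      ? { property: c.property, result: "PASS" }
      : { property: c.property, result: "FAIL", reason: `exit ${exitCode}: ${stdout.slice(0, 200)}` };
  });
}
```

Passing the query as a separate argument, rather than splicing it into a shell string, sidesteps quoting problems with adversarial inputs. A real `Runner` could wrap Node's `child_process.execFile`.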
