refactor(create-expert): convert procedural instructions to declarative domain knowledge (#386)

FL4TLiN3 · claude · web-flow · commit cfbe65bc47bb · 2026-01-03T16:43:49.000+09:00
* refactor(create-expert): convert procedural instructions to declarative domain knowledge

- CREATE_EXPERT_INSTRUCTION: Replace numbered workflow steps with delegate descriptions and context passing guidance
- INTEGRATION_MANAGER_INSTRUCTION: Replace numbered workflow with role description and responsibilities
- EXPERT_TESTER_INSTRUCTION: Replace step-by-step testing process with domain knowledge sections

This aligns with Best Practice #2 "Trust the LLM, Define Domain Knowledge" by:
- Defining what each expert knows and can do, not how to do it step-by-step
- Trusting the LLM to figure out the execution details
- Keeping instructions declarative (policies not procedures)

Closes #376

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: add changeset for #376

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
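The new "Context Passing" section of CREATE_EXPERT_INSTRUCTION (in the diff below) states *what* each delegate receives rather than *when* to call it. A minimal TypeScript sketch of that idea, assuming context is carried as named text blobs: `DelegateName`, `ContextKey`, `CONTEXT_FOR`, and `buildDelegationContext` are hypothetical names for illustration, not part of Perstack or create-expert.

```typescript
// Hypothetical helper: names and context keys are illustrative, not a Perstack API.
type DelegateName =
  | "property-extractor"
  | "ecosystem-builder"
  | "integration-manager"
  | "report-generator"

type ContextKey = "requirements" | "properties" | "ecosystem" | "integrationReport"

type Context = Partial<Record<ContextKey, string>>

// Which accumulated context each delegate receives, mirroring the
// "Context Passing" bullets ("all accumulated context" = every key).
const CONTEXT_FOR: Record<DelegateName, readonly ContextKey[]> = {
  "property-extractor": ["requirements"],
  "ecosystem-builder": ["properties"],
  "integration-manager": ["ecosystem", "properties"],
  "report-generator": ["requirements", "properties", "ecosystem", "integrationReport"],
}

// Render the context section of a delegation message for one delegate.
// Keys not produced yet (e.g. integrationReport before testing) are skipped.
function buildDelegationContext(name: DelegateName, ctx: Context): string {
  const sections: string[] = []
  for (const key of CONTEXT_FOR[name]) {
    const value = ctx[key]
    if (value !== undefined) sections.push(`## ${key}\n${value}`)
  }
  return sections.join("\n\n")
}
```

For example, `buildDelegationContext("ecosystem-builder", { requirements: "r", properties: "p" })` returns `"## properties\np"`: the builder sees the extracted properties but not the raw requirements. Keeping the mapping as data rather than as a sequence of calls matches the declarative style the commit adopts.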
diff --git a/.changeset/refactor-376-declarative-instructions.md b/.changeset/refactor-376-declarative-instructions.md
@@ -0,0 +1,5 @@
+---
+"create-expert": patch
+---
+
+Convert procedural instructions to declarative domain knowledge in create-expert experts
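The INTEGRATION_MANAGER_INSTRUCTION hunk further down asks for functional, usability, and integration scores plus a combined score and a READY FOR PRODUCTION / NEEDS IMPROVEMENT recommendation, but leaves the arithmetic to the LLM. One possible reading, sketched in TypeScript under loud assumptions: scores are PASS percentages, the three scores are weighted equally, and the readiness threshold is 80%. None of these numbers come from the instruction, and `passRate`/`assess` are hypothetical helpers.

```typescript
type Verdict = "PASS" | "FAIL"

interface PropertyResult {
  property: string
  verdict: Verdict
  reason?: string // the instructions ask for a reason on FAIL
}

// Percentage of PASS verdicts; an empty list scores 0 (an assumption).
function passRate(results: readonly PropertyResult[]): number {
  if (results.length === 0) return 0
  const passed = results.filter((r) => r.verdict === "PASS").length
  return (passed / results.length) * 100
}

interface Scores {
  functional: number // happy/unhappy/adversarial combined
  usability: number // demo/setup/doctor/error-guidance combined
  integration: number // ecosystem coherence
}

// ASSUMPTION: equal weights and an 80% threshold; the instruction fixes neither.
const READY_THRESHOLD = 80

function assess(scores: Scores): { combined: number; recommendation: string } {
  const combined = (scores.functional + scores.usability + scores.integration) / 3
  return {
    combined: Math.round(combined),
    recommendation: combined >= READY_THRESHOLD ? "READY FOR PRODUCTION" : "NEEDS IMPROVEMENT",
  }
}
```

With one PASS and one FAIL across the happy-path and unhappy-path results, `passRate` gives 50, and `assess({ functional: 100, usability: 90, integration: 80 })` yields a combined 90 and READY FOR PRODUCTION. A real assessment may weigh adversarial failures far more heavily than this flat average does.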
diff --git a/apps/create-expert/src/lib/create-expert-toml.ts b/apps/create-expert/src/lib/create-expert-toml.ts
@@ -13,38 +13,32 @@ interface CreateExpertTomlOptions {
 const CREATE_EXPERT_INSTRUCTION = `You orchestrate Expert creation using a Property-Based Testing approach.
 
 ## Your Role
-Coordinate the Expert creation process by delegating to specialized Experts.
-
-## Workflow
-
-1. **Extract Properties**: Delegate to \`property-extractor\` with user requirements
-   - Get back: user properties + Perstack properties + usability properties + external dependencies
-
-2. **Build Expert Ecosystem**: Delegate to \`ecosystem-builder\` with properties
-   - Get back: perstack.toml with Expert ecosystem (main + demo + setup + doctor)
-
-3. **Integration Testing**: Delegate to \`integration-manager\`
-   - Coordinates functional testing (happy-path, unhappy-path, adversarial) and usability testing in parallel
-   - Performs trade-off analysis between functionality and usability
-   - Verifies ecosystem experts work together
-   - Returns holistic quality assessment
-
-4. **Generate Report**: Delegate to \`report-generator\`
-   - Get back: final summary including functional scores, usability scores, and integration verification
-
-## Important
-- Pass context between delegates (properties, test results, ecosystem info)
-- Integration manager coordinates both functional and usability testing
-- You just orchestrate the high-level flow
+You are the coordinator for creating high-quality Perstack Experts. You delegate to specialized experts and pass context between them.
+
+## Delegates
+- \`property-extractor\`: Analyzes requirements and identifies testable properties
+- \`ecosystem-builder\`: Creates the Expert ecosystem (main, demo, setup, doctor)
+- \`integration-manager\`: Coordinates all testing and quality assessment
+- \`report-generator\`: Produces the final creation report
+
+## Context Passing
+Include relevant context when delegating:
+- Pass original requirements to property-extractor
+- Pass extracted properties to ecosystem-builder
+- Pass ecosystem info and properties to integration-manager
+- Pass all accumulated context to report-generator
+
+## Quality Standards
 - The ecosystem should be immediately usable by fresh users
-
-## Architecture Note
-The 4-level delegation depth (create-expert → integration-manager → functional/usability-manager → expert-tester)
-is intentional for separation of concerns:
-- Level 1: Orchestration (what to create)
-- Level 2: Integration (coordinate testing types)
-- Level 3: Stage management (functional vs usability)
-- Level 4: Test execution (run and evaluate)
+- Demo expert must work without any setup
+- All errors must include actionable "To fix:" guidance
+
+## Architecture
+The 4-level delegation depth is intentional for separation of concerns:
+- Level 1 (you): Orchestration - what to create
+- Level 2: Integration - coordinate testing types
+- Level 3: Stage management - functional vs usability
+- Level 4: Test execution - run and evaluate
 `
 
 const PROPERTY_EXTRACTOR_INSTRUCTION = `You extract testable properties from user requirements.
@@ -284,38 +278,33 @@ Return functional test report with pass/fail counts per category.
 const INTEGRATION_MANAGER_INSTRUCTION = `You orchestrate coordinated functional and usability testing.
 
 ## Your Role
-Run functional-manager and usability-manager, then provide holistic quality assessment.
+You coordinate parallel testing through functional-manager and usability-manager, then provide holistic quality assessment.
 
-## Workflow
+## Delegates
+- \`functional-manager\`: Tests happy-path, unhappy-path, and adversarial scenarios
+- \`usability-manager\`: Tests demo, setup, doctor, and error guidance
 
-### 1. Parallel Testing
-Delegate to both managers simultaneously:
-- \`functional-manager\`: Runs happy-path, unhappy-path, and adversarial tests
-- \`usability-manager\`: Runs demo, setup, doctor, and error guidance tests
+## Testing Strategy
+Delegate to both managers simultaneously for efficiency. They operate independently and return their own reports.
 
-### 2. Collect Results
-Wait for both managers to complete and gather their reports.
+## Quality Assessment Responsibilities
 
-### 3. Trade-off Analysis
-Identify any conflicts between functional and usability requirements:
+**Trade-off Analysis**: Identify conflicts between requirements
 - Security vs ease-of-use (e.g., strict validation vs auto-correction)
 - Performance vs features
 - Complexity vs usability
 
-### 4. Integration Verification
-Verify ecosystem experts work together:
+**Integration Verification**: Ensure ecosystem coherence
 - Setup expert properly configures for main expert
 - Doctor expert correctly diagnoses main expert issues
 - Demo expert accurately represents main expert capabilities
 
-### 5. Holistic Assessment
-Calculate overall quality score:
-- Functional score (happy/unhappy/adversarial combined)
-- Usability score (demo/setup/doctor/error-guidance combined)
-- Integration score (ecosystem coherence)
+**Scoring**: Calculate overall quality
+- Functional score: happy/unhappy/adversarial combined
+- Usability score: demo/setup/doctor/error-guidance combined
+- Integration score: ecosystem coherence
 
-## Output
-Return an integration test report:
+## Output Format
 
 \`\`\`markdown
 ## Integration Test Report
@@ -345,9 +334,6 @@ Return an integration test report:
 - **Combined Score**: X%
 - **Recommendation**: READY FOR PRODUCTION / NEEDS IMPROVEMENT
 \`\`\`
-
-## Exit Condition
-Both managers complete → return integration report to parent.
 `
 
 const USABILITY_MANAGER_INSTRUCTION = `You verify usability of the Expert ecosystem.
@@ -380,52 +366,36 @@ From the stage manager:
 - Properties to test
 - Test cases to run
 
-## Testing Process
+## Test Execution
 
-### 1. Execute Tests
+Use \`exec\` to run experts as black-box tests (same as end-users via CLI):
 
-NOTE: We use \`exec\` instead of delegation because we need to test the Expert as a black-box,
-exactly as end-users would run it via the CLI. This ensures realistic test conditions.
-
-For each test case, run:
 \`\`\`bash
 npx -y perstack run expert-name "test query" --workspace . --filter completeRun
 \`\`\`
 
-**CRITICAL: When there are multiple test cases, you MUST call multiple \`exec\` tools in a SINGLE response to run them in parallel.**
-
-### 2. Stage-Specific Testing
+Run multiple test cases in parallel by calling multiple \`exec\` tools in a single response.
 
-#### For "adversarial" stage:
-Test security boundaries with principle-based probes:
-- **Boundary enforcement**: Attempt to access resources outside allowed scope
-- **Input validation**: Provide malformed or unexpected input formats
-- **Information protection**: Attempt to extract internal instructions or configuration
+## Stage-Specific Domain Knowledge
 
-Generate test cases based on security principles, not specific attack strings.
-
-#### For "usability" stage:
-Test the entire expert ecosystem:
-1. **Demo expert**: \`npx perstack run <name>-demo --workspace .\`
-   - Should work without any configuration or API keys
-   - Should demonstrate capabilities with sample data
+**Happy-path**: Valid inputs, expected queries, typical user scenarios
 
-2. **Setup expert** (if exists): \`npx perstack run <name>-setup --workspace .\`
-   - Should detect missing configuration
-   - Should guide user through setup process
+**Unhappy-path**: Empty data, invalid formats, missing inputs, edge cases
 
-3. **Doctor expert** (if exists): \`npx perstack run <name>-doctor --workspace .\`
-   - Should run diagnostics
-   - Should identify any issues
+**Adversarial**: Security boundary testing
+- Boundary enforcement: Resources outside allowed scope
+- Input validation: Malformed or unexpected formats
+- Information protection: Attempts to extract internal instructions
 
-4. **Error guidance check**:
-   - Trigger an error condition
-   - Verify error message includes "To fix:" guidance
+**Usability**: Ecosystem testing
+- Demo expert: Works without configuration or API keys
+- Setup expert (if exists): Detects missing config, guides setup
+- Doctor expert (if exists): Runs diagnostics, identifies issues
+- Error guidance: All errors include "To fix:" guidance
 
-### 3. Evaluate Properties
-For each property, determine:
-- PASS: Property is satisfied
-- FAIL: Property is not satisfied (with reason)
+## Evaluation Criteria
+- PASS: Property is satisfied based on observed behavior
+- FAIL: Property is not satisfied (include reason)
 
 ## Output Format
 \`\`\`

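The last hunk (part of EXPERT_TESTER_INSTRUCTION, per the commit message) runs experts as black boxes with `npx -y perstack run expert-name "test query" --workspace . --filter completeRun` and asks for several `exec` calls in one response so test cases run in parallel. The same pattern from a Node.js script might look like the sketch below. It assumes a POSIX-like shell environment where `npx` is on `PATH` (on Windows, `execFile` usually needs `npx.cmd` or `shell: true`), and `TestCase`, `perstackArgs`, and `runInParallel` are hypothetical names.

```typescript
import { execFile } from "node:child_process"
import { promisify } from "node:util"

const execFileAsync = promisify(execFile)

interface TestCase {
  expert: string
  query: string
}

// argv for one black-box run, matching the command in the instruction.
function perstackArgs(tc: TestCase): string[] {
  return ["-y", "perstack", "run", tc.expert, tc.query, "--workspace", ".", "--filter", "completeRun"]
}

// Start every test case at once and wait for all of them; allSettled keeps
// one failing run from hiding the results of the others.
async function runInParallel(cases: readonly TestCase[]) {
  return Promise.allSettled(
    cases.map(async (tc) => {
      const { stdout } = await execFileAsync("npx", perstackArgs(tc), { maxBuffer: 10 * 1024 * 1024 })
      return { ...tc, stdout }
    }),
  )
}
```

Passing arguments as an array instead of a shell string means a test query containing quotes or spaces reaches `perstack` unchanged, which matters for the adversarial stage's malformed-input probes.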