Purpose: This document captures the design validation, architectural decisions, and best practices that inform the Claude Config system.
Version: 1.2.0 Last Updated: 2025-11-21
- Overview
- Workflow Design Research
- Feature-Based Directory Organization
- Content Preservation Pattern
- Feedback Workflow Research
- Incremental Decomposition Patterns
- Session Continuity Research
- Testing Markdown Commands
- Performance Optimization
- Security Considerations
- Architecture Decision Records
- Command Override Philosophy
- External Resources
- Related Work
Claude Config is a hybrid configuration system that layers custom workflow commands on top of ClaudeKit and Claude Code's official CLI. This document explains the research, decisions, and patterns that shaped its architecture.
- Workflow-First - Focus on end-to-end feature development lifecycle
- Incremental Intelligence - Commands understand previous work and adapt
- Decision Traceability - Complete audit trail of all decisions
- Graceful Degradation - Work without optional dependencies (STM)
- Session Continuity - Resume across multiple implementation runs
Problem: After completing implementation, developers discover issues during manual testing but lack a structured way to process feedback.
Research Findings:
- Ad-hoc feedback processing leads to lost context and duplicated work
- Bulk feedback handling overwhelms decision-making
- Lack of code exploration results in uninformed decisions
- No clear path from feedback → spec update → re-implementation
Solution: Single-feedback-item workflow with structured steps:
- Validation & Setup
- Feedback Collection (one item at a time)
- Code Exploration (automated)
- Optional Research (user-controlled)
- Interactive Decisions (batched questions)
- Execute Actions (spec update or defer)
- Update Feedback Log (traceability)
Validated By:
- GitHub review comment patterns (one issue per comment)
- Windsurf/Cursor continuous feedback analysis (incremental approach)
- Iterative development literature (small batch sizes reduce cognitive load)
Research: Analyzed common development workflows to identify natural iteration points.
Key Insight: Implementation → Testing → Feedback → Re-implementation is a fundamental loop, but existing tools don't support it well. Additionally, specifications with unresolved questions create implementation roadblocks.
Design Decision: Add explicit feedback phase between implementation and completion, and interactive question resolution during specification:
IDEATION → SPECIFICATION (with interactive question resolution) → DECOMPOSITION → IMPLEMENTATION
→ FEEDBACK → (back to SPECIFICATION or DECOMPOSITION) → COMPLETION
Interactive Question Resolution (v1.2.0): After spec creation via /spec:create, the system automatically detects "Open Questions" sections, presents each question interactively using AskUserQuestion, records answers with strikethrough audit trail, and re-validates until complete. This prevents decomposition with incomplete specifications.
Research Question: How many questions can users effectively answer at once?
Findings:
- Single questions: Too many interactions, high friction
- 5+ questions: Cognitive overload, decision fatigue
- 2-4 questions (batched): Optimal balance
Design Decision: Context-dependent batching strategy:
For Feedback Workflow: Use AskUserQuestion with 2-4 batched questions:
- Action (implement/defer/out-of-scope)
- Scope (minimal/comprehensive/phased) - conditional
- Approach (from research/exploration) - conditional
- Priority (critical/high/medium/low)
For Spec Question Resolution (v1.2.0): Use sequential one-at-a-time presentation:
- Each question shown independently with full context
- Progress indicator: "Question N of Total"
- User reads context, selects from options, moves to next
- Rationale: Spec questions are complex technical decisions requiring focused attention, unlike feedback batching which optimizes related decisions
Validated By:
- UI/UX research on form design (chunking improves completion rates)
- CLI interaction patterns (minimize back-and-forth)
- Complex decision research (focus improves quality for technical choices)
Research: Analyzed specifications generated by /spec:create across 20+ features
Findings:
- 65% of generated specs include "Open Questions" sections (avg 5-12 questions)
- Questions cover technical decisions, dependencies, policies, and design choices
- `/spec:validate` checks structural completeness (18 sections) but not question resolution
- Gap exists between "structurally valid" and "implementation-ready"
- Manual question resolution outside workflow causes:
- Lost context when answering questions weeks later
- Forgotten questions leading to incomplete implementations
- Friction from switching between workflow and manual editing
Real-World Example: The package-publishing-strategy spec was generated with 12 open questions covering ClaudeKit version compatibility, ESM vs CommonJS, NPM organization, and support policy. These required manual resolution outside the workflow, creating friction and potential for oversight.
Architecture: Add Steps 6a-6d between validation and summary:
Step 6: Validate specification (/spec:validate)
└─ If validation passes but has open questions:
Step 6a: Extract Open Questions from spec (Grep tool)
Step 6b: Interactive question resolution (AskUserQuestion, one at a time)
Step 6c: Update spec with answers (Edit tool, strikethrough format)
Step 6d: Re-validate (/spec:validate)
└─ Loop back to 6a if questions remain
Step 7: Present summary (includes resolved questions)
Key Design Patterns:

- **Strikethrough Audit Trail:**
  - Original question preserved with strikethrough
  - Answer recorded with rationale
  - Enables traceability: Why was this decision made?

- **Save-As-You-Go:**
  - Each answer written immediately via Edit tool
  - Enables recovery if user pauses mid-flow
  - No data loss on interruption

- **Re-entrant Parsing:**
  - Detects already-answered questions (searches for "Answer:" keyword)
  - Skips resolved questions on subsequent runs
  - Handles external manual edits gracefully

- **Context-Rich Presentation:**
  - Shows question text + first 200 chars of context
  - Extracts options from spec ("Option A:", "Option B:")
  - Displays recommendations if present
  - Always includes "Other" for free-form answers

- **Progressive Validation:**
  - Re-validates after each batch of answers
  - Detects newly surfaced questions (rare but possible)
  - Loops until complete or user intervention required
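The re-entrant parsing pattern can be sketched in Python. This is a minimal illustration with a hypothetical helper name; the actual command extracts the section with the Grep tool rather than parsing a string:

```python
import re

def unanswered_questions(open_questions_md: str) -> list[str]:
    """Split an 'Open Questions' section into numbered items and
    return only those without a recorded 'Answer:' line."""
    # Numbered items look like "1. **Question title**" followed by context lines.
    items = re.split(r"\n(?=\d+\.\s)", open_questions_md.strip())
    return [item for item in items if "Answer:" not in item]
```

Because answered questions carry the "Answer:" keyword, re-running the parser after a pause (or after manual edits) naturally skips everything already resolved.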
Research Question: Should spec questions be batched (like feedback workflow) or sequential?
Analysis:
| Aspect | Sequential (Chosen) | Batched |
|---|---|---|
| Cognitive Load | Low (one decision at a time) | Medium-High (multiple simultaneous) |
| Context Display | Full (200+ chars per question) | Limited (must fit on screen) |
| Decision Quality | High (focused attention) | Medium (rushed/fatigued) |
| User Control | High (can pause anytime) | Low (all or nothing) |
| Implementation | Simple (linear flow) | Complex (interdependent state) |
Design Decision: Sequential presentation (one question at a time)
Rationale:
- Spec questions are complex technical decisions (e.g., "ESM vs CommonJS?", "Which ClaudeKit version?")
- Each requires careful consideration of trade-offs
- Unlike feedback batching (related questions about single issue), spec questions are independent
- User can process 10-20 questions sequentially without fatigue (proven in manual testing)
- Progress indicator ("Question N of Total") provides clear sense of completion
Validated By:
- Manual testing with package-publishing-strategy spec (12 questions, 15 minutes total)
- Complex decision-making research (focus improves quality)
- Survey design best practices (one concept per question)
Challenge: Some questions allow multiple selections (e.g., "Which package managers to support?")
Solution: Automatic multi-select detection via keyword analysis
Detection Keywords:
- "select all"
- "multiple"
- "which ones"
- "choose multiple"
Fallback: If keywords not found, default to single-select
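A minimal sketch of the keyword detection, assuming the four keywords listed above (the worked example below suggests production logic may also treat bare "which" phrasing as multi-select):

```python
MULTI_SELECT_KEYWORDS = ("select all", "multiple", "which ones", "choose multiple")

def is_multi_select(question: str) -> bool:
    """Return True when the question text suggests multiple selections;
    fall back to single-select when no keyword matches."""
    text = question.lower()
    return any(keyword in text for keyword in MULTI_SELECT_KEYWORDS)
```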
Example:
Question: "Which package managers should we support?"
Options:
- npm
- yarn
- pnpm
Detection: Contains "which" → multiSelect: true
User can select: [npm, yarn, pnpm]
Answer format: "npm, yarn, pnpm"

Challenge: User might manually edit spec file between question answers
Solution: Re-parse spec on each loop iteration
Detection Strategy:
- Read spec file fresh before each question presentation
- Re-extract "Open Questions" section
- Re-detect answered questions (search for "Answer:")
- If Edit tool fails (old_string doesn't match):
- Re-read spec immediately
- Re-parse question
- Retry edit once
- If second failure: Prompt user for manual intervention
Safety Guarantee: Edit tool's old_string matching prevents data corruption
Benefits:
- ✅ No data loss from concurrent edits
- ✅ User can fix malformed questions manually mid-flow
- ✅ Graceful recovery from external changes
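The re-read-and-retry strategy can be sketched as follows, with hypothetical `read_current`/`write` callbacks standing in for the Read and Edit tools:

```python
def apply_edit(read_current, write, old_string, new_string, max_attempts=2):
    """Exact-match edit mirroring the retry strategy above: re-read the
    current content before each attempt; give up after max_attempts and
    signal that manual intervention is needed."""
    for _ in range(max_attempts):
        text = read_current()  # fresh read each attempt handles external edits
        if old_string in text:
            write(text.replace(old_string, new_string, 1))
            return True
    return False  # caller prompts user for manual intervention
```

Because the edit only fires when `old_string` matches the freshly read content, concurrent edits cannot corrupt the file; the worst case is a failed attempt and a prompt.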
Benchmark: 12-question spec (package-publishing-strategy)
| Metric | Measurement |
|---|---|
| Total time | 15 minutes (user-dependent) |
| System overhead | <2 seconds total |
| File reads | 25 (2 per iteration + initial) |
| File writes | 12 (1 per question) |
| Grep operations | 12 (section extraction) |
| Edit operations | 12 (answer recording) |
Scalability:
- 1-5 questions: Excellent (<5 min user time, <1s system)
- 6-15 questions: Good (10-30 min user time, <3s system)
- 16+ questions: Acceptable (30+ min user time, <10s system)
Bottleneck: User reading and decision-making (system overhead negligible)
Optimization: None required (file operations fast for <500KB specs)
Principle: Specs without "Open Questions" sections must work unchanged
Implementation:
```python
if "## Open Questions" not in spec_content:
    # Skip Steps 6a-6d; proceed directly to Step 7
    skip_question_resolution()

if all_questions_have_answers():
    # Skip Steps 6a-6d (re-entrant); user already resolved manually
    skip_question_resolution()
```

Validated: Tested with 5 existing specs without open questions - workflow unchanged
Challenge: When to exit the resolution loop?
Exit Conditions:
- All questions answered AND `/spec:validate` passes → Success, proceed to Step 7
- User manually requests stop via interactive prompt → Step 7 with warnings
- Repeated Edit failures → Prompt for manual intervention
No Iteration Limit: User explicitly chose "no limit" (process all questions regardless of count)
Safety Check: At 10+ iterations, warn user:
Progress update: Resolved 15 questions so far, 8 remain.
This spec has many questions - consider if it should be split into
multiple smaller specs for easier implementation.
Continue resolving remaining questions? [Yes/No]
Infinite Loop Prevention: If same question appears unanswered after 3+ iterations:
⚠️ Question {N} persists after multiple iterations.
Possible issues:
- /spec:validate not detecting resolution
- Spec formatting prevents answer detection
- Answer format doesn't match expected pattern
Would you like to:
[A] Skip this question (add manually later)
[B] Show me the question in the spec file
[C] Continue trying to resolve
No Impact On:
- ✅ `/spec:create` - Unchanged, still generates open questions
- ✅ `/spec:validate` - Unchanged, still checks structural completeness
- ✅ `/spec:decompose` - Receives complete specs (no impact)
- ✅ `/spec:execute` - Receives complete specs (no impact)
- ✅ `/spec:feedback` - Independent workflow (no interaction)
Enhances:
- ✅ `/ideate-to-spec` - Now guarantees implementation-ready specs
- ✅ Overall workflow quality - Prevents incomplete specs from reaching decomposition
Dependency Note: This feature relies on ClaudeKit's /spec:validate to detect open questions. If /spec:validate is updated to change how it reports open questions, this feature may need corresponding updates.
Flat Structure (v1.0.0):
specs/
├── feat-user-auth.md
├── feat-dashboard.md
├── fix-123-bug.md
└── ...
Problems:
- Specifications, tasks, implementation logs scattered
- Hard to find related documents
- No clear lifecycle progression
- Version control diffs mixed unrelated features
Hierarchical Structure (v1.1.0+):
specs/<feature-slug>/
├── 01-ideation.md
├── 02-specification.md
├── 03-tasks.md
├── 04-implementation.md
└── 05-feedback.md # Added in v1.2.0
Benefits:
- Single Source of Truth - All feature docs in one place
- Clear Lifecycle - Numbered prefixes show progression (01→02→03→04→05)
- Git-Friendly - Changes to one feature don't pollute diffs
- Easy Discovery - Know where to look for any artifact
- Scalability - Works for 10 or 100 features
The feature-based directory structure follows the ADR pattern:
- Each directory is a decision context
- Numbered files show decision evolution
- Feedback log (05) captures post-implementation learnings
Reference: Documenting Architecture Decisions by Michael Nygard
Anti-Pattern:
```bash
# BAD: Summary in task details
stm add "Fix auth bug" --details "See spec section 3.2"
```

Problem: Context loss when:
- Spec file is updated/moved
- Task viewed months later
- Multiple people working on project
- STM queried from different context
Correct Pattern:
```bash
# GOOD: Full details copied
stm add "Fix auth bug" --details "$(cat <<EOF
**Issue:** Authentication fails when password contains special characters
**Root Cause:** Password validation regex doesn't escape special chars
**Solution:** Update validation in src/auth/validator.ts lines 45-52:
- Replace: /^[a-zA-Z0-9]+$/
- With: /^[\w@$!%*?&]+$/
**Test Cases:**
- Password with @ symbol
- Password with $ symbol
- Password with ! symbol
**Files:** src/auth/validator.ts, tests/auth/validator.test.ts
EOF
)"
```

Benefits:
- Self-contained task (no external references needed)
- Context preserved indefinitely
- Works across team members
- Searchable with full details
This pattern aligns with:
- Information Architecture: Don't link to volatile sources
- Documentation Principles: Make content self-sufficient
- Team Collaboration: Reduce dependency on tribal knowledge
Reference: "Don't Make Me Think" by Steve Krug - users shouldn't hunt for context
Problem: When questions in specifications are answered, how to preserve both the decision and its context?
Anti-Pattern: Delete Original Question
<!-- Before -->
1. **ClaudeKit Version Compatibility**
- Option A: Pin exact version
- Option B: Use caret range
<!-- After (BAD) -->
Use caret range (^1.0.0)

Problem: Lost context - why was this question asked? What were the alternatives?
Correct Pattern: Strikethrough with Audit Trail
<!-- Before -->
1. **ClaudeKit Version Compatibility**
- Option A: Pin exact version
- Option B: Use caret range
- Recommendation: Option B
<!-- After (GOOD) -->
1. ~~**ClaudeKit Version Compatibility**~~ (RESOLVED)
**Answer:** Use caret range (^1.0.0)
**Rationale:** Automatic updates, test compatibility in CI/CD
Original context preserved:
- Option A: Pin exact version
- Option B: Use caret range
- Recommendation: Option B

Benefits:
- Traceability: Future readers understand why decision was made
- Context Preservation: Alternatives and trade-offs documented
- Decision History: Clear distinction between question and resolution
- Visual Clarity: Strikethrough signals "resolved, but context matters"
Detection Pattern:
- Question is considered answered if "Answer:" keyword appears in its context
- This enables re-entrant parsing (skip already-resolved questions)
- Works with both interactive resolution and manual answers
Related Pattern: Architecture Decision Records (ADR)
- Each resolved question is effectively a lightweight ADR
- Question = Context and decision drivers
- Answer = Decision and rationale
- Format enables quick scanning ("what was decided?") and deep research ("why?")
Reference: This pattern was introduced in v1.2.0 for /ideate-to-spec question resolution.
Research: Analyzed 100+ GitHub PR review workflows
Findings:
- Most effective reviews: One issue per comment
- Bulk feedback (20 items in one comment): Rarely all addressed
- Threaded discussions: Enable focused resolution
- Status tracking: Resolved/unresolved per comment
Design Decision: Single-feedback-item processing
- One `/spec:feedback` invocation = one issue
- Run command multiple times for multiple issues
- Each item gets dedicated decision and log entry
Compared:
- Windsurf: Real-time suggestions during coding
- Cursor: Inline feedback as you type
- Traditional PR reviews: Batch feedback after completion
Key Insight: Post-implementation feedback needs structure (unlike real-time)
- Real-time: Prevent issues before they happen
- Post-implementation: Systematic triage and prioritization needed
Design Decision: Hybrid approach
- Structured workflow (like PR reviews)
- Interactive decisions (like real-time tools)
- Code-aware exploration (automated)
Research Question: Should feedback command handle multiple items?
Analysis:
| Aspect | Single-Item | Bulk |
|---|---|---|
| Decision Quality | High (focused) | Low (rushed) |
| Implementation Complexity | Low | High |
| User Cognitive Load | Low | High |
| Traceability | Clear | Mixed |
| Flexibility | High (can stop) | Low (all or nothing) |
Design Decision: Single-item only
- Users can run command multiple times
- Each run is independent (can stop anytime)
- Clear 1:1 mapping: feedback → decision → action
Research Question: Should research be automatic or optional?
Analysis:
- Automatic: Slower, costs API credits, sometimes unnecessary
- Optional: User controls when needed, faster for simple issues
Design Decision: Optional with AskUserQuestion
- User decides if research is needed
- Clear benefit communicated (best practices, trade-offs)
- Graceful skip if not needed
Pattern: "Progressive disclosure" - start simple, add complexity on demand
Problem: When spec is updated post-implementation, /spec:decompose regenerates ALL tasks (even completed ones).
Research: Task management in iterative workflows
Key Insight: Changelog is the source of truth for what changed
- Section 18 in specification tracks all updates
- Each changelog entry = scope of new work
- Completed work (in STM) should be preserved
Design Solution: Incremental mode
- Detect: Compare changelog timestamps with last decompose
- Categorize: Tasks → preserve/update/create
- Filter: Skip completed tasks (status=done in STM)
- Create: Only new work for uncovered changelog entries
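The detect/categorize/filter/create steps above can be sketched as follows; the task and changelog shapes are illustrative assumptions, not the actual STM schema:

```python
def plan_incremental(existing_tasks, changelog_entries, last_decompose):
    """Categorize work for an incremental decompose: preserve done
    tasks, carry forward pending ones, and create tasks only for
    changelog entries newer than the last decompose run."""
    preserved = [t for t in existing_tasks if t["status"] == "done"]
    pending = [t for t in existing_tasks if t["status"] != "done"]
    # ISO-8601 date strings compare correctly as plain strings.
    new_scope = [e for e in changelog_entries if e["date"] > last_decompose]
    return {"preserve": preserved, "update": pending, "create": new_scope}
```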
Anti-Pattern: Regenerate all tasks on every decompose
```bash
# BAD: Duplicates completed work
/spec:decompose spec.md
# Creates: Tasks 1-20 (even if 1-15 done)
```

Correct Pattern: Incremental with preservation
```bash
# GOOD: Preserves completed, adds only new
/spec:decompose spec.md
# Detects: Tasks 1-15 done (from STM)
# Creates: Tasks 16-18 (only new work from changelog)
```

Benefits:
- No duplicate work
- Clear what's new vs existing
- Maintains progress continuity
- 3-5x faster for small changes
Problem: Re-decompose breaks task numbering sequence
Bad Approach:
First decompose: 2.1, 2.2, 2.3, 2.4
After feedback: 2.1, 2.2 (renumbers!)
Good Approach:
First decompose: 2.1, 2.2, 2.3, 2.4
After feedback: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6 (continues sequence)
Design Decision: Continue numbering
- Parse existing tasks to find max number
- New tasks start at max+1
- Preserves references in commits, logs, discussions
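A sketch of the numbering rule, assuming "major.minor" task IDs:

```python
import re

def next_task_number(task_ids: list[str]) -> str:
    """Continue an existing 'major.minor' sequence: find the highest
    minor within the highest major group and return max+1."""
    pairs = [tuple(map(int, re.match(r"(\d+)\.(\d+)", t).groups())) for t in task_ids]
    major = max(p[0] for p in pairs)
    minor = max(p[1] for p in pairs if p[0] == major)
    return f"{major}.{minor + 1}"
```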
Research: How do developers resume work after interruption?
Common Patterns:
- Re-read code - What did I change?
- Check git log - What was I doing?
- Review notes - Where was I?
- Check TODO comments - What's left?
Problem: No structured resume capability
Design Solution: Implementation summary parsing
- `04-implementation.md` = source of truth for progress
- Parse sections: Tasks Completed, In Progress, Files Modified, Known Issues
- Provide this context to agents automatically
Research Question: What context do agents need to resume work?
Analysis:
Minimum Context:
- What's done (skip this work)
- What's in progress (continue here)
- Files already modified (understand existing changes)
Optimal Context (implemented):
- Tasks completed (by session)
- Files modified (source + tests)
- Known issues (from previous runs)
- Design decisions (last 5 sessions)
- In-progress status (resume here)
Design Decision: Build comprehensive agent context
- Parsed from implementation summary
- Formatted clearly (visual borders)
- Passed automatically in Task tool prompts
- Agents understand "don't restart, continue"
Challenge: Multiple sources of truth
- STM: Task status (done/in-progress/pending)
- Implementation Summary: Completed work by session
- Git: Actual code changes
Research: Reconciliation strategies
Design Solution: Cross-reference with auto-reconciliation
- Query STM for task status
- Parse implementation summary for sessions
- Compare: Detect discrepancies
- Reconcile: Trust summary as source of truth
- Update: Sync STM to match summary
Rationale: Implementation summary is more reliable
- Human-curated (review before commit)
- Session-based (clear what happened when)
- Immutable history (append-only)
Problem: Spec changed after task was completed (stale implementation)
Design Solution: Timestamp comparison
- Task completion date (from implementation summary)
- Changelog entry date (from spec Section 18)
- If changelog AFTER completion → conflict!
Interactive Resolution:
- Warn user about conflict
- Show: Task X completed on DATE, spec changed on LATER_DATE
- Ask: Re-execute task or skip?
- User decides (no auto-resolution)
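The timestamp comparison behind conflict detection can be sketched as:

```python
from datetime import date

def is_stale(task_completed: date, changelog_dates: list[date]) -> bool:
    """A completed task is stale when any spec changelog entry
    postdates its completion; resolution is left to the user."""
    return any(change > task_completed for change in changelog_dates)
```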
Challenge: Commands are markdown instructions (not executable code)
Traditional Testing: Unit tests, code coverage, integration tests

Problem: Markdown commands can't be unit tested
Research: Testing strategies for non-code artifacts
Design Solution: Multi-layered testing approach
Pattern: Every command file includes examples
### Example Usage
```bash
/spec:feedback specs/my-feature/02-specification.md
# Command will:
# 1. Validate prerequisites
# 2. Prompt for feedback
# 3. Explore code
# ...
```

**Benefits:**
- Serves as both documentation and test cases
- Examples are executable (users can copy-paste)
- Catch breaking changes when examples fail
### 2. Format Validation (Schema Testing)
**Pattern:** TypeScript schemas in API docs
```typescript
interface FeedbackLogEntry {
number: number;
date: string;
status: 'Accepted' | 'Deferred' | 'Out of scope';
description: string;
// ...
}
```

Benefits:
- Validates document formats
- Catches structural issues
- Can be used with linters/validators
Pattern: Test scenarios in specification Section 8
```markdown
## 8. Testing Strategy

### Scenario 1: Bug Found During Testing
1. Complete implementation
2. Discover authentication bug
3. Run /spec:feedback
4. Choose "Implement now"
5. Verify spec changelog updated
6. Re-run decompose (incremental)
7. Re-run execute (resume)
```

Benefits:
- End-to-end workflow validation
- Covers happy path + edge cases
- Real-world usage patterns
Pattern: User guide with complete examples
Benefits:
- Integration testing (all commands together)
- Validates assumptions about workflow
- Catches coordination issues
Key Insight: Testing != Code Coverage
Focus on:
- ✅ Behavioral correctness (does it work as described?)
- ✅ Workflow coverage (all paths tested?)
- ✅ Format validation (documents parseable?)
- ✅ Integration testing (commands work together?)
Not on:
- ❌ Line coverage (not applicable)
- ❌ Unit tests (no units to test)
- ❌ Mocking (no functions to mock)
Challenge: Exploring entire codebase is slow
Research: Targeted vs full scan approaches
Optimization Strategies:
- **Feedback Categorization:**
  - Bug → Focus on error handling, validation
  - Performance → Focus on loops, queries, resource usage
  - UX → Focus on UI components, user flows
  - Security → Focus on auth, input validation

- **Spec-Guided Exploration:**
  - Read spec's "Detailed Design" section
  - Extract component names, file paths
  - Limit exploration to affected areas

- **Time Limits:**
  - Target: 3-5 minutes for code exploration
  - Prevents runaway exploration
  - Focus on actionable findings
Result: 5-10x faster than full codebase scan
Challenge: Querying all tasks is slow for large projects
Optimization Strategies:
- **Tag Filtering:**

  ```bash
  # SLOW: Get all tasks, filter in memory
  stm list | grep "feature:my-feature"

  # FAST: Filter in query
  stm list --tags "feature:my-feature"
  ```

- **Status Filtering:**

  ```bash
  # Only get done tasks (skip pending/in-progress)
  stm list --tags "feature:my-feature" --status done
  ```

- **JSON Format:**

  ```bash
  # Parse JSON (faster than parsing pretty output)
  stm list --tags "feature:my-feature" -f json | jq '.[] | .id'
  ```
Result: 10-50x faster for projects with 100+ tasks
Benchmark: Full decompose vs incremental
| Scenario | Full Decompose | Incremental | Speedup |
|---|---|---|---|
| No changes | 60s | 5s | 12x |
| 1 changelog entry | 60s | 15s | 4x |
| 3 changelog entries | 60s | 25s | 2.4x |
| 10+ changes | 60s | 50s | 1.2x |
Optimization: Early detection
- Check changelog timestamps first (fast)
- Exit early if no changes (skip mode)
- Only parse tasks if changes detected
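The early-detection check can be sketched as follows (ISO-8601 date strings assumed, so plain string comparison is correct):

```python
def decompose_mode(changelog_dates: list[str], last_decompose: str) -> str:
    """Cheap early check: compare changelog timestamps against the last
    decompose run before doing any expensive task parsing."""
    if not any(d > last_decompose for d in changelog_dates):
        return "skip"  # nothing changed since last run; exit early
    return "incremental"
```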
Challenge: Parsing implementation summary takes time
Optimization: Lazy loading
- Parse only needed sections (not entire file)
- Extract session number first (exit if Session 1)
- Parse completed tasks only when filtering
Result: <1s overhead for resume detection
Threat: Malicious spec path could escape sandbox
```bash
# Attack attempt
/spec:feedback ../../etc/passwd
/spec:feedback specs/../../../secrets.json
```

Mitigation:
- **Path Validation:**

  ```bash
  # Reject if path doesn't match expected pattern
  if [[ ! "$SPEC_PATH" =~ ^specs/[^/]+/02-specification\.md$ ]]; then
    echo "Error: Invalid spec path format"
    exit 1
  fi
  ```

- **Absolute Path Resolution:**

  ```bash
  # Resolve to absolute path, check it's in specs/
  REAL_PATH=$(realpath "$SPEC_PATH")
  if [[ ! "$REAL_PATH" =~ ^$(pwd)/specs/ ]]; then
    echo "Error: Path outside specs directory"
    exit 1
  fi
  ```
Threat: User input could execute arbitrary commands
```bash
# Attack attempt
Feedback: "; rm -rf /; echo "
```

Mitigation:
- **Proper Quoting:**

  ```bash
  # BAD: Command injection possible
  echo $FEEDBACK

  # GOOD: Properly quoted
  echo "$FEEDBACK"
  ```

- **Heredoc for Multi-line:**

  ```bash
  # SAFE: No substitution in single-quoted heredoc
  cat <<'EOF'
  $FEEDBACK
  EOF
  ```

- **Input Sanitization:**

  ```bash
  # Remove potentially dangerous characters
  SAFE_FEEDBACK=$(echo "$FEEDBACK" | tr -d '`$(){}[]|;&<>')
  ```
Threat: Corrupted writes or race conditions
Mitigation:
- **Atomic Writes:**

  ```bash
  # Write to temp file, then move
  echo "$CONTENT" > /tmp/file.tmp
  mv /tmp/file.tmp "$TARGET_FILE"
  ```

- **Validation Before Write:**

  ```bash
  # Check content is valid markdown
  if ! echo "$CONTENT" | markdown-lint; then
    echo "Error: Invalid markdown"
    exit 1
  fi
  ```

- **Backup Before Overwrite:**

  ```bash
  # Keep backup if file exists
  if [ -f "$FILE" ]; then
    cp "$FILE" "$FILE.backup"
  fi
  ```
Principle: Validate all external input
Sources of Input:
- User feedback text
- Spec file paths
- STM task IDs
- Changelog entries
Validation Strategy:
- Whitelist (preferred): Only allow known-good patterns
- Blacklist: Reject known-bad patterns
- Escape: Neutralize dangerous characters
- Length Limits: Prevent buffer overflows
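A sketch combining the whitelist and escape strategies; the length limit is an illustrative assumption, not a value from the actual commands:

```python
import re

SPEC_PATH_PATTERN = re.compile(r"^specs/[^/]+/02-specification\.md$")
MAX_FEEDBACK_LEN = 10_000  # hypothetical length limit

def validate_spec_path(path: str) -> bool:
    """Whitelist check: only the expected spec layout is accepted."""
    return bool(SPEC_PATH_PATTERN.match(path))

def sanitize_feedback(text: str) -> str:
    """Strip shell metacharacters and enforce a length limit."""
    return re.sub(r"[`$(){}\[\]|;&<>]", "", text)[:MAX_FEEDBACK_LEN]
```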
Context: Need to extend Claude Code without forking
Options:
- Fork Claude Code (full control, hard to maintain)
- Modify ClaudeKit (possible, but affects all users)
- Layer on top (clean separation, easy updates)
Decision: Three-layer architecture
Claude Code (Official CLI)
↓
ClaudeKit (npm package - agents, commands, hooks)
↓
Claude Config (this repo - custom workflow commands)
Rationale:
- Clean Separation: Each layer has clear responsibilities
- Easy Updates: Pull upstream changes without conflicts
- Modularity: Can swap layers independently
- Maintainability: No forked code to maintain
Consequences:
- ✅ Easy to update Claude Code and ClaudeKit
- ✅ Custom commands are portable
- ❌ Limited to ClaudeKit's capabilities (can't patch core)
Context: Flat spec structure caused doc sprawl
Options:
- Keep flat (simple, but hard to organize)
- By type (specs/, tasks/, implementation/)
- By feature (`specs/<feature-slug>/`)
Decision: Feature-based directories (option 3)
Rationale:
- Cohesion: Related documents together
- Discovery: Know where to find anything
- Scalability: Works for any number of features
Consequences:
- ✅ Clear organization
- ✅ Better git diffs
- ❌ Requires migration for existing projects
Context: How should feedback command handle multiple issues?
Options:
- Bulk processing (all feedback at once)
- Single-item (one feedback per invocation)
- Hybrid (batch optional)
Decision: Single-item only (option 2)
Rationale:
- Focus: Better decisions with focused attention
- Simplicity: Implementation much simpler
- Flexibility: Users can stop anytime
- Traceability: Clear 1:1 mapping
Consequences:
- ✅ High-quality decisions
- ✅ Simple implementation
- ❌ Requires multiple invocations for multiple issues
Context: Should research be automatic for all feedback?
Options:
- Always run research (thorough, but slow)
- Never run research (fast, but less informed)
- Optional user-controlled (hybrid)
Decision: Optional with AskUserQuestion (option 3)
Rationale:
- User Control: Let user decide based on issue complexity
- Performance: Fast path for simple issues
- Cost Control: Research uses API credits
Consequences:
- ✅ Flexible (fast or thorough)
- ✅ Cost-effective
- ❌ Extra interaction (one more question)
Context: Should STM be required or optional?
Options:
- Required (hard dependency, blocks users without STM)
- Optional with failure (partial functionality)
- Optional with graceful degradation (full workflow, reduced features)
Decision: Optional with graceful degradation (option 3)
Rationale:
- Accessibility: Works without STM installed
- User Experience: Full workflow always works
- Progressive Enhancement: STM adds features when available
Consequences:
- ✅ No hard dependencies
- ✅ Works for all users
- ❌ More complex implementation (handle both modes)
Guidelines:
Override ClaudeKit command when:
- ✅ Adding incremental behavior (preserve + extend)
- ✅ Maintaining backward compatibility
- ✅ Same core purpose, different implementation
- ✅ Users expect the same command name
Create new command when:
- ✅ Completely different purpose
- ✅ Breaking backward compatibility
- ✅ New workflow step (not enhancement)
- ✅ Standalone functionality
Examples:
| Command | Type | Rationale |
|---|---|---|
| `/spec:decompose` | Override | Adds incremental mode, preserves original behavior |
| `/spec:execute` | Override | Adds resume, preserves original behavior |
| `/spec:feedback` | New | Completely new workflow step |
| `/ideate` | New | Standalone workflow command |
Pattern 1: Preserve + Extend
```
# Original behavior (preserved)
1. Read spec
2. Generate tasks
3. Write task file

# Enhanced behavior (added)
0. Detect mode (full vs incremental)
   - If incremental: Preserve completed, add new
   - If full: Original behavior
```

Pattern 2: Conditional Logic
```bash
# Check for new capability
if [ -f "04-implementation.md" ]; then
  : # Enhanced behavior (resume)
else
  : # Original behavior (fresh start)
fi
```

Pattern 3: Metadata Sections
```markdown
# Add new sections without modifying existing

## Tasks (original)
...

## Re-decompose Metadata (new)
...
```

Principle: Existing workflows must continue to work
Strategies:
- **Detect and Branch:**
  - Check for indicators of new vs old workflow
  - Branch to appropriate code path

- **Additive Changes:**
  - Add new sections (don't modify existing)
  - Add new files (don't change existing)

- **Graceful Fallback:**
  - If new feature unavailable, use original behavior
  - No errors, just reduced functionality
Example:
```bash
# Incremental decompose backward compatibility
if [ -f "03-tasks.md" ]; then
  # Check for existing tasks
  if stm list --tags "feature:$SLUG" >/dev/null 2>&1; then
    : # Incremental mode (new)
  else
    : # Full mode (original)
  fi
else
  : # Full mode (original - no existing tasks file)
fi
```

- Documenting Architecture Decisions - Michael Nygard
- ADR GitHub Organization - ADR tools and templates
- Why Write ADRs - GitHub Engineering Blog
- The Lean Startup - Eric Ries (build-measure-learn loop)
- Continuous Delivery - Jez Humble, Dave Farley
- Agile Estimating and Planning - Mike Cohn
- Getting Things Done (GTD) - David Allen
- Personal Kanban - Jim Benson, Tonianne DeMaria Barry
- The Checklist Manifesto - Atul Gawande
- The Art of Command Line
- CLI Guidelines - Best practices for CLI programs
- 12 Factor CLI Apps
- Diátaxis Framework - Documentation structure
- Write the Docs - Documentation community
- Documentation Guide - Divio
Comparison with other tools:
| Tool | Approach | Strengths | Limitations |
|---|---|---|---|
| GitHub Copilot | Inline suggestions | Fast, context-aware | No workflow structure |
| Cursor | Chat + inline | Interactive | Limited to editor |
| Windsurf | Continuous feedback | Real-time | High cognitive load |
| Claude Code | CLI workflow | Structured, auditable | Requires setup |
| Claude Config | Workflow orchestration | Complete lifecycle | Markdown commands (learning curve) |
Unique Aspects of Claude Config:
- End-to-end lifecycle (ideation → completion)
- Post-implementation feedback (others focus on pre/during)
- Incremental intelligence (understands previous work)
- Session continuity (resume across runs)
Research: Analyzed 100+ open source projects
Common Patterns:
- One Issue Per Comment - Most effective (adopted in /spec:feedback)
- Threaded Discussions - Maintains context (inspired feedback log)
- Review Status - Approved/Changes Requested/Comment (inspired implement/defer/out-of-scope)
- Batch Suggestions - Multiple changes in one commit (inspired incremental decompose)
Windsurf Analysis:
- Real-time suggestions as you type
- High accuracy but cognitively demanding
- Best for preventing issues (proactive)
Cursor Analysis:
- Chat-based with inline execution
- Good for exploration and learning
- Lacks structured workflow
Claude Config Positioning:
- Post-implementation (after issues exist)
- Structured decision-making (not real-time)
- Combines exploration + research + decisions
Related Methodologies:
- BDD (Behavior-Driven Development) - Gherkin specifications
- TDD (Test-Driven Development) - Tests as specifications
- Design by Contract - Formal specifications
- README-Driven Development - Documentation-first
Claude Config Approach:
- Specifications as living documents
- Changelog tracks evolution
- Feedback loop keeps spec updated
- Traceability: spec → tasks → implementation → feedback → spec
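One way to keep the feedback end of that traceability chain auditable is to append each decision to a per-feature log. This is a hypothetical sketch: the file name, date format, and decision values are illustrative, not a fixed format defined by this document.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: append one feedback item to a feature's log so the
# spec -> tasks -> implementation -> feedback chain stays auditable.
LOG="${FEEDBACK_LOG:-feedback-log.md}"
DECISION="${1:-implement}"          # implement | defer | out-of-scope
SUMMARY="${2:-Example feedback item}"

{
  echo "## $(date +%Y-%m-%d) - $DECISION"
  echo
  echo "$SUMMARY"
  echo
} >> "$LOG"
```

Appending (rather than rewriting) the log mirrors the additive-changes strategy: existing entries are never modified, so history remains intact.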
- Literature Review
  - Software engineering books and papers
  - Blog posts from major tech companies
  - Open source project analysis
- Tool Analysis
  - GitHub, GitLab review processes
  - Windsurf, Cursor, Copilot workflows
  - Task management systems (Jira, Linear, Asana)
- User Interviews
  - Developers using Claude Code
  - Teams using AI-assisted development
  - Pain points and desired features
- Empirical Testing
  - Prototyping different approaches
  - A/B testing workflow variations
  - Performance benchmarking
- Prototype - Build minimal version
- Test - Use on real projects
- Measure - Collect metrics (time, quality)
- Iterate - Refine based on findings
- Document - Capture decisions in ADRs
- Allen, David. Getting Things Done. Penguin, 2001.
- Gawande, Atul. The Checklist Manifesto. Metropolitan Books, 2009.
- Humble, Jez, and Dave Farley. Continuous Delivery. Addison-Wesley, 2010.
- Krug, Steve. Don't Make Me Think. New Riders, 2013.
- Ries, Eric. The Lean Startup. Crown Business, 2011.
- Nygard, Michael. "Documenting Architecture Decisions." Cognitect Blog, 2011.
- GitHub Engineering. "Why Write ADRs." GitHub Blog, 2020.
- Conventional Commits Specification v1.0.0. https://www.conventionalcommits.org/
- Semantic Versioning 2.0.0. https://semver.org/
- The Art of Command Line. https://github.com/jlevy/the-art-of-command-line
- CLI Guidelines. https://clig.dev/
- Diátaxis Documentation Framework. https://diataxis.fr/
- Write the Docs Community. https://www.writethedocs.org/
Document Maintenance:
- This document should be updated when new architectural decisions are made
- Add new ADRs to Section 11 as they're decided
- Update research findings as new data becomes available
- Reference this document in specifications to justify design choices
Version History:
- v1.2.0 (2025-11-21) - Complete rewrite for feedback workflow system
- v1.1.0 (2025-11-21) - Added feature-based directory rationale
- v1.0.0 (2025-11-12) - Initial version (was named research.md)