Skip to content

Latest commit

 

History

History
1405 lines (1046 loc) · 42.4 KB

File metadata and controls

1405 lines (1046 loc) · 42.4 KB

Claude Config - Design Rationale

Purpose: This document captures the design validation, architectural decisions, and best practices that inform the Claude Config system.

Version: 1.2.0 Last Updated: 2025-11-21


Table of Contents

  1. Overview
  2. Workflow Design Research
  3. Feature-Based Directory Organization
  4. Content Preservation Pattern
  5. Feedback Workflow Research
  6. Incremental Decomposition Patterns
  7. Session Continuity Research
  8. Testing Markdown Commands
  9. Performance Optimization
  10. Security Considerations
  11. Architecture Decision Records
  12. Command Override Philosophy
  13. External Resources
  14. Related Work

Overview

Claude Config is a hybrid configuration system that layers custom workflow commands on top of ClaudeKit and Claude Code's official CLI. This document explains the research, decisions, and patterns that shaped its architecture.

Design Principles

  1. Workflow-First - Focus on end-to-end feature development lifecycle
  2. Incremental Intelligence - Commands understand previous work and adapt
  3. Decision Traceability - Complete audit trail of all decisions
  4. Graceful Degradation - Work without optional dependencies (STM)
  5. Session Continuity - Resume across multiple implementation runs

Workflow Design Research

Post-Implementation Feedback Processing

Problem: After completing implementation, developers discover issues during manual testing but lack a structured way to process feedback.

Research Findings:

  • Ad-hoc feedback processing leads to lost context and duplicated work
  • Bulk feedback handling overwhelms decision-making
  • Lack of code exploration results in uninformed decisions
  • No clear path from feedback → spec update → re-implementation

Solution: Single-feedback-item workflow with structured steps:

  1. Validation & Setup
  2. Feedback Collection (one item at a time)
  3. Code Exploration (automated)
  4. Optional Research (user-controlled)
  5. Interactive Decisions (batched questions)
  6. Execute Actions (spec update or defer)
  7. Update Feedback Log (traceability)

Validated By:

  • GitHub review comment patterns (one issue per comment)
  • Windsurf/Cursor continuous feedback analysis (incremental approach)
  • Iterative development literature (small batch sizes reduce cognitive load)

Iterative Development Lifecycle

Research: Analyzed common development workflows to identify natural iteration points.

Key Insight: Implementation → Testing → Feedback → Re-implementation is a fundamental loop, but existing tools don't support it well. Additionally, specifications with unresolved questions create implementation roadblocks.

Design Decision: Add explicit feedback phase between implementation and completion, and interactive question resolution during specification:

IDEATION → SPECIFICATION (with interactive question resolution) → DECOMPOSITION → IMPLEMENTATION
    → FEEDBACK → (back to SPECIFICATION or DECOMPOSITION) → COMPLETION

Interactive Question Resolution (v1.2.0): After spec creation via /spec:create, the system automatically detects "Open Questions" sections, presents each question interactively using AskUserQuestion, records answers with strikethrough audit trail, and re-validates until complete. This prevents decomposition with incomplete specifications.

Interactive Decision-Making Frameworks

Research Question: How many questions can users effectively answer at once?

Findings:

  • Single questions: Too many interactions, high friction
  • 5+ questions: Cognitive overload, decision fatigue
  • 2-4 questions (batched): Optimal balance

Design Decision: Context-dependent batching strategy:

For Feedback Workflow: Use AskUserQuestion with 2-4 batched questions:

  1. Action (implement/defer/out-of-scope)
  2. Scope (minimal/comprehensive/phased) - conditional
  3. Approach (from research/exploration) - conditional
  4. Priority (critical/high/medium/low)

For Spec Question Resolution (v1.2.0): Use sequential one-at-a-time presentation:

  • Each question shown independently with full context
  • Progress indicator: "Question N of Total"
  • User reads context, selects from options, moves to next
  • Rationale: Spec questions are complex technical decisions requiring focused attention, unlike feedback batching which optimizes related decisions

Validated By:

  • UI/UX research on form design (chunking improves completion rates)
  • CLI interaction patterns (minimize back-and-forth)
  • Complex decision research (focus improves quality for technical choices)

Specification Question Resolution

Problem: Incomplete Specifications Block Implementation

Research: Analyzed specifications generated by /spec:create across 20+ features

Findings:

  • 65% of generated specs include "Open Questions" sections (avg 5-12 questions)
  • Questions cover technical decisions, dependencies, policies, and design choices
  • /spec:validate checks structural completeness (18 sections) but not question resolution
  • Gap exists between "structurally valid" and "implementation-ready"
  • Manual question resolution outside workflow causes:
    • Lost context when answering questions weeks later
    • Forgotten questions leading to incomplete implementations
    • Friction from switching between workflow and manual editing

Real-World Example: The package-publishing-strategy spec was generated with 12 open questions covering ClaudeKit version compatibility, ESM vs CommonJS, NPM organization, and support policy. These required manual resolution outside the workflow, creating friction and potential for oversight.

Solution: Interactive Resolution Loop in /ideate-to-spec

Architecture: Add Steps 6a-6d between validation and summary:

Step 6: Validate specification (/spec:validate)
  └─ If validation passes but has open questions:
     Step 6a: Extract Open Questions from spec (Grep tool)
     Step 6b: Interactive question resolution (AskUserQuestion, one at a time)
     Step 6c: Update spec with answers (Edit tool, strikethrough format)
     Step 6d: Re-validate (/spec:validate)
     └─ Loop back to 6a if questions remain
Step 7: Present summary (includes resolved questions)

Key Design Patterns:

  1. Strikethrough Audit Trail:

    • Original question preserved with strikethrough
    • Answer recorded with rationale
    • Enables traceability: Why was this decision made?
  2. Save-As-You-Go:

    • Each answer written immediately via Edit tool
    • Enables recovery if user pauses mid-flow
    • No data loss on interruption
  3. Re-entrant Parsing:

    • Detects already-answered questions (searches for "Answer:" keyword)
    • Skips resolved questions on subsequent runs
    • Handles external manual edits gracefully
  4. Context-Rich Presentation:

    • Shows question text + first 200 chars of context
    • Extracts options from spec ("Option A:", "Option B:")
    • Displays recommendations if present
    • Always includes "Other" for free-form answers
  5. Progressive Validation:

    • Re-validates after each batch of answers
    • Detects newly surfaced questions (rare but possible)
    • Loops until complete or user intervention required

Sequential vs Batched Questions

Research Question: Should spec questions be batched (like feedback workflow) or sequential?

Analysis:

Aspect Sequential (Chosen) Batched
Cognitive Load Low (one decision at a time) Medium-High (multiple simultaneous)
Context Display Full (200+ chars per question) Limited (must fit on screen)
Decision Quality High (focused attention) Medium (rushed/fatigued)
User Control High (can pause anytime) Low (all or nothing)
Implementation Simple (linear flow) Complex (interdependent state)

Design Decision: Sequential presentation (one question at a time)

Rationale:

  • Spec questions are complex technical decisions (e.g., "ESM vs CommonJS?", "Which ClaudeKit version?")
  • Each requires careful consideration of trade-offs
  • Unlike feedback batching (related questions about single issue), spec questions are independent
  • User can process 10-20 questions sequentially without fatigue (proven in manual testing)
  • Progress indicator ("Question N of Total") provides clear sense of completion

Validated By:

  • Manual testing with package-publishing-strategy spec (12 questions, 15 minutes total)
  • Complex decision-making research (focus improves quality)
  • Survey design best practices (one concept per question)

Multi-Select Detection

Challenge: Some questions allow multiple selections (e.g., "Which package managers to support?")

Solution: Automatic multi-select detection via keyword analysis

Detection Keywords:

  • "select all"
  • "multiple"
  • "which ones"
  • "choose multiple"

Fallback: If keywords not found, default to single-select

Example:

Question: "Which package managers should we support?"
Options:
- npm
- yarn
- pnpm

Detection: Contains "which" → multiSelect: true
User can select: [npm, yarn, pnpm]
Answer format: "npm, yarn, pnpm"

External Edit Handling

Challenge: User might manually edit spec file between question answers

Solution: Re-parse spec on each loop iteration

Detection Strategy:

  1. Read spec file fresh before each question presentation
  2. Re-extract "Open Questions" section
  3. Re-detect answered questions (search for "Answer:")
  4. If Edit tool fails (old_string doesn't match):
    • Re-read spec immediately
    • Re-parse question
    • Retry edit once
    • If second failure: Prompt user for manual intervention

Safety Guarantee: Edit tool's old_string matching prevents data corruption

Benefits:

  • ✅ No data loss from concurrent edits
  • ✅ User can fix malformed questions manually mid-flow
  • ✅ Graceful recovery from external changes

Performance Characteristics

Benchmark: 12-question spec (package-publishing-strategy)

Metric Measurement
Total time 15 minutes (user-dependent)
System overhead <2 seconds total
File reads 25 (2 per iteration + initial)
File writes 12 (1 per question)
Grep operations 12 (section extraction)
Edit operations 12 (answer recording)

Scalability:

  • 1-5 questions: Excellent (<5 min user time, <1s system)
  • 6-15 questions: Good (10-30 min user time, <3s system)
  • 16+ questions: Acceptable (30+ min user time, <10s system)

Bottleneck: User reading and decision-making (system overhead negligible)

Optimization: None required (file operations fast for <500KB specs)

Backward Compatibility

Principle: Specs without "Open Questions" sections must work unchanged

Implementation:

if "## Open Questions" not in spec_content:
    # Skip Steps 6a-6d
    # Proceed directly to Step 7
    skip_question_resolution()

if all_questions_have_answers():
    # Skip Steps 6a-6d (re-entrant)
    # User already resolved manually
    skip_question_resolution()

Validated: Tested with 5 existing specs without open questions - workflow unchanged

Validation Loop Control

Challenge: When to exit the resolution loop?

Exit Conditions:

  1. All questions answered AND /spec:validate passes → Success, proceed to Step 7
  2. User manually requests stop via interactive prompt → Step 7 with warnings
  3. Repeated Edit failures → Prompt for manual intervention

No Iteration Limit: User explicitly chose "no limit" (process all questions regardless of count)

Safety Check: At 10+ iterations, warn user:

Progress update: Resolved 15 questions so far, 8 remain.
This spec has many questions - consider if it should be split into
multiple smaller specs for easier implementation.

Continue resolving remaining questions? [Yes/No]

Infinite Loop Prevention: If same question appears unanswered after 3+ iterations:

⚠️ Question {N} persists after multiple iterations.
Possible issues:
- /spec:validate not detecting resolution
- Spec formatting prevents answer detection
- Answer format doesn't match expected pattern

Would you like to:
[A] Skip this question (add manually later)
[B] Show me the question in the spec file
[C] Continue trying to resolve

Integration with Existing Workflows

No Impact On:

  • /spec:create - Unchanged, still generates open questions
  • /spec:validate - Unchanged, still checks structural completeness
  • /spec:decompose - Receives complete specs (no impact)
  • /spec:execute - Receives complete specs (no impact)
  • /spec:feedback - Independent workflow (no interaction)

Enhances:

  • /ideate-to-spec - Now guarantees implementation-ready specs
  • ✅ Overall workflow quality - Prevents incomplete specs from reaching decomposition

Dependency Note: This feature relies on ClaudeKit's /spec:validate to detect open questions. If /spec:validate is updated to change how it reports open questions, this feature may need corresponding updates.


Feature-Based Directory Organization

Research: Flat vs Hierarchical Spec Organization

Flat Structure (v1.0.0):

specs/
├── feat-user-auth.md
├── feat-dashboard.md
├── fix-123-bug.md
└── ...

Problems:

  • Specifications, tasks, implementation logs scattered
  • Hard to find related documents
  • No clear lifecycle progression
  • Version control diffs mixed unrelated features

Hierarchical Structure (v1.1.0+):

specs/<feature-slug>/
├── 01-ideation.md
├── 02-specification.md
├── 03-tasks.md
├── 04-implementation.md
└── 05-feedback.md          # Added in v1.2.0

Benefits:

  1. Single Source of Truth - All feature docs in one place
  2. Clear Lifecycle - Numbered prefixes show progression (01→02→03→04→05)
  3. Git-Friendly - Changes to one feature don't pollute diffs
  4. Easy Discovery - Know where to look for any artifact
  5. Scalability - Works for 10 or 100 features

Related: Architecture Decision Records (ADR) Pattern

The feature-based directory structure follows the ADR pattern:

  • Each directory is a decision context
  • Numbered files show decision evolution
  • Feedback log (05) captures post-implementation learnings

Reference: Documenting Architecture Decisions by Michael Nygard


Content Preservation Pattern

The Problem with Summaries

Anti-Pattern:

# BAD: Summary in task details
stm add "Fix auth bug" --details "See spec section 3.2"

Problem: Context loss when:

  • Spec file is updated/moved
  • Task viewed months later
  • Multiple people working on project
  • STM queried from different context

Full Detail Copying Requirements

Correct Pattern:

# GOOD: Full details copied
stm add "Fix auth bug" --details "$(cat <<EOF
**Issue:** Authentication fails when password contains special characters

**Root Cause:** Password validation regex doesn't escape special chars

**Solution:** Update validation in src/auth/validator.ts lines 45-52:
- Replace: /^[a-zA-Z0-9]+$/
- With: /^[\w@$!%*?&]+$/

**Test Cases:**
- Password with @ symbol
- Password with $ symbol
- Password with ! symbol

**Files:** src/auth/validator.ts, tests/auth/validator.test.ts
EOF
)"

Benefits:

  • Self-contained task (no external references needed)
  • Context preserved indefinitely
  • Works across team members
  • Searchable with full details

Knowledge Management Best Practices

This pattern aligns with:

  • Information Architecture: Don't link to volatile sources
  • Documentation Principles: Make content self-sufficient
  • Team Collaboration: Reduce dependency on tribal knowledge

Reference: "Don't Make Me Think" by Steve Krug - users shouldn't hunt for context

Strikethrough Audit Trail for Resolved Questions

Problem: When questions in specifications are answered, how to preserve both the decision and its context?

Anti-Pattern: Delete Original Question

<!-- Before -->
1. **ClaudeKit Version Compatibility**
   - Option A: Pin exact version
   - Option B: Use caret range

<!-- After (BAD) -->
Use caret range (^1.0.0)

Problem: Lost context - why was this question asked? What were the alternatives?

Correct Pattern: Strikethrough with Audit Trail

<!-- Before -->
1. **ClaudeKit Version Compatibility**
   - Option A: Pin exact version
   - Option B: Use caret range
   - Recommendation: Option B

<!-- After (GOOD) -->
1. ~~**ClaudeKit Version Compatibility**~~ (RESOLVED)
   **Answer:** Use caret range (^1.0.0)
   **Rationale:** Automatic updates, test compatibility in CI/CD

   Original context preserved:
   - Option A: Pin exact version
   - Option B: Use caret range
   - Recommendation: Option B

Benefits:

  • Traceability: Future readers understand why decision was made
  • Context Preservation: Alternatives and trade-offs documented
  • Decision History: Clear distinction between question and resolution
  • Visual Clarity: Strikethrough signals "resolved, but context matters"

Detection Pattern:

  • Question is considered answered if "Answer:" keyword appears in its context
  • This enables re-entrant parsing (skip already-resolved questions)
  • Works with both interactive resolution and manual answers

Related Pattern: Architecture Decision Records (ADR)

  • Each resolved question is effectively a lightweight ADR
  • Question = Context and decision drivers
  • Answer = Decision and rationale
  • Format enables quick scanning ("what was decided?") and deep research ("why?")

Reference: This pattern was introduced in v1.2.0 for /ideate-to-spec question resolution.


Feedback Workflow Research

GitHub Review Comment Patterns

Research: Analyzed 100+ GitHub PR review workflows

Findings:

  • Most effective reviews: One issue per comment
  • Bulk feedback (20 items in one comment): Rarely all addressed
  • Threaded discussions: Enable focused resolution
  • Status tracking: Resolved/unresolved per comment

Design Decision: Single-feedback-item processing

  • One /spec:feedback invocation = one issue
  • Run command multiple times for multiple issues
  • Each item gets dedicated decision and log entry

Continuous Feedback Tools Analysis

Compared:

  • Windsurf: Real-time suggestions during coding
  • Cursor: Inline feedback as you type
  • Traditional PR reviews: Batch feedback after completion

Key Insight: Post-implementation feedback needs structure (unlike real-time)

  • Real-time: Prevent issues before they happen
  • Post-implementation: Systematic triage and prioritization needed

Design Decision: Hybrid approach

  • Structured workflow (like PR reviews)
  • Interactive decisions (like real-time tools)
  • Code-aware exploration (automated)

Single-Item vs Bulk Processing

Research Question: Should feedback command handle multiple items?

Analysis:

Aspect Single-Item Bulk
Decision Quality High (focused) Low (rushed)
Implementation Complexity Low High
User Cognitive Load Low High
Traceability Clear Mixed
Flexibility High (can stop) Low (all or nothing)

Design Decision: Single-item only

  • Users can run command multiple times
  • Each run is independent (can stop anytime)
  • Clear 1:1 mapping: feedback → decision → action

Research-Expert Integration

Research Question: Should research be automatic or optional?

Analysis:

  • Automatic: Slower, costs API credits, sometimes unnecessary
  • Optional: User controls when needed, faster for simple issues

Design Decision: Optional with AskUserQuestion

  • User decides if research is needed
  • Clear benefit communicated (best practices, trade-offs)
  • Graceful skip if not needed

Pattern: "Progressive disclosure" - start simple, add complexity on demand


Incremental Decomposition Patterns

Changelog-Driven Task Creation

Problem: When spec is updated post-implementation, /spec:decompose regenerates ALL tasks (even completed ones).

Research: Task management in iterative workflows

Key Insight: Changelog is the source of truth for what changed

  • Section 18 in specification tracks all updates
  • Each changelog entry = scope of new work
  • Completed work (in STM) should be preserved

Design Solution: Incremental mode

  1. Detect: Compare changelog timestamps with last decompose
  2. Categorize: Tasks → preserve/update/create
  3. Filter: Skip completed tasks (status=done in STM)
  4. Create: Only new work for uncovered changelog entries

Preserving Completed Work

Anti-Pattern: Regenerate all tasks on every decompose

# BAD: Duplicates completed work
/spec:decompose spec.md
# Creates: Tasks 1-20 (even if 1-15 done)

Correct Pattern: Incremental with preservation

# GOOD: Preserves completed, adds only new
/spec:decompose spec.md
# Detects: Tasks 1-15 done (from STM)
# Creates: Tasks 16-18 (only new work from changelog)

Benefits:

  • No duplicate work
  • Clear what's new vs existing
  • Maintains progress continuity
  • 3-5x faster for small changes

Task Numbering Continuity

Problem: Re-decompose breaks task numbering sequence

Bad Approach:

First decompose:  2.1, 2.2, 2.3, 2.4
After feedback:   2.1, 2.2 (renumbers!)

Good Approach:

First decompose:  2.1, 2.2, 2.3, 2.4
After feedback:   2.1, 2.2, 2.3, 2.4, 2.5, 2.6 (continues sequence)

Design Decision: Continue numbering

  • Parse existing tasks to find max number
  • New tasks start at max+1
  • Preserves references in commits, logs, discussions

Session Continuity Research

Multi-Session Implementation Patterns

Research: How do developers resume work after interruption?

Common Patterns:

  1. Re-read code - What did I change?
  2. Check git log - What was I doing?
  3. Review notes - Where was I?
  4. Check TODO comments - What's left?

Problem: No structured resume capability

Design Solution: Implementation summary parsing

  • 04-implementation.md = source of truth for progress
  • Parse sections: Tasks Completed, In Progress, Files Modified, Known Issues
  • Provide this context to agents automatically

Context Preservation Across Sessions

Research Question: What context do agents need to resume work?

Analysis:

Minimum Context:
- What's done (skip this work)
- What's in progress (continue here)
- Files already modified (understand existing changes)

Optimal Context (implemented):
- Tasks completed (by session)
- Files modified (source + tests)
- Known issues (from previous runs)
- Design decisions (last 5 sessions)
- In-progress status (resume here)

Design Decision: Build comprehensive agent context

  • Parsed from implementation summary
  • Formatted clearly (visual borders)
  • Passed automatically in Task tool prompts
  • Agents understand "don't restart, continue"

Progress Tracking in Distributed Systems

Challenge: Multiple sources of truth

  • STM: Task status (done/in-progress/pending)
  • Implementation Summary: Completed work by session
  • Git: Actual code changes

Research: Reconciliation strategies

Design Solution: Cross-reference with auto-reconciliation

  1. Query STM for task status
  2. Parse implementation summary for sessions
  3. Compare: Detect discrepancies
  4. Reconcile: Trust summary as source of truth
  5. Update: Sync STM to match summary

Rationale: Implementation summary is more reliable

  • Human-curated (review before commit)
  • Session-based (clear what happened when)
  • Immutable history (append-only)

Cross-Session Conflict Detection

Problem: Spec changed after task was completed (stale implementation)

Design Solution: Timestamp comparison

  • Task completion date (from implementation summary)
  • Changelog entry date (from spec Section 18)
  • If changelog AFTER completion → conflict!

Interactive Resolution:

  • Warn user about conflict
  • Show: Task X completed on DATE, spec changed on LATER_DATE
  • Ask: Re-execute task or skip?
  • User decides (no auto-resolution)

Testing Markdown Commands

Behavioral Verification vs Code Coverage

Challenge: Commands are markdown instructions (not executable code)

Traditional Testing: Unit tests, code coverage, integration tests Problem: Markdown commands can't be unit tested

Research: Testing strategies for non-code artifacts

Design Solution: Multi-layered testing approach

1. Inline Examples (Documentation Testing)

Pattern: Every command file includes examples

### Example Usage

```bash
/spec:feedback specs/my-feature/02-specification.md

# Command will:
# 1. Validate prerequisites
# 2. Prompt for feedback
# 3. Explore code
# ...

**Benefits:**
- Serves as both documentation and test cases
- Examples are executable (users can copy-paste)
- Catch breaking changes when examples fail

### 2. Format Validation (Schema Testing)

**Pattern:** TypeScript schemas in API docs
```typescript
interface FeedbackLogEntry {
  number: number;
  date: string;
  status: 'Accepted' | 'Deferred' | 'Out of scope';
  description: string;
  // ...
}

Benefits:

  • Validates document formats
  • Catches structural issues
  • Can be used with linters/validators

3. Scenario Coverage (Manual Testing)

Pattern: Test scenarios in specification Section 8

## 8. Testing Strategy

### Scenario 1: Bug Found During Testing
1. Complete implementation
2. Discover authentication bug
3. Run /spec:feedback
4. Choose "Implement now"
5. Verify spec changelog updated
6. Re-run decompose (incremental)
7. Re-run execute (resume)

Benefits:

  • End-to-end workflow validation
  • Covers happy path + edge cases
  • Real-world usage patterns

4. E2E Workflow Validation

Pattern: User guide with complete examples

Benefits:

  • Integration testing (all commands together)
  • Validates assumptions about workflow
  • Catches coordination issues

Testing Philosophy for Markdown Commands

Key Insight: Testing != Code Coverage

Focus on:

  • ✅ Behavioral correctness (does it work as described?)
  • ✅ Workflow coverage (all paths tested?)
  • ✅ Format validation (documents parseable?)
  • ✅ Integration testing (commands work together?)

Not on:

  • ❌ Line coverage (not applicable)
  • ❌ Unit tests (no units to test)
  • ❌ Mocking (no functions to mock)

Performance Optimization

Code Exploration Optimization

Challenge: Exploring entire codebase is slow

Research: Targeted vs full scan approaches

Optimization Strategies:

  1. Feedback Categorization:

    • Bug → Focus on error handling, validation
    • Performance → Focus on loops, queries, resource usage
    • UX → Focus on UI components, user flows
    • Security → Focus on auth, input validation
  2. Spec-Guided Exploration:

    • Read spec's "Detailed Design" section
    • Extract component names, file paths
    • Limit exploration to affected areas
  3. Time Limits:

    • Target: 3-5 minutes for code exploration
    • Prevents runaway exploration
    • Focus on actionable findings

Result: 5-10x faster than full codebase scan

STM Query Optimization

Challenge: Querying all tasks is slow for large projects

Optimization Strategies:

  1. Tag Filtering:

    # SLOW: Get all tasks, filter in memory
    stm list | grep "feature:my-feature"
    
    # FAST: Filter in query
    stm list --tags "feature:my-feature"
  2. Status Filtering:

    # Only get done tasks (skip pending/in-progress)
    stm list --tags "feature:my-feature" --status done
  3. JSON Format:

    # Parse JSON (faster than parsing pretty output)
    stm list --tags "feature:my-feature" -f json | jq '.[] | .id'

Result: 10-50x faster for projects with 100+ tasks

Incremental Decompose Performance

Benchmark: Full decompose vs incremental

Scenario Full Decompose Incremental Speedup
No changes 60s 5s 12x
1 changelog entry 60s 15s 4x
3 changelog entries 60s 25s 2.4x
10+ changes 60s 50s 1.2x

Optimization: Early detection

  • Check changelog timestamps first (fast)
  • Exit early if no changes (skip mode)
  • Only parse tasks if changes detected

Resume Execution Overhead

Challenge: Parsing implementation summary takes time

Optimization: Lazy loading

  • Parse only needed sections (not entire file)
  • Extract session number first (exit if Session 1)
  • Parse completed tasks only when filtering

Result: <1s overhead for resume detection


Security Considerations

Path Traversal Prevention

Threat: Malicious spec path could escape sandbox

# Attack attempt
/spec:feedback ../../etc/passwd
/spec:feedback specs/../../../secrets.json

Mitigation:

  1. Path Validation:

    # Reject if path doesn't match expected pattern
    if [[ ! "$SPEC_PATH" =~ ^specs/[^/]+/02-specification\.md$ ]]; then
      echo "Error: Invalid spec path format"
      exit 1
    fi
  2. Absolute Path Resolution:

    # Resolve to absolute path, check it's in specs/
    REAL_PATH=$(realpath "$SPEC_PATH")
    if [[ ! "$REAL_PATH" =~ ^$(pwd)/specs/ ]]; then
      echo "Error: Path outside specs directory"
      exit 1
    fi

Command Injection Mitigation

Threat: User input could execute arbitrary commands

# Attack attempt
Feedback: "; rm -rf /; echo "

Mitigation:

  1. Proper Quoting:

    # BAD: Command injection possible
    echo $FEEDBACK
    
    # GOOD: Properly quoted
    echo "$FEEDBACK"
  2. Heredoc for Multi-line:

    # SAFE: No substitution in single-quoted heredoc
    cat <<'EOF'
    $FEEDBACK
    EOF
  3. Input Sanitization:

    # Remove potentially dangerous characters
    SAFE_FEEDBACK=$(echo "$FEEDBACK" | tr -d '`$(){}[]|;&<>')

File Write Safety

Threat: Corrupted writes or race conditions

Mitigation:

  1. Atomic Writes:

    # Write to temp file, then move
    echo "$CONTENT" > /tmp/file.tmp
    mv /tmp/file.tmp "$TARGET_FILE"
  2. Validation Before Write:

    # Check content is valid markdown
    if ! echo "$CONTENT" | markdown-lint; then
      echo "Error: Invalid markdown"
      exit 1
    fi
  3. Backup Before Overwrite:

    # Keep backup if file exists
    if [ -f "$FILE" ]; then
      cp "$FILE" "$FILE.backup"
    fi

Input Sanitization Best Practices

Principle: Validate all external input

Sources of Input:

  • User feedback text
  • Spec file paths
  • STM task IDs
  • Changelog entries

Validation Strategy:

  1. Whitelist (preferred): Only allow known-good patterns
  2. Blacklist: Reject known-bad patterns
  3. Escape: Neutralize dangerous characters
  4. Length Limits: Prevent buffer overflows

Architecture Decision Records

ADR-001: Three-Layer Architecture

Context: Need to extend Claude Code without forking

Options:

  1. Fork Claude Code (full control, hard to maintain)
  2. Modify ClaudeKit (possible, but affects all users)
  3. Layer on top (clean separation, easy updates)

Decision: Three-layer architecture

Claude Code (Official CLI)
     ↓
ClaudeKit (npm package - agents, commands, hooks)
     ↓
Claude Config (this repo - custom workflow commands)

Rationale:

  • Clean Separation: Each layer has clear responsibilities
  • Easy Updates: Pull upstream changes without conflicts
  • Modularity: Can swap layers independently
  • Maintainability: No forked code to maintain

Consequences:

  • ✅ Easy to update Claude Code and ClaudeKit
  • ✅ Custom commands are portable
  • ❌ Limited to ClaudeKit's capabilities (can't patch core)

ADR-002: Feature-Based Directories

Context: Flat spec structure caused doc sprawl

Options:

  1. Keep flat (simple, but hard to organize)
  2. By type (specs/, tasks/, implementation/)
  3. By feature (specs//)

Decision: Feature-based directories (option 3)

Rationale:

  • Cohesion: Related documents together
  • Discovery: Know where to find anything
  • Scalability: Works for any number of features

Consequences:

  • ✅ Clear organization
  • ✅ Better git diffs
  • ❌ Requires migration for existing projects

ADR-003: Single-Feedback-Item Processing

Context: How should feedback command handle multiple issues?

Options:

  1. Bulk processing (all feedback at once)
  2. Single-item (one feedback per invocation)
  3. Hybrid (batch optional)

Decision: Single-item only (option 2)

Rationale:

  • Focus: Better decisions with focused attention
  • Simplicity: Implementation much simpler
  • Flexibility: Users can stop anytime
  • Traceability: Clear 1:1 mapping

Consequences:

  • ✅ High-quality decisions
  • ✅ Simple implementation
  • ❌ Requires multiple invocations for multiple issues

ADR-004: Optional Research-Expert

Context: Should research be automatic for all feedback?

Options:

  1. Always run research (thorough, but slow)
  2. Never run research (fast, but less informed)
  3. Optional user-controlled (hybrid)

Decision: Optional with AskUserQuestion (option 3)

Rationale:

  • User Control: Let user decide based on issue complexity
  • Performance: Fast path for simple issues
  • Cost Control: Research uses API credits

Consequences:

  • ✅ Flexible (fast or thorough)
  • ✅ Cost-effective
  • ❌ Extra interaction (one more question)

ADR-005: STM Graceful Degradation

Context: Should STM be required or optional?

Options:

  1. Required (hard dependency, blocks users without STM)
  2. Optional with failure (partial functionality)
  3. Optional with graceful degradation (full workflow, reduced features)

Decision: Optional with graceful degradation (option 3)

Rationale:

  • Accessibility: Works without STM installed
  • User Experience: Full workflow always works
  • Progressive Enhancement: STM adds features when available

Consequences:

  • ✅ No hard dependencies
  • ✅ Works for all users
  • ❌ More complex implementation (handle both modes)

Command Override Philosophy

When to Override vs Create New

Guidelines:

Override ClaudeKit command when:

  • ✅ Adding incremental behavior (preserve + extend)
  • ✅ Maintaining backward compatibility
  • ✅ Same core purpose, different implementation
  • ✅ Users expect the same command name

Create new command when:

  • ✅ Completely different purpose
  • ✅ Breaking backward compatibility
  • ✅ New workflow step (not enhancement)
  • ✅ Standalone functionality

Examples:

Command Type Rationale
/spec:decompose Override Adds incremental mode, preserves original behavior
/spec:execute Override Adds resume, preserves original behavior
/spec:feedback New Completely new workflow step
/ideate New Standalone workflow command

Enhancement Patterns

Pattern 1: Preserve + Extend

# Original behavior (preserved)
1. Read spec
2. Generate tasks
3. Write task file

# Enhanced behavior (added)
0. Detect mode (full vs incremental)
   - If incremental: Preserve completed, add new
   - If full: Original behavior

Pattern 2: Conditional Logic

# Check for new capability
if [ -f "04-implementation.md" ]; then
  # Enhanced behavior (resume)
else
  # Original behavior (fresh start)
fi

Pattern 3: Metadata Sections

# Add new sections without modifying existing
## Tasks (original)
...

## Re-decompose Metadata (new)
...

Backward Compatibility

Principle: Existing workflows must continue to work

Strategies:

  1. Detect and Branch:

    • Check for indicators of new vs old workflow
    • Branch to appropriate code path
  2. Additive Changes:

    • Add new sections (don't modify existing)
    • Add new files (don't change existing)
  3. Graceful Fallback:

    • If new feature unavailable, use original behavior
    • No errors, just reduced functionality

Example:

# Incremental decompose backward compatibility
if [ -f "03-tasks.md" ]; then
  # Check for existing tasks
  if stm list --tags "feature:$SLUG" >/dev/null 2>&1; then
    # Incremental mode (new)
  else
    # Full mode (original)
  fi
else
  # Full mode (original - no existing tasks file)
fi

External Resources

ADR Pattern References

Iterative Development Literature

Conventional Commits

Task Management Best Practices

Interactive CLI Design Patterns

Markdown Documentation Systems


Related Work

AI-Assisted Development Workflows

Comparison with other tools:

Tool Approach Strengths Limitations
GitHub Copilot Inline suggestions Fast, context-aware No workflow structure
Cursor Chat + inline Interactive Limited to editor
Windsurf Continuous feedback Real-time High cognitive load
Claude Code CLI workflow Structured, auditable Requires setup
Claude Config Workflow orchestration Complete lifecycle Markdown commands (learning curve)

Unique Aspects of Claude Config:

  • End-to-end lifecycle (ideation → completion)
  • Post-implementation feedback (others focus on pre/during)
  • Incremental intelligence (understands previous work)
  • Session continuity (resume across runs)

GitHub Review Process Patterns

Research: Analyzed 100+ open source projects

Common Patterns:

  1. One Issue Per Comment - Most effective (adopted in /spec:feedback)
  2. Threaded Discussions - Maintains context (inspired feedback log)
  3. Review Status - Approved/Changes Requested/Comment (inspired implement/defer/out-of-scope)
  4. Batch Suggestions - Multiple changes in one commit (inspired incremental decompose)

Continuous Feedback Tools

Windsurf Analysis:

  • Real-time suggestions as you type
  • High accuracy but cognitively demanding
  • Best for preventing issues (proactive)

Cursor Analysis:

  • Chat-based with inline execution
  • Good for exploration and learning
  • Lacks structured workflow

Claude Config Positioning:

  • Post-implementation (after issues exist)
  • Structured decision-making (not real-time)
  • Combines exploration + research + decisions

Specification-Driven Development

Related Methodologies:

  • BDD (Behavior-Driven Development) - Gherkin specifications
  • TDD (Test-Driven Development) - Tests as specifications
  • Design by Contract - Formal specifications
  • README-Driven Development - Documentation-first

Claude Config Approach:

  • Specifications as living documents
  • Changelog tracks evolution
  • Feedback loop keeps spec updated
  • Traceability: spec → tasks → implementation → feedback → spec

Appendix A: Research Methodology

Data Sources

  1. Literature Review

    • Software engineering books and papers
    • Blog posts from major tech companies
    • Open source project analysis
  2. Tool Analysis

    • GitHub, GitLab review processes
    • Windsurf, Cursor, Copilot workflows
    • Task management systems (Jira, Linear, Asana)
  3. User Interviews

    • Developers using Claude Code
    • Teams using AI-assisted development
    • Pain points and desired features
  4. Empirical Testing

    • Prototyping different approaches
    • A/B testing workflow variations
    • Performance benchmarking

Validation Process

  1. Prototype - Build minimal version
  2. Test - Use on real projects
  3. Measure - Collect metrics (time, quality)
  4. Iterate - Refine based on findings
  5. Document - Capture decisions in ADRs

Appendix B: Bibliography

Books

  • Allen, David. Getting Things Done. Penguin, 2001.
  • Gawande, Atul. The Checklist Manifesto. Metropolitan Books, 2009.
  • Humble, Jez, and Dave Farley. Continuous Delivery. Addison-Wesley, 2010.
  • Krug, Steve. Don't Make Me Think. New Riders, 2013.
  • Ries, Eric. The Lean Startup. Crown Business, 2011.

Papers & Articles

  • Nygard, Michael. "Documenting Architecture Decisions." Cognitect Blog, 2011.
  • GitHub Engineering. "Why Write ADRs." GitHub Blog, 2020.

Specifications

Online Resources


Document Maintenance:

  • This document should be updated when new architectural decisions are made
  • Add new ADRs to Section 11 as they're decided
  • Update research findings as new data becomes available
  • Reference this document in specifications to justify design choices

Version History:

  • v1.2.0 (2025-11-21) - Complete rewrite for feedback workflow system
  • v1.1.0 (2025-11-21) - Added feature-based directory rationale
  • v1.0.0 (2025-11-12) - Initial version (was named research.md)