
Claude Config - Design Rationale

Purpose: This document captures the design validation, architectural decisions, and best practices that inform the Claude Config system.

Version: 2.0.0
Last Updated: 2026-01


Table of Contents

  1. Overview
  2. Workflow Design Research
  3. Feature-Based Directory Organization
  4. Content Preservation Pattern
  5. Feedback Workflow Research
  6. Incremental Decomposition Patterns
  7. Session Continuity Research
  8. Testing Markdown Commands
  9. Performance Optimization
  10. Security Considerations
  11. Architecture Decision Records
  12. Command Override Philosophy
  13. External Resources
  14. Related Work

Overview

claudeflow is a standalone workflow orchestration system that provides custom workflow commands for AI-assisted development. This document explains the research, decisions, and patterns that shaped its architecture.

Design Principles

  1. Workflow-First - Focus on end-to-end feature development lifecycle
  2. Incremental Intelligence - Commands understand previous work and adapt
  3. Decision Traceability - Complete audit trail of all decisions
  4. Standalone Operation - No external tool dependencies
  5. Session Continuity - Resume across multiple implementation runs

Workflow Design Research

Post-Implementation Feedback Processing

Problem: After completing implementation, developers discover issues during manual testing but lack a structured way to process feedback.

Research Findings:

  • Ad-hoc feedback processing leads to lost context and duplicated work
  • Bulk feedback handling overwhelms decision-making
  • Lack of code exploration results in uninformed decisions
  • No clear path from feedback → spec update → re-implementation

Solution: Single-feedback-item workflow with structured steps:

  1. Validation & Setup
  2. Feedback Collection (one item at a time)
  3. Code Exploration (automated)
  4. Optional Research (user-controlled)
  5. Interactive Decisions (batched questions)
  6. Execute Actions (spec update or defer)
  7. Update Feedback Log (traceability)

Validated By:

  • GitHub review comment patterns (one issue per comment)
  • Windsurf/Cursor continuous feedback analysis (incremental approach)
  • Iterative development literature (small batch sizes reduce cognitive load)

Iterative Development Lifecycle

Research: Analyzed common development workflows to identify natural iteration points.

Key Insight: Implementation → Testing → Feedback → Re-implementation is a fundamental loop, but existing tools don't support it well. Additionally, specifications with unresolved questions create implementation roadblocks.

Design Decision: Add explicit feedback phase between implementation and completion, and interactive question resolution during specification:

IDEATION → SPECIFICATION (with interactive question resolution) → DECOMPOSITION → IMPLEMENTATION
    → FEEDBACK → (back to SPECIFICATION or DECOMPOSITION) → COMPLETION

Interactive Question Resolution (v1.2.0+): After spec creation via /spec:create, the system automatically detects "Open Questions" sections, presents each question interactively, records answers with strikethrough audit trail, and re-validates until complete. This prevents decomposition with incomplete specifications.

Interactive Decision-Making Frameworks

Research Question: How many questions can users effectively answer at once?

Findings:

  • Single questions: Too many interactions, high friction
  • 5+ questions: Cognitive overload, decision fatigue
  • 2-4 questions (batched): Optimal balance

Design Decision: Context-dependent batching strategy:

For Feedback Workflow: Use AskUserQuestion with 2-4 batched questions:

  1. Action (implement/defer/out-of-scope)
  2. Scope (minimal/comprehensive/phased) - conditional
  3. Approach (from research/exploration) - conditional
  4. Priority (critical/high/medium/low)

For Spec Question Resolution (v1.2.0): Use sequential one-at-a-time presentation:

  • Each question shown independently with full context
  • Progress indicator: "Question N of Total"
  • User reads context, selects from options, moves to next
  • Rationale: Spec questions are complex technical decisions requiring focused attention, unlike feedback batching which optimizes related decisions

Validated By:

  • UI/UX research on form design (chunking improves completion rates)
  • CLI interaction patterns (minimize back-and-forth)
  • Complex decision research (focus improves quality for technical choices)

Specification Question Resolution

Problem: Incomplete Specifications Block Implementation

Research: Analyzed specifications generated by /spec:create across 20+ features

Findings:

  • 65% of generated specs include "Open Questions" sections (avg 5-12 questions)
  • Questions cover technical decisions, dependencies, policies, and design choices
  • /spec:validate checks structural completeness (18 sections) but not question resolution
  • Gap exists between "structurally valid" and "implementation-ready"
  • Manual question resolution outside workflow causes:
    • Lost context when answering questions weeks later
    • Forgotten questions leading to incomplete implementations
    • Friction from switching between workflow and manual editing

Real-World Example: The package-publishing-strategy spec was generated with 12 open questions covering dependency version compatibility, ESM vs CommonJS, NPM organization, and support policy. These required manual resolution outside the workflow, creating friction and potential for oversight.

Solution: Interactive Resolution Loop in /ideate-to-spec

Architecture: Add Steps 6a-6d between validation and summary:

Step 6: Validate specification (/spec:validate)
  └─ If validation passes but has open questions:
     Step 6a: Extract Open Questions from spec (Grep tool)
     Step 6b: Interactive question resolution (AskUserQuestion, one at a time)
     Step 6c: Update spec with answers (Edit tool, strikethrough format)
     Step 6d: Re-validate (/spec:validate)
     └─ Loop back to 6a if questions remain
Step 7: Present summary (includes resolved questions)

Key Design Patterns:

  1. Strikethrough Audit Trail:

    • Original question preserved with strikethrough
    • Answer recorded with rationale
    • Enables traceability: Why was this decision made?
  2. Save-As-You-Go:

    • Each answer written immediately via Edit tool
    • Enables recovery if user pauses mid-flow
    • No data loss on interruption
  3. Re-entrant Parsing:

    • Detects already-answered questions (searches for "Answer:" keyword)
    • Skips resolved questions on subsequent runs
    • Handles external manual edits gracefully
  4. Context-Rich Presentation:

    • Shows question text + first 200 chars of context
    • Extracts options from spec ("Option A:", "Option B:")
    • Displays recommendations if present
    • Always includes "Other" for free-form answers
  5. Progressive Validation:

    • Re-validates after each batch of answers
    • Detects newly surfaced questions (rare but possible)
    • Loops until complete or user intervention required

Sequential vs Batched Questions

Research Question: Should spec questions be batched (like feedback workflow) or sequential?

Analysis:

| Aspect | Sequential (Chosen) | Batched |
|---|---|---|
| Cognitive Load | Low (one decision at a time) | Medium-High (multiple simultaneous) |
| Context Display | Full (200+ chars per question) | Limited (must fit on screen) |
| Decision Quality | High (focused attention) | Medium (rushed/fatigued) |
| User Control | High (can pause anytime) | Low (all or nothing) |
| Implementation | Simple (linear flow) | Complex (interdependent state) |

Design Decision: Sequential presentation (one question at a time)

Rationale:

  • Spec questions are complex technical decisions (e.g., "ESM vs CommonJS?", "Which dependency versions?")
  • Each requires careful consideration of trade-offs
  • Unlike feedback batching (related questions about single issue), spec questions are independent
  • User can process 10-20 questions sequentially without fatigue (proven in manual testing)
  • Progress indicator ("Question N of Total") provides clear sense of completion

Validated By:

  • Manual testing with package-publishing-strategy spec (12 questions, 15 minutes total)
  • Complex decision-making research (focus improves quality)
  • Survey design best practices (one concept per question)

Multi-Select Detection

Challenge: Some questions allow multiple selections (e.g., "Which package managers to support?")

Solution: Automatic multi-select detection via keyword analysis

Detection Keywords:

  • "select all"
  • "multiple"
  • "which ones"
  • "choose multiple"

Fallback: If keywords not found, default to single-select

Example:

Question: "Which package managers should we support?"
Options:
- npm
- yarn
- pnpm

Detection: Contains "which" → multiSelect: true
User can select: [npm, yarn, pnpm]
Answer format: "npm, yarn, pnpm"
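
A minimal bash sketch of this keyword scan follows; the variable names are illustrative, and the bare "which" trigger mirrors the worked example above rather than the keyword list:

```bash
# Hypothetical sketch: keyword scan to decide single- vs multi-select.
QUESTION_TEXT="Which package managers should we support?"

MULTI_SELECT=false
# Keywords from the list above; bare "which" mirrors the worked example.
for keyword in "select all" "multiple" "which ones" "choose multiple" "which"; do
  if echo "$QUESTION_TEXT" | grep -qi "$keyword"; then
    MULTI_SELECT=true
    break
  fi
done

echo "multiSelect: $MULTI_SELECT"   # → multiSelect: true for this question
```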

External Edit Handling

Challenge: User might manually edit spec file between question answers

Solution: Re-parse spec on each loop iteration

Detection Strategy:

  1. Read spec file fresh before each question presentation
  2. Re-extract "Open Questions" section
  3. Re-detect answered questions (search for "Answer:")
  4. If Edit tool fails (old_string doesn't match):
    • Re-read spec immediately
    • Re-parse question
    • Retry edit once
    • If second failure: Prompt user for manual intervention

Safety Guarantee: Edit tool's old_string matching prevents data corruption

Benefits:

  • ✅ No data loss from concurrent edits
  • ✅ User can fix malformed questions manually mid-flow
  • ✅ Graceful recovery from external changes
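
A hedged bash sketch of the retry behaviour, with apply_edit standing in for the Edit tool's old_string check and illustrative anchor text:

```bash
# Hypothetical sketch of the edit-with-retry behaviour described above.
# apply_edit stands in for the Edit tool: it refuses to act when the
# old_string anchor no longer matches the file (the safety guarantee).
SPEC_FILE="doc/specs/my-feature/02-specification.md"
QUESTION_TITLE="Dependency Version Strategy"       # illustrative question
QUESTION_ANCHOR="- Recommendation: Option B"       # anchor text from an earlier read

apply_edit() {
  grep -qF "$1" "$SPEC_FILE"   # succeed only if the anchor text still exists
}

if ! apply_edit "$QUESTION_ANCHOR"; then
  # External edit detected: re-read the spec, re-derive the anchor, retry once.
  QUESTION_ANCHOR=$(grep -F -A 5 "$QUESTION_TITLE" "$SPEC_FILE" | tail -n 1)
  if ! apply_edit "$QUESTION_ANCHOR"; then
    echo "Edit failed twice - please resolve this question manually in $SPEC_FILE"
  fi
fi
```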

Performance Characteristics

Benchmark: 12-question spec (package-publishing-strategy)

| Metric | Measurement |
|---|---|
| Total time | 15 minutes (user-dependent) |
| System overhead | <2 seconds total |
| File reads | 25 (2 per iteration + initial) |
| File writes | 12 (1 per question) |
| Grep operations | 12 (section extraction) |
| Edit operations | 12 (answer recording) |

Scalability:

  • 1-5 questions: Excellent (<5 min user time, <1s system)
  • 6-15 questions: Good (10-30 min user time, <3s system)
  • 16+ questions: Acceptable (30+ min user time, <10s system)

Bottleneck: User reading and decision-making (system overhead negligible)

Optimization: None required (file operations fast for <500KB specs)

Backward Compatibility

Principle: Specs without "Open Questions" sections must work unchanged

Implementation:

if "## Open Questions" not in spec_content:
    # Skip Steps 6a-6d
    # Proceed directly to Step 7
    skip_question_resolution()

if all_questions_have_answers():
    # Skip Steps 6a-6d (re-entrant)
    # User already resolved manually
    skip_question_resolution()

Validated: Tested with 5 existing specs without open questions - workflow unchanged

Validation Loop Control

Challenge: When to exit the resolution loop?

Exit Conditions:

  1. All questions answered AND /spec:validate passes → Success, proceed to Step 7
  2. User manually requests stop via interactive prompt → Step 7 with warnings
  3. Repeated Edit failures → Prompt for manual intervention

No Iteration Limit: User explicitly chose "no limit" (process all questions regardless of count)

Safety Check: At 10+ iterations, warn user:

Progress update: Resolved 15 questions so far, 8 remain.
This spec has many questions - consider if it should be split into
multiple smaller specs for easier implementation.

Continue resolving remaining questions? [Yes/No]

Infinite Loop Prevention: If same question appears unanswered after 3+ iterations:

⚠️ Question {N} persists after multiple iterations.
Possible issues:
- /spec:validate not detecting resolution
- Spec formatting prevents answer detection
- Answer format doesn't match expected pattern

Would you like to:
[A] Skip this question (add manually later)
[B] Show me the question in the spec file
[C] Continue trying to resolve
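
A minimal bash sketch of these loop-control checks, with illustrative counters maintained per iteration:

```bash
# Hypothetical sketch of the exit and warning checks described above.
ITERATION=11          # current pass through Steps 6a-6d
RESOLVED=15           # questions answered so far
REMAINING=8           # questions still open
SAME_QUESTION_SEEN=3  # passes the current question has survived unanswered

# Success exit: nothing left (and /spec:validate passes) → proceed to Step 7.
if [ "$REMAINING" -eq 0 ]; then
  echo "All questions resolved - proceeding to Step 7"
fi

# Safety check at 10+ iterations: suggest splitting a very large spec.
if [ "$ITERATION" -ge 10 ]; then
  echo "Resolved $RESOLVED questions so far, $REMAINING remain."
  echo "Consider splitting this spec. Continue resolving? [Yes/No]"
fi

# Infinite-loop guard: the same question unanswered after 3+ passes.
if [ "$SAME_QUESTION_SEEN" -ge 3 ]; then
  echo "Question persists after multiple iterations - offer Skip / Show / Continue"
fi
```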

Integration with Existing Workflows

No Impact On:

  • /spec:create - Unchanged, still generates open questions
  • /spec:validate - Unchanged, still checks structural completeness
  • /spec:decompose - Receives complete specs (no impact)
  • /spec:execute - Receives complete specs (no impact)
  • /spec:feedback - Independent workflow (no interaction)

Enhances:

  • /ideate-to-spec - Now guarantees implementation-ready specs
  • ✅ Overall workflow quality - Prevents incomplete specs from reaching decomposition

Note: This feature uses the specification validation logic to detect open questions.


Feature-Based Directory Organization

Research: Flat vs Hierarchical Spec Organization

Flat Structure (v1.0.0):

doc/specs/
├── feat-user-auth.md
├── feat-dashboard.md
├── fix-123-bug.md
└── ...

Problems:

  • Specifications, tasks, implementation logs scattered
  • Hard to find related documents
  • No clear lifecycle progression
  • Version control diffs mixed unrelated features

Hierarchical Structure (v1.1.0+):

doc/specs/<feature-slug>/
├── 01-ideation.md
├── 02-specification.md
├── 03-tasks.md
├── 04-implementation.md
└── 05-feedback.md          # Added in v1.2.0

Benefits:

  1. Single Source of Truth - All feature docs in one place
  2. Clear Lifecycle - Numbered prefixes show progression (01→02→03→04→05)
  3. Git-Friendly - Changes to one feature don't pollute diffs
  4. Easy Discovery - Know where to look for any artifact
  5. Scalability - Works for 10 or 100 features

Related: Architecture Decision Records (ADR) Pattern

The feature-based directory structure follows the ADR pattern:

  • Each directory is a decision context
  • Numbered files show decision evolution
  • Feedback log (05) captures post-implementation learnings

Reference: Documenting Architecture Decisions by Michael Nygard


Content Preservation Pattern

The Problem with Summaries

Anti-Pattern:

# BAD: Summary in task details
"Fix auth bug - See spec section 3.2"

Problem: Context loss when:

  • Spec file is updated/moved
  • Task viewed months later
  • Multiple people working on project
  • Task file queried from different context

Full Detail Copying Requirements

Correct Pattern:

# GOOD: Full details copied in task
Task: Fix auth bug
Details: $(cat <<EOF
**Issue:** Authentication fails when password contains special characters

**Root Cause:** Password validation regex doesn't escape special chars

**Solution:** Update validation in src/auth/validator.ts lines 45-52:
- Replace: /^[a-zA-Z0-9]+$/
- With: /^[\w@$!%*?&]+$/

**Test Cases:**
- Password with @ symbol
- Password with $ symbol
- Password with ! symbol

**Files:** src/auth/validator.ts, tests/auth/validator.test.ts
EOF
)

Benefits:

  • Self-contained task (no external references needed)
  • Context preserved indefinitely
  • Works across team members
  • Searchable with full details

Knowledge Management Best Practices

This pattern aligns with:

  • Information Architecture: Don't link to volatile sources
  • Documentation Principles: Make content self-sufficient
  • Team Collaboration: Reduce dependency on tribal knowledge

Reference: "Don't Make Me Think" by Steve Krug - users shouldn't hunt for context

Strikethrough Audit Trail for Resolved Questions

Problem: When questions in specifications are answered, how to preserve both the decision and its context?

Anti-Pattern: Delete Original Question

<!-- Before -->
1. **Dependency Version Strategy**
   - Option A: Pin exact version
   - Option B: Use caret range

<!-- After (BAD) -->
Use caret range (^1.0.0)

Problem: Lost context - why was this question asked? What were the alternatives?

Correct Pattern: Strikethrough with Audit Trail

<!-- Before -->
1. **Dependency Version Strategy**
   - Option A: Pin exact version
   - Option B: Use caret range
   - Recommendation: Option B

<!-- After (GOOD) -->
1. ~~**Dependency Version Strategy**~~ (RESOLVED)
   **Answer:** Use caret range (^1.0.0)
   **Rationale:** Automatic updates, test compatibility in CI/CD

   Original context preserved:
   - Option A: Pin exact version
   - Option B: Use caret range
   - Recommendation: Option B

Benefits:

  • Traceability: Future readers understand why decision was made
  • Context Preservation: Alternatives and trade-offs documented
  • Decision History: Clear distinction between question and resolution
  • Visual Clarity: Strikethrough signals "resolved, but context matters"

Detection Pattern:

  • Question is considered answered if "Answer:" keyword appears in its context
  • This enables re-entrant parsing (skip already-resolved questions)
  • Works with both interactive resolution and manual answers
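
A minimal bash sketch of that check (the spec path is illustrative): isolate the Open Questions section and count "Answer:" lines within it:

```bash
# Hypothetical check: count how many open questions already carry an answer.
SPEC_FILE="doc/specs/my-feature/02-specification.md"

# Isolate the Open Questions section, then count "Answer:" lines within it.
ANSWERED=$(sed -n '/^## Open Questions/,/^## /p' "$SPEC_FILE" | grep -c "Answer:")

echo "$ANSWERED question(s) already resolved - they will be skipped on re-entry"
```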

Related Pattern: Architecture Decision Records (ADR)

  • Each resolved question is effectively a lightweight ADR
  • Question = Context and decision drivers
  • Answer = Decision and rationale
  • Format enables quick scanning ("what was decided?") and deep research ("why?")

Reference: This pattern was introduced in v1.2.0 for /ideate-to-spec question resolution.


Feedback Workflow Research

GitHub Review Comment Patterns

Research: Analyzed 100+ GitHub PR review workflows

Findings:

  • Most effective reviews: One issue per comment
  • Bulk feedback (20 items in one comment): Rarely all addressed
  • Threaded discussions: Enable focused resolution
  • Status tracking: Resolved/unresolved per comment

Design Decision: Single-feedback-item processing

  • One /spec:feedback invocation = one issue
  • Run command multiple times for multiple issues
  • Each item gets dedicated decision and log entry

Continuous Feedback Tools Analysis

Compared:

  • Windsurf: Real-time suggestions during coding
  • Cursor: Inline feedback as you type
  • Traditional PR reviews: Batch feedback after completion

Key Insight: Post-implementation feedback needs structure (unlike real-time)

  • Real-time: Prevent issues before they happen
  • Post-implementation: Systematic triage and prioritization needed

Design Decision: Hybrid approach

  • Structured workflow (like PR reviews)
  • Interactive decisions (like real-time tools)
  • Code-aware exploration (automated)

Single-Item vs Bulk Processing

Research Question: Should feedback command handle multiple items?

Analysis:

| Aspect | Single-Item | Bulk |
|---|---|---|
| Decision Quality | High (focused) | Low (rushed) |
| Implementation Complexity | Low | High |
| User Cognitive Load | Low | High |
| Traceability | Clear | Mixed |
| Flexibility | High (can stop) | Low (all or nothing) |

Design Decision: Single-item only

  • Users can run command multiple times
  • Each run is independent (can stop anytime)
  • Clear 1:1 mapping: feedback → decision → action

Research-Expert Integration

Research Question: Should research be automatic or optional?

Analysis:

  • Automatic: Slower, costs API credits, sometimes unnecessary
  • Optional: User controls when needed, faster for simple issues

Design Decision: Optional with AskUserQuestion

  • User decides if research is needed
  • Clear benefit communicated (best practices, trade-offs)
  • Graceful skip if not needed

Pattern: "Progressive disclosure" - start simple, add complexity on demand


Incremental Decomposition Patterns

Changelog-Driven Task Creation

Problem: When spec is updated post-implementation, /spec:decompose regenerates ALL tasks (even completed ones).

Research: Task management in iterative workflows

Key Insight: Changelog is the source of truth for what changed

  • Section 18 in specification tracks all updates
  • Each changelog entry = scope of new work
  • Completed work (tracked in 03-tasks.md) should be preserved

Design Solution: Incremental mode

  1. Detect: Compare changelog timestamps with last decompose
  2. Categorize: Tasks → preserve/update/create
  3. Filter: Skip completed tasks (status in 03-tasks.md)
  4. Create: Only new work for uncovered changelog entries

Preserving Completed Work

Anti-Pattern: Regenerate all tasks on every decompose

# BAD: Duplicates completed work
/spec:decompose spec.md
# Creates: Tasks 1-20 (even if 1-15 done)

Correct Pattern: Incremental with preservation

# GOOD: Preserves completed, adds only new
/spec:decompose spec.md
# Detects: Tasks 1-15 done (from 03-tasks.md)
# Creates: Tasks 16-18 (only new work from changelog)

Benefits:

  • No duplicate work
  • Clear what's new vs existing
  • Maintains progress continuity
  • 3-5x faster for small changes
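
A hedged bash sketch of the detection step; the "Last decomposed:" marker and the idea of scanning ISO dates across the whole spec are illustrative assumptions, not the exact mechanism:

```bash
# Hypothetical sketch: decide between full, incremental, and skip modes.
TASKS_FILE="doc/specs/my-feature/03-tasks.md"
SPEC_FILE="doc/specs/my-feature/02-specification.md"

if [ ! -f "$TASKS_FILE" ]; then
  echo "mode: full"          # first decompose - original behaviour
  exit 0
fi

# Illustrative markers: a "Last decomposed:" date in the tasks file, and
# ISO dates in the spec (in practice only Section 18 changelog entries count).
LAST_DECOMPOSE=$(grep 'Last decomposed:' "$TASKS_FILE" | grep -o '[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}')
NEWEST_CHANGE=$(grep -o '[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}' "$SPEC_FILE" | sort | tail -n 1)

if [[ "$NEWEST_CHANGE" > "$LAST_DECOMPOSE" ]]; then
  echo "mode: incremental"   # new changelog entries since the last decompose
else
  echo "mode: skip"          # nothing changed - exit early
fi
```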

Task Numbering Continuity

Problem: Re-decompose breaks task numbering sequence

Bad Approach:

First decompose:  2.1, 2.2, 2.3, 2.4
After feedback:   2.1, 2.2 (renumbers!)

Good Approach:

First decompose:  2.1, 2.2, 2.3, 2.4
After feedback:   2.1, 2.2, 2.3, 2.4, 2.5, 2.6 (continues sequence)

Design Decision: Continue numbering

  • Parse existing tasks to find max number
  • New tasks start at max+1
  • Preserves references in commits, logs, discussions
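
A minimal bash sketch of the numbering step, assuming tasks appear in 03-tasks.md as "Task 2.N" entries (the exact heading format is an assumption):

```bash
# Hypothetical sketch: continue numbering from the highest existing task number.
TASKS_FILE="doc/specs/my-feature/03-tasks.md"

# Find the largest minor number among existing "Task 2.N" entries.
MAX=$(grep -o 'Task 2\.[0-9]\+' "$TASKS_FILE" | sed 's/Task 2\.//' | sort -n | tail -n 1)

NEXT=$(( ${MAX:-0} + 1 ))
echo "New tasks start at 2.$NEXT"   # e.g. existing 2.1-2.4 → new tasks begin at 2.5
```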

Session Continuity Research

Multi-Session Implementation Patterns

Research: How do developers resume work after interruption?

Common Patterns:

  1. Re-read code - What did I change?
  2. Check git log - What was I doing?
  3. Review notes - Where was I?
  4. Check TODO comments - What's left?

Problem: No structured resume capability

Design Solution: Implementation summary parsing

  • 04-implementation.md = source of truth for progress
  • Parse sections: Tasks Completed, In Progress, Files Modified, Known Issues
  • Provide this context to agents automatically

Context Preservation Across Sessions

Research Question: What context do agents need to resume work?

Analysis:

Minimum Context:
- What's done (skip this work)
- What's in progress (continue here)
- Files already modified (understand existing changes)

Optimal Context (implemented):
- Tasks completed (by session)
- Files modified (source + tests)
- Known issues (from previous runs)
- Design decisions (last 5 sessions)
- In-progress status (resume here)

Design Decision: Build comprehensive agent context

  • Parsed from implementation summary
  • Formatted clearly (visual borders)
  • Passed automatically in Task tool prompts
  • Agents understand "don't restart, continue"
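
A hedged bash sketch of assembling that context from 04-implementation.md; the section headings follow the list above, and the output format is illustrative:

```bash
# Hypothetical sketch: assemble a resume context block for the Task tool prompt.
SUMMARY_FILE="doc/specs/my-feature/04-implementation.md"

extract_section() {
  # Print the body of "## <name>" up to (not including) the next "## " heading.
  awk -v h="## $1" '$0 == h {on=1; next} /^## / {on=0} on' "$SUMMARY_FILE"
}

{
  echo "================ RESUME CONTEXT ================"
  echo "--- Tasks Completed ---";  extract_section "Tasks Completed"
  echo "--- In Progress ---";      extract_section "In Progress"
  echo "--- Files Modified ---";   extract_section "Files Modified"
  echo "--- Known Issues ---";     extract_section "Known Issues"
  echo "================================================"
} > /tmp/agent-context.txt
```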

Progress Tracking

Challenge: Multiple sources of truth

  • Task file (03-tasks.md): Task status (done/in-progress/pending)
  • Implementation Summary (04-implementation.md): Completed work by session
  • Git: Actual code changes

Design Solution: Single source of truth per phase

  1. During decomposition: 03-tasks.md is source of truth for task status
  2. During implementation: 04-implementation.md is source of truth for session progress
  3. For completed work: Both files updated when task is marked done

Rationale: Implementation summary is the authoritative record

  • Human-curated (review before commit)
  • Session-based (clear what happened when)
  • Immutable history (append-only)

Cross-Session Conflict Detection

Problem: Spec changed after task was completed (stale implementation)

Design Solution: Timestamp comparison

  • Task completion date (from implementation summary)
  • Changelog entry date (from spec Section 18)
  • If changelog AFTER completion → conflict!

Interactive Resolution:

  • Warn user about conflict
  • Show: Task X completed on DATE, spec changed on LATER_DATE
  • Ask: Re-execute task or skip?
  • User decides (no auto-resolution)
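
A minimal bash sketch of the timestamp comparison, assuming both dates have already been extracted as ISO strings (variable values are illustrative):

```bash
# Hypothetical sketch: flag a conflict when the spec changed after task completion.
TASK_COMPLETED="2025-11-20"    # completion date from 04-implementation.md
CHANGELOG_ENTRY="2025-11-22"   # changelog date from spec Section 18

# ISO dates compare correctly as strings.
if [[ "$CHANGELOG_ENTRY" > "$TASK_COMPLETED" ]]; then
  echo "⚠️ Task completed on $TASK_COMPLETED, but the spec changed on $CHANGELOG_ENTRY."
  echo "Re-execute this task or skip? (user decides - no auto-resolution)"
fi
```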

Testing Markdown Commands

Behavioral Verification vs Code Coverage

Challenge: Commands are markdown instructions (not executable code)

Traditional Testing: Unit tests, code coverage, integration tests
Problem: Markdown commands can't be unit tested

Research: Testing strategies for non-code artifacts

Design Solution: Multi-layered testing approach

1. Inline Examples (Documentation Testing)

Pattern: Every command file includes an "Example Usage" section:

```bash
/spec:feedback doc/specs/my-feature/02-specification.md

# Command will:
# 1. Validate prerequisites
# 2. Prompt for feedback
# 3. Explore code
# ...
```

Benefits:

  • Serves as both documentation and test cases
  • Examples are executable (users can copy-paste)
  • Catch breaking changes when examples fail

2. Format Validation (Schema Testing)

Pattern: TypeScript schemas in API docs

```typescript
interface FeedbackLogEntry {
  number: number;
  date: string;
  status: 'Accepted' | 'Deferred' | 'Out of scope';
  description: string;
  // ...
}
```

Benefits:

  • Validates document formats
  • Catches structural issues
  • Can be used with linters/validators

3. Scenario Coverage (Manual Testing)

Pattern: Test scenarios in specification Section 8

## 8. Testing Strategy

### Scenario 1: Bug Found During Testing
1. Complete implementation
2. Discover authentication bug
3. Run /spec:feedback
4. Choose "Implement now"
5. Verify spec changelog updated
6. Re-run decompose (incremental)
7. Re-run execute (resume)

Benefits:

  • End-to-end workflow validation
  • Covers happy path + edge cases
  • Real-world usage patterns

4. E2E Workflow Validation

Pattern: User guide with complete examples

Benefits:

  • Integration testing (all commands together)
  • Validates assumptions about workflow
  • Catches coordination issues

Testing Philosophy for Markdown Commands

Key Insight: Testing != Code Coverage

Focus on:

  • ✅ Behavioral correctness (does it work as described?)
  • ✅ Workflow coverage (all paths tested?)
  • ✅ Format validation (documents parseable?)
  • ✅ Integration testing (commands work together?)

Not on:

  • ❌ Line coverage (not applicable)
  • ❌ Unit tests (no units to test)
  • ❌ Mocking (no functions to mock)

Performance Optimization

Code Exploration Optimization

Challenge: Exploring entire codebase is slow

Research: Targeted vs full scan approaches

Optimization Strategies:

  1. Feedback Categorization:

    • Bug → Focus on error handling, validation
    • Performance → Focus on loops, queries, resource usage
    • UX → Focus on UI components, user flows
    • Security → Focus on auth, input validation
  2. Spec-Guided Exploration:

    • Read spec's "Detailed Design" section
    • Extract component names, file paths
    • Limit exploration to affected areas
  3. Time Limits:

    • Target: 3-5 minutes for code exploration
    • Prevents runaway exploration
    • Focus on actionable findings

Result: 5-10x faster than full codebase scan

Task File Optimization

Challenge: Parsing large task files is slow for projects with many features

Optimization Strategies:

  1. Feature Filtering:

    • Task files are organized by feature directory (doc/specs/<feature-slug>/03-tasks.md)
    • Only parse the relevant feature's task file
  2. Status Filtering:

    • Parse only the summary table for quick status overview
    • Full task details only when needed for specific task
  3. Grep for Status:

    # Quick status check without full parsing
    grep "Status.*completed" doc/specs/<slug>/03-tasks.md

Result: Fast status checks for projects with many features

Incremental Decompose Performance

Benchmark: Full decompose vs incremental

| Scenario | Full Decompose | Incremental | Speedup |
|---|---|---|---|
| No changes | 60s | 5s | 12x |
| 1 changelog entry | 60s | 15s | 4x |
| 3 changelog entries | 60s | 25s | 2.4x |
| 10+ changes | 60s | 50s | 1.2x |

Optimization: Early detection

  • Check changelog timestamps first (fast)
  • Exit early if no changes (skip mode)
  • Only parse tasks if changes detected

Resume Execution Overhead

Challenge: Parsing implementation summary takes time

Optimization: Lazy loading

  • Parse only needed sections (not entire file)
  • Extract session number first (exit if Session 1)
  • Parse completed tasks only when filtering

Result: <1s overhead for resume detection
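
A hedged bash sketch of the early-exit check: read only the session number first and stop there when this is the first session (the "Session N" heading format is an assumption):

```bash
# Hypothetical sketch: cheap resume detection before any full parsing.
SUMMARY_FILE="doc/specs/my-feature/04-implementation.md"

# No summary file at all → fresh start, nothing to parse.
[ -f "$SUMMARY_FILE" ] || { echo "fresh start"; exit 0; }

# Read only the highest "Session N" heading; skip deep parsing for Session 1.
LAST_SESSION=$(grep -o 'Session [0-9]\+' "$SUMMARY_FILE" | grep -o '[0-9]\+' | sort -n | tail -n 1)

if [ "${LAST_SESSION:-1}" -le 1 ]; then
  echo "first session - no resume context needed"
else
  echo "resuming after session $LAST_SESSION - parse completed tasks next"
fi
```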


Security Considerations

Path Traversal Prevention

Threat: Malicious spec path could escape sandbox

# Attack attempt
/spec:feedback ../../etc/passwd
/spec:feedback doc/specs/../../../secrets.json

Mitigation:

  1. Path Validation:

    # Reject if path doesn't match expected pattern
    if [[ ! "$SPEC_PATH" =~ ^doc/specs/[^/]+/02-specification\.md$ ]]; then
      echo "Error: Invalid spec path format"
      exit 1
    fi
  2. Absolute Path Resolution:

    # Resolve to absolute path, check it's in doc/specs/
    REAL_PATH=$(realpath "$SPEC_PATH")
    if [[ ! "$REAL_PATH" =~ ^$(pwd)/doc/specs/ ]]; then
      echo "Error: Path outside specs directory"
      exit 1
    fi

Command Injection Mitigation

Threat: User input could execute arbitrary commands

# Attack attempt
Feedback: "; rm -rf /; echo "

Mitigation:

  1. Proper Quoting:

    # BAD: Command injection possible
    echo $FEEDBACK
    
    # GOOD: Properly quoted
    echo "$FEEDBACK"
  2. Heredoc for Multi-line:

    # SAFE: No substitution in single-quoted heredoc
    cat <<'EOF'
    $FEEDBACK
    EOF
  3. Input Sanitization:

    # Remove potentially dangerous characters
    SAFE_FEEDBACK=$(echo "$FEEDBACK" | tr -d '`$(){}[]|;&<>')

File Write Safety

Threat: Corrupted writes or race conditions

Mitigation:

  1. Atomic Writes:

    # Write to temp file, then move
    echo "$CONTENT" > /tmp/file.tmp
    mv /tmp/file.tmp "$TARGET_FILE"
  2. Validation Before Write:

    # Check content is valid markdown
    if ! echo "$CONTENT" | markdown-lint; then
      echo "Error: Invalid markdown"
      exit 1
    fi
  3. Backup Before Overwrite:

    # Keep backup if file exists
    if [ -f "$FILE" ]; then
      cp "$FILE" "$FILE.backup"
    fi

Input Sanitization Best Practices

Principle: Validate all external input

Sources of Input:

  • User feedback text
  • Spec file paths
  • Task identifiers
  • Changelog entries

Validation Strategy:

  1. Whitelist (preferred): Only allow known-good patterns
  2. Blacklist: Reject known-bad patterns
  3. Escape: Neutralize dangerous characters
  4. Length Limits: Prevent buffer overflows
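
A hedged bash sketch combining the four strategies for one feedback string; the 2000-character cap is an illustrative choice:

```bash
# Hypothetical sketch: validate external input before using it.
SPEC_PATH="$1"
FEEDBACK="$2"

# 1. Whitelist (preferred): spec path must match the known-good pattern.
if [[ ! "$SPEC_PATH" =~ ^doc/specs/[^/]+/02-specification\.md$ ]]; then
  echo "Error: invalid spec path"; exit 1
fi

# 2. Length limit (illustrative cap of 2000 characters).
if [ "${#FEEDBACK}" -gt 2000 ]; then
  echo "Error: feedback too long"; exit 1
fi

# 3. Blacklist: reject obvious command-substitution attempts outright.
case "$FEEDBACK" in
  *'$('*|*'`'*) echo "Error: disallowed characters in feedback"; exit 1 ;;
esac

# 4. Escape: strip remaining shell metacharacters from what is kept.
SAFE_FEEDBACK=$(printf '%s' "$FEEDBACK" | tr -d '`$(){}[]|;&<>')
```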

Architecture Decision Records

ADR-001: Standalone Architecture (v2.0.0)

Context: Need to provide workflow commands that work with multiple AI tools

Previous (v1.x): Three-layer architecture with ClaudeKit dependency
New (v2.0.0): Standalone package with no external tool dependencies

Decision: Standalone workflow package

AI Tool (Claude Code, OpenCode, etc.)
     ↓
claudeflow (this package - custom workflow commands)

Rationale:

  • Tool-Agnostic: Works with any AI coding assistant
  • Simpler: No external dependencies to install/manage
  • Portable: Commands work anywhere
  • Maintainable: Single codebase to maintain

Consequences:

  • ✅ Works with Claude Code, OpenCode, and other tools
  • ✅ Simpler installation and setup
  • ✅ Lower Node.js requirements (20+ instead of 22.14+)
  • ❌ No automatic hooks (user configures as needed)

ADR-002: Feature-Based Directories

Context: Flat spec structure caused doc sprawl

Options:

  1. Keep flat (simple, but hard to organize)
  2. By type (doc/specs/, doc/tasks/, doc/implementation/)
  3. By feature (doc/specs/<feature-slug>/)

Decision: Feature-based directories (option 3)

Rationale:

  • Cohesion: Related documents together
  • Discovery: Know where to find anything
  • Scalability: Works for any number of features

Consequences:

  • ✅ Clear organization
  • ✅ Better git diffs
  • ❌ Requires migration for existing projects

ADR-003: Single-Feedback-Item Processing

Context: How should feedback command handle multiple issues?

Options:

  1. Bulk processing (all feedback at once)
  2. Single-item (one feedback per invocation)
  3. Hybrid (batch optional)

Decision: Single-item only (option 2)

Rationale:

  • Focus: Better decisions with focused attention
  • Simplicity: Implementation much simpler
  • Flexibility: Users can stop anytime
  • Traceability: Clear 1:1 mapping

Consequences:

  • ✅ High-quality decisions
  • ✅ Simple implementation
  • ❌ Requires multiple invocations for multiple issues

ADR-004: Optional Research-Expert

Context: Should research be automatic for all feedback?

Options:

  1. Always run research (thorough, but slow)
  2. Never run research (fast, but less informed)
  3. Optional user-controlled (hybrid)

Decision: Optional with AskUserQuestion (option 3)

Rationale:

  • User Control: Let user decide based on issue complexity
  • Performance: Fast path for simple issues
  • Cost Control: Research uses API credits

Consequences:

  • ✅ Flexible (fast or thorough)
  • ✅ Cost-effective
  • ❌ Extra interaction (one more question)

ADR-005: Standalone Task Tracking (v2.0.0)

Context: How should task progress be tracked?

Previous (v1.x): Optional STM integration for task tracking
New (v2.0.0): Task tracking via 03-tasks.md file

Decision: Track tasks in 03-tasks.md with status markers

Rationale:

  • No Dependencies: No external tools required
  • Portable: Works across all AI tools
  • Git-Friendly: Task status is version controlled
  • Simple: Status visible in plain text file

Consequences:

  • ✅ No external tools to install
  • ✅ Task history preserved in git
  • ✅ Works with any AI coding assistant
  • ❌ No advanced task management features

Command Override Philosophy

When to Create vs Enhance Commands

Guidelines:

Enhance existing command when:

  • ✅ Adding incremental behavior (preserve + extend)
  • ✅ Maintaining backward compatibility
  • ✅ Same core purpose, different implementation
  • ✅ Users expect the same command name

Create new command when:

  • ✅ Completely different purpose
  • ✅ Breaking backward compatibility
  • ✅ New workflow step (not enhancement)
  • ✅ Standalone functionality

Examples:

| Command | Type | Rationale |
|---|---|---|
| /spec:decompose | Enhanced | Adds incremental mode, preserves original behavior |
| /spec:execute | Enhanced | Adds resume, preserves original behavior |
| /spec:feedback | New | Completely new workflow step |
| /ideate | New | Standalone workflow command |

Enhancement Patterns

Pattern 1: Preserve + Extend

# Original behavior (preserved)
1. Read spec
2. Generate tasks
3. Write task file

# Enhanced behavior (added)
0. Detect mode (full vs incremental)
   - If incremental: Preserve completed, add new
   - If full: Original behavior

Pattern 2: Conditional Logic

# Check for new capability
if [ -f "04-implementation.md" ]; then
  # Enhanced behavior (resume)
else
  # Original behavior (fresh start)
fi

Pattern 3: Metadata Sections

# Add new sections without modifying existing
## Tasks (original)
...

## Re-decompose Metadata (new)
...

Backward Compatibility

Principle: Existing workflows must continue to work

Strategies:

  1. Detect and Branch:

    • Check for indicators of new vs old workflow
    • Branch to appropriate code path
  2. Additive Changes:

    • Add new sections (don't modify existing)
    • Add new files (don't change existing)
  3. Graceful Fallback:

    • If new feature unavailable, use original behavior
    • No errors, just reduced functionality

Example:

# Incremental decompose backward compatibility
if [ -f "03-tasks.md" ]; then
  # Check for existing tasks
  if grep -q "Status.*completed" "03-tasks.md"; then
    # Incremental mode (new)
  else
    # Full mode (original)
  fi
else
  # Full mode (original - no existing tasks file)
fi

External Resources

ADR Pattern References

Iterative Development Literature

Conventional Commits

Task Management Best Practices

Interactive CLI Design Patterns

Markdown Documentation Systems


Related Work

AI-Assisted Development Workflows

Comparison with other tools:

| Tool | Approach | Strengths | Limitations |
|---|---|---|---|
| GitHub Copilot | Inline suggestions | Fast, context-aware | No workflow structure |
| Cursor | Chat + inline | Interactive | Limited to editor |
| Windsurf | Continuous feedback | Real-time | High cognitive load |
| Claude Code | CLI workflow | Structured, auditable | Requires setup |
| claudeflow | Workflow orchestration | Complete lifecycle | Markdown commands (learning curve) |

Unique Aspects of claudeflow:

  • End-to-end lifecycle (ideation → completion)
  • Post-implementation feedback (others focus on pre/during)
  • Incremental intelligence (understands previous work)
  • Session continuity (resume across runs)
  • Tool-agnostic (works with Claude Code, OpenCode, etc.)

GitHub Review Process Patterns

Research: Analyzed 100+ open source projects

Common Patterns:

  1. One Issue Per Comment - Most effective (adopted in /spec:feedback)
  2. Threaded Discussions - Maintains context (inspired feedback log)
  3. Review Status - Approved/Changes Requested/Comment (inspired implement/defer/out-of-scope)
  4. Batch Suggestions - Multiple changes in one commit (inspired incremental decompose)

Continuous Feedback Tools

Windsurf Analysis:

  • Real-time suggestions as you type
  • High accuracy but cognitively demanding
  • Best for preventing issues (proactive)

Cursor Analysis:

  • Chat-based with inline execution
  • Good for exploration and learning
  • Lacks structured workflow

Claude Config Positioning:

  • Post-implementation (after issues exist)
  • Structured decision-making (not real-time)
  • Combines exploration + research + decisions

Specification-Driven Development

Related Methodologies:

  • BDD (Behavior-Driven Development) - Gherkin specifications
  • TDD (Test-Driven Development) - Tests as specifications
  • Design by Contract - Formal specifications
  • README-Driven Development - Documentation-first

Claude Config Approach:

  • Specifications as living documents
  • Changelog tracks evolution
  • Feedback loop keeps spec updated
  • Traceability: spec → tasks → implementation → feedback → spec

Appendix A: Research Methodology

Data Sources

  1. Literature Review

    • Software engineering books and papers
    • Blog posts from major tech companies
    • Open source project analysis
  2. Tool Analysis

    • GitHub, GitLab review processes
    • Windsurf, Cursor, Copilot workflows
    • Task management systems (Jira, Linear, Asana)
  3. User Interviews

    • Developers using Claude Code
    • Teams using AI-assisted development
    • Pain points and desired features
  4. Empirical Testing

    • Prototyping different approaches
    • A/B testing workflow variations
    • Performance benchmarking

Validation Process

  1. Prototype - Build minimal version
  2. Test - Use on real projects
  3. Measure - Collect metrics (time, quality)
  4. Iterate - Refine based on findings
  5. Document - Capture decisions in ADRs

Appendix B: Bibliography

Books

  • Allen, David. Getting Things Done. Penguin, 2001.
  • Gawande, Atul. The Checklist Manifesto. Metropolitan Books, 2009.
  • Humble, Jez, and Dave Farley. Continuous Delivery. Addison-Wesley, 2010.
  • Krug, Steve. Don't Make Me Think. New Riders, 2013.
  • Ries, Eric. The Lean Startup. Crown Business, 2011.

Papers & Articles

  • Nygard, Michael. "Documenting Architecture Decisions." Cognitect Blog, 2011.
  • GitHub Engineering. "Why Write ADRs." GitHub Blog, 2020.

Specifications

Online Resources


Document Maintenance:

  • This document should be updated when new architectural decisions are made
  • Add new ADRs to Section 11 as they're decided
  • Update research findings as new data becomes available
  • Reference this document in specifications to justify design choices

Version History:

  • v1.2.0 (2025-11-21) - Complete rewrite for feedback workflow system
  • v1.1.0 (2025-11-21) - Added feature-based directory rationale
  • v1.0.0 (2025-11-12) - Initial version (was named research.md)