diff --git a/.claude/sessions/exec-session-20260222-180300/.lock b/.claude/sessions/exec-session-20260222-180300/.lock new file mode 100644 index 0000000..896b481 --- /dev/null +++ b/.claude/sessions/exec-session-20260222-180300/.lock @@ -0,0 +1,3 @@ +task_execution_id: exec-session-20260222-180300 +timestamp: 2026-02-22T18:03:00Z +pid: orchestrator diff --git a/.claude/sessions/exec-session-20260222-180300/execution_context.md b/.claude/sessions/exec-session-20260222-180300/execution_context.md new file mode 100644 index 0000000..4851ff7 --- /dev/null +++ b/.claude/sessions/exec-session-20260222-180300/execution_context.md @@ -0,0 +1,52 @@ +# Execution Context + +## Project Patterns +- Plugin naming: `agent-alchemy-{group-name}` for marketplace, `{group-name}` for directory +- Reference files: H1 title, intro paragraph, structured sections with tables/code/bullets +- Agent frontmatter: `name`, `description`, `model`, `tools`, `skills` in YAML + markdown system prompt +- Phase-based workflows with "CRITICAL: Complete ALL N phases" directive +- Hook scripts: trap ERR, debug function, jq parsing. 
All hooks must NEVER exit non-zero +- Shared test fixtures: `claude/sdd-tools/tests/fixtures/` + +## Key Decisions +- [Task #155] Structured context schema: 6 sections; compaction at 10+ entries +- [Task #156] task-executor.md has embedded rules (414 lines); reference files documentation-only +- [Task #161] Watch-first, poll-fallback completion detection +- [Task #163] File conflict detection at Step 7a.5 +- [Task #164] produces_for injection uses `CONTEXT FROM COMPLETED DEPENDENCIES` header +- [Task #166] 3-tier retry escalation: Standard → Context Enrichment → User Escalation +- [Task #167] Progress streaming: session start, wave start, wave completion summaries +- [Task #168] Post-merge validation: OK/WARN/ERROR; force compaction at >1000 lines +- [Task #165] create-tasks now 9 phases; Phase 6 = producer-consumer detection +- [Task #169] task-executor.md result file format is authoritative; orchestration.md needs updating + +## Known Issues +- Result file format in orchestration.md (Result File Protocol + 7c prompt template) doesn't match task-executor.md. Non-blocking: validate-result.sh enforces correct format. 
+- SKILL.md and orchestration.md step numbering diverge at Step 5/5.5 +- Concurrent edits to orchestration.md caused Edit conflicts in Wave 3a +- hooks.json timeout field is in seconds, not milliseconds + +## File Map +- `claude/sdd-tools/skills/execute-tasks/references/orchestration.md` — ~1223 lines +- `claude/sdd-tools/agents/task-executor.md` — 414 lines with embedded rules +- `claude/sdd-tools/skills/execute-tasks/references/execution-workflow.md` — 380 lines, documentation-only +- `claude/sdd-tools/skills/execute-tasks/scripts/watch-for-results.sh` — Event-driven watcher +- `claude/sdd-tools/skills/execute-tasks/scripts/poll-for-results.sh` — Adaptive polling (133 lines) +- `claude/sdd-tools/hooks/validate-result.sh` — Result validation (~100 lines) +- `claude/sdd-tools/skills/create-tasks/SKILL.md` — 9-phase workflow (~738 lines) + +## Task History +### Prior Sessions Summary +Previous session implemented 14 TDD tools plugin tasks. All passed. + +### Tasks [155-161]: Foundation — ALL PASS +Structured context schema, embedded rules, watch/poll scripts, validation hook, event-driven detection. + +### Tasks [163-168]: Orchestration hardening — ALL PASS +Conflict detection, produces_for, retry escalation, progress streaming, merge validation. + +### Tasks [162, 165]: Tests and create-tasks — ALL PASS +44 bats tests passing. create-tasks Phase 6 for produces_for detection. + +### Task [169]: E2E validation — PASS +10/10 features validated, 44/44 tests pass, 1 non-blocking format inconsistency noted. diff --git a/.claude/sessions/exec-session-20260222-180300/execution_plan.md b/.claude/sessions/exec-session-20260222-180300/execution_plan.md new file mode 100644 index 0000000..9b58680 --- /dev/null +++ b/.claude/sessions/exec-session-20260222-180300/execution_plan.md @@ -0,0 +1,37 @@ +# Execution Plan + +task_execution_id: exec-session-20260222-180300 +max_parallel: 5 +retries: 3 +total_tasks: 16 +total_waves: 7 + +## Wave 1 (4 tasks) +1. 
[#155] Define structured context schema and update orchestration.md merge procedures +2. [#156] Embed verification and execution rules in task-executor.md +3. [#159] Create filesystem watch script (watch-for-results.sh) +4. [#160] Implement adaptive polling in poll-for-results.sh + +## Wave 2 (3 tasks) +5. [#157] Update execution-workflow.md for structured context and embedded rules — after [#155, #156] +6. [#158] Create result validation hook (validate-result.sh + hooks.json) — after [#155] +7. [#161] Update orchestration.md for event-driven completion detection — after [#159, #160] + +## Wave 3a (5 tasks) +8. [#164] Add produces_for prompt injection logic to orchestration and SKILL.md — after [#158] +9. [#163] Add file conflict detection to orchestration and SKILL.md — after [#157] +10. [#166] Add retry escalation logic to orchestration and SKILL.md — after [#157] +11. [#167] Add progress streaming to orchestration and SKILL.md — after [#161] +12. [#168] Add post-wave merge validation to orchestration.md — after [#157] + +## Wave 3b (1 task) +13. [#162] Write bats tests for shell scripts — after [#158, #159, #160] + +## Wave 4 (1 task) +14. [#165] Update create-tasks skill for produces_for field emission — after [#164] + +## Wave 5 (1 task) +15. [#169] Run end-to-end validation session — after [#163, #164, #165, #166, #167, #168] + +## Wave 6 (1 task) +16. 
[#170] Update documentation for hardening changes — after [#169] diff --git a/.claude/sessions/exec-session-20260222-180300/progress.md b/.claude/sessions/exec-session-20260222-180300/progress.md new file mode 100644 index 0000000..7e574c1 --- /dev/null +++ b/.claude/sessions/exec-session-20260222-180300/progress.md @@ -0,0 +1,25 @@ +# Execution Progress +Status: Complete +Wave: 6 of 6 +Max Parallel: 5 +Updated: 2026-02-22T18:46:00Z + +## Active Tasks + +## Completed This Session +- [#155] Define structured context schema — PASS (2m 16s) +- [#156] Embed verification and execution rules — PASS (3m 53s) +- [#159] Create filesystem watch script — PASS (7m 25s) +- [#160] Implement adaptive polling — PASS (10m 31s) +- [#157] Update execution-workflow.md — PASS (2m 14s) +- [#158] Create result validation hook — PASS (4m 47s) +- [#161] Update orchestration.md event-driven detection — PASS (2m 42s) +- [#163] Add file conflict detection — PASS (3m 13s) +- [#164] Add produces_for prompt injection — PASS (5m 20s) +- [#166] Add retry escalation logic — PASS (6m 42s) +- [#167] Add progress streaming — PASS (2m 28s) +- [#168] Add post-wave merge validation — PASS (3m 49s) +- [#162] Write bats tests for shell scripts — PASS (9m 50s) +- [#165] Update create-tasks for produces_for — PASS (2m 29s) +- [#169] Run end-to-end validation session — PASS (5m 29s) +- [#170] Update documentation for hardening changes — PASS (1m 51s) diff --git a/.claude/sessions/exec-session-20260222-180300/session_summary.md b/.claude/sessions/exec-session-20260222-180300/session_summary.md new file mode 100644 index 0000000..75a23af --- /dev/null +++ b/.claude/sessions/exec-session-20260222-180300/session_summary.md @@ -0,0 +1,91 @@ +# Execution Summary + +## Overview +- **Session ID**: exec-session-20260222-180300 +- **Spec**: internal/specs/execute-tasks-hardening-SPEC.md +- **Tasks executed**: 16 +- **Passed**: 16 +- **Failed**: 0 +- **Retries**: 0 +- **Waves completed**: 6 +- **Max parallel**: 5 +- 
**Total execution time**: 75m 50s (sum of agent durations) +- **Total token usage**: 1,276,713 + +## Wave Breakdown + +### Wave 1 (4 tasks) — ALL PASS +| Task | Duration | Tokens | +|------|----------|--------| +| [#155] Define structured context schema | 2m 16s | 83,405 | +| [#156] Embed verification and execution rules | 3m 53s | 66,262 | +| [#159] Create filesystem watch script | 7m 25s | 59,708 | +| [#160] Implement adaptive polling | 10m 31s | 56,448 | + +### Wave 2 (3 tasks) — ALL PASS +| Task | Duration | Tokens | +|------|----------|--------| +| [#157] Update execution-workflow.md | 2m 14s | 68,149 | +| [#158] Create result validation hook | 4m 47s | 52,753 | +| [#161] Update orchestration.md event-driven detection | 2m 42s | 74,725 | + +### Wave 3a (5 tasks) — ALL PASS +| Task | Duration | Tokens | +|------|----------|--------| +| [#163] Add file conflict detection | 3m 13s | 100,566 | +| [#164] Add produces_for prompt injection | 5m 20s | 108,346 | +| [#166] Add retry escalation logic | 6m 42s | 147,196 | +| [#167] Add progress streaming | 2m 28s | 75,143 | +| [#168] Add post-wave merge validation | 3m 49s | 101,300 | + +### Wave 3b+4 (2 tasks) — ALL PASS +| Task | Duration | Tokens | +|------|----------|--------| +| [#162] Write bats tests for shell scripts | 9m 50s | 61,631 | +| [#165] Update create-tasks for produces_for | 2m 29s | 77,656 | + +### Wave 5 (1 task) — PASS +| Task | Duration | Tokens | +|------|----------|--------| +| [#169] Run end-to-end validation session | 5m 29s | 89,688 | + +### Wave 6 (1 task) — PASS +| Task | Duration | Tokens | +|------|----------|--------| +| [#170] Update documentation for hardening changes | 1m 51s | 49,137 | + +## Features Implemented +1. **Structured context schema** — 6-section schema for execution_context.md and per-task context files +2. **Embedded agent rules** — task-executor.md has full execution workflow embedded (414 lines) +3. 
**Event-driven completion** — watch-for-results.sh (fswatch) with poll-for-results.sh (adaptive) fallback +4. **Result validation hook** — validate-result.sh PostToolUse hook with .invalid rename +5. **File conflict detection** — Pre-wave scan in orchestration.md Step 7a.5 +6. **produces_for prompt injection** — Upstream task output injected into dependent task prompts +7. **Retry escalation** — 3-tier: Standard → Context Enrichment → User Escalation +8. **Progress streaming** — Session start, wave start, wave completion summaries +9. **Post-wave merge validation** — OK/WARN/ERROR with auto-repair and force compaction +10. **Bats test suite** — 44 tests across 3 scripts (19 + 14 + 11) + +## Known Issues +- Result file format in orchestration.md (Result File Protocol + 7c prompt template) doesn't match task-executor.md embedded format. Non-blocking: validate-result.sh enforces correct format. +- SKILL.md and orchestration.md step numbering diverge at Step 5/5.5 +- Concurrent edits to orchestration.md caused Edit conflicts in Wave 3a (5 agents editing same file) + +## Files Created/Modified +### New Files +- `claude/sdd-tools/hooks/validate-result.sh` — Result validation hook (~100 lines) +- `claude/sdd-tools/hooks/tests/validate-result.bats` — Hook bats tests (19 tests) +- `claude/sdd-tools/skills/execute-tasks/scripts/watch-for-results.sh` — Event-driven watcher (115 lines) +- `claude/sdd-tools/skills/execute-tasks/scripts/tests/watch-for-results.bats` — Watcher bats tests (11 tests) +- `claude/sdd-tools/skills/execute-tasks/scripts/tests/poll-for-results.bats` — Polling bats tests (14 tests) +- `claude/sdd-tools/tests/fixtures/` — 5 shared bats test fixture files + +### Modified Files +- `claude/sdd-tools/skills/execute-tasks/references/orchestration.md` — ~611 → ~1223 lines +- `claude/sdd-tools/agents/task-executor.md` — 324 → 414 lines +- `claude/sdd-tools/skills/execute-tasks/references/execution-workflow.md` — 318 → 380 lines +- 
`claude/sdd-tools/skills/execute-tasks/scripts/poll-for-results.sh` — 61 → 133 lines +- `claude/sdd-tools/skills/execute-tasks/SKILL.md` — Updated with hardening features +- `claude/sdd-tools/skills/create-tasks/SKILL.md` — ~653 → ~738 lines (9 phases) +- `claude/sdd-tools/hooks/hooks.json` — Added validate-result.sh entry +- `CLAUDE.md` — Updated with all hardening documentation diff --git a/.claude/sessions/exec-session-20260222-180300/task_log.md b/.claude/sessions/exec-session-20260222-180300/task_log.md new file mode 100644 index 0000000..0579238 --- /dev/null +++ b/.claude/sessions/exec-session-20260222-180300/task_log.md @@ -0,0 +1,20 @@ +# Task Execution Log + +| Task ID | Subject | Status | Attempts | Duration | Token Usage | +|---------|---------|--------|----------|----------|-------------| +| 155 | Define structured context schema and update orchestration.md merge procedures | PASS | 1/3 | 2m 16s | 83,405 | +| 156 | Embed verification and execution rules in task-executor.md | PASS | 1/3 | 3m 53s | 66,262 | +| 159 | Create filesystem watch script (watch-for-results.sh) | PASS | 1/3 | 7m 25s | 59,708 | +| 160 | Implement adaptive polling in poll-for-results.sh | PASS | 1/3 | 10m 31s | 56,448 | +| 157 | Update execution-workflow.md for structured context and embedded rules | PASS | 1/3 | 2m 14s | 68,149 | +| 158 | Create result validation hook (validate-result.sh + hooks.json) | PASS | 1/3 | 4m 47s | 52,753 | +| 161 | Update orchestration.md for event-driven completion detection | PASS | 1/3 | 2m 42s | 74,725 | +| 163 | Add file conflict detection to orchestration and SKILL.md | PASS | 1/3 | 3m 13s | 100,566 | +| 164 | Add produces_for prompt injection logic to orchestration and SKILL.md | PASS | 1/3 | 5m 20s | 108,346 | +| 166 | Add retry escalation logic to orchestration and SKILL.md | PASS | 1/3 | 6m 42s | 147,196 | +| 167 | Add progress streaming to orchestration and SKILL.md | PASS | 1/3 | 2m 28s | 75,143 | +| 168 | Add post-wave merge validation to 
orchestration.md | PASS | 1/3 | 3m 49s | 101,300 | +| 162 | Write bats tests for shell scripts | PASS | 1/3 | 9m 50s | 61,631 | +| 165 | Update create-tasks skill for produces_for field emission | PASS | 1/3 | 2m 29s | 77,656 | +| 169 | Run end-to-end validation session | PASS | 1/3 | 5m 29s | 89,688 | +| 170 | Update documentation for hardening changes | PASS | 1/3 | 1m 51s | 49,137 | diff --git a/.claude/sessions/exec-session-20260222-180300/tasks/155.json b/.claude/sessions/exec-session-20260222-180300/tasks/155.json new file mode 100644 index 0000000..a04f79b --- /dev/null +++ b/.claude/sessions/exec-session-20260222-180300/tasks/155.json @@ -0,0 +1,21 @@ +{ + "id": "155", + "subject": "Define structured context schema and update orchestration.md merge procedures", + "description": "Define the 6-section structured context schema for `execution_context.md` and per-task `context-task-{id}.md` files. Update `claude/sdd-tools/references/orchestration.md` with section-based merge procedures.\n\n**What to implement:**\n\n1. Define the 6 fixed section headers for execution_context.md:\n - `## Project Setup` — Package manager, runtime, frameworks, build tools\n - `## File Patterns` — Test file patterns, component patterns, API route patterns\n - `## Conventions` — Import style, error handling, state management, naming\n - `## Key Decisions` — Choices made during execution with task references\n - `## Known Issues` — Problems encountered, workarounds, gotchas\n - `## Task History` — Compact log: task ID, name, status, key contribution\n\n2. Define per-task context-task-{id}.md format using same 6 section headers (empty sections omitted)\n\n3. 
Update orchestration.md merge procedures:\n - Split on `## ` markers as merge anchors\n - Append entries under matching section headers\n - Deduplicate entries within sections during merge\n - Compaction at 10+ entries per section: summarize older entries into paragraph\n - Initial execution_context.md created with all 6 headers and empty content\n\n**Files to modify:**\n- `claude/sdd-tools/references/orchestration.md` (~611 lines)\n\n**Acceptance Criteria:**\n\n_Functional:_\n- [ ] execution_context.md template defined with 6 section headers and HTML comments\n- [ ] Per-task context-task-{id}.md format defined with same 6 headers\n- [ ] Orchestrator merge procedure uses section headers as merge anchors\n- [ ] Entries appended under matching section headers during merge\n- [ ] Duplicate entries within a section are deduplicated during merge\n\n_Edge Cases:_\n- [ ] Empty sections in per-task files are omitted by agents (documented as convention)\n- [ ] Content outside any section header is handled gracefully during merge\n- [ ] Merge handles context files with only some sections present\n\n_Error Handling:_\n- [ ] Malformed context file (missing all headers) logged as warning, content placed under `## Key Decisions`\n\n_Performance:_\n- [ ] Compaction triggered at 10+ entries per section to prevent unbounded growth\n\n**Testing Requirements:**\n• Manual: Verify template renders correctly as markdown\n• Manual: Verify merge procedure instructions are clear and unambiguous\n• Integration: Validated during end-to-end execution session (Task 15)\n\nSource: internal/specs/execute-tasks-hardening-SPEC.md Section 5.7", + "activeForm": "Defining structured context schema", + "status": "completed", + "blocks": [ + "157", + "158" + ], + "blockedBy": [], + "metadata": { + "priority": "high", + "complexity": "M", + "source_section": "5.7 Structured Context Schema", + "spec_path": "internal/specs/execute-tasks-hardening-SPEC.md", + "feature_name": "Structured Context Schema", + 
"task_uid": "internal/specs/execute-tasks-hardening-SPEC.md:structured-context:schema:001", + "task_group": "execute-tasks-hardening" + } +} \ No newline at end of file diff --git a/.claude/sessions/exec-session-20260222-180300/tasks/156.json b/.claude/sessions/exec-session-20260222-180300/tasks/156.json new file mode 100644 index 0000000..0e3962b --- /dev/null +++ b/.claude/sessions/exec-session-20260222-180300/tasks/156.json @@ -0,0 +1,20 @@ +{ + "id": "156", + "subject": "Embed verification and execution rules in task-executor.md", + "description": "Distill essential rules from `execution-workflow.md` and `verification-patterns.md` and embed them directly in the `task-executor.md` agent definition. Remove instructions for agents to explicitly Read reference files at startup.\n\n**What to implement:**\n\n1. Read current `claude/sdd-tools/agents/task-executor.md` (~325 lines)\n2. Read `claude/sdd-tools/references/execution-workflow.md` (~318 lines) to identify essential rules\n3. Read `claude/sdd-tools/references/verification-patterns.md` (~256 lines) to identify essential rules\n4. Distill and embed concise, action-oriented rules covering:\n - 4-phase execution workflow (Understand → Implement → Verify → Complete)\n - Verification classification (spec-based vs general tasks)\n - Result file format (status, task_id, duration, Summary, Files Modified, Context Contribution, Verification)\n - Context contribution format (structured 6-section schema from Task 1)\n - `produces_for` handling (reading upstream task output injected in prompt)\n5. Remove explicit Read instructions for `execution-workflow.md`, `verification-patterns.md`, and `orchestration.md`\n6. Keep `skills: [execute-tasks]` frontmatter for SKILL.md auto-loading\n7. 
Target: ~425 lines (up from ~325, net +100 lines of embedded rules)\n\n**Files to modify:**\n- `claude/sdd-tools/agents/task-executor.md` (~325 lines)\n\n**Reference files remain** in `references/` directory for documentation purposes but are NOT loaded by agents.\n\n**Acceptance Criteria:**\n\n_Functional:_\n- [ ] task-executor.md contains embedded 4-phase execution workflow rules\n- [ ] task-executor.md contains embedded verification classification rules\n- [ ] task-executor.md contains result file format specification\n- [ ] task-executor.md contains structured context contribution format (6 sections)\n- [ ] Agent no longer instructed to Read execution-workflow.md, verification-patterns.md, or orchestration.md\n- [ ] `skills: [execute-tasks]` frontmatter unchanged (SKILL.md still auto-loads)\n\n_Edge Cases:_\n- [ ] Embedded rules are concise and action-oriented (not verbose explanatory prose)\n- [ ] No redundancy between embedded rules and SKILL.md content\n\n_Performance:_\n- [ ] Agent definition ~425 lines (not exceeding ~450)\n- [ ] Net reduction of ~574 lines of startup file reads per agent\n\n**Testing Requirements:**\n• Manual: Verify agent definition is coherent and complete\n• Manual: Verify reference files are NOT referenced in agent Read instructions\n• Integration: Validated during end-to-end execution session (Task 15)\n\nSource: internal/specs/execute-tasks-hardening-SPEC.md Section 5.6", + "activeForm": "Embedding agent rules in task-executor.md", + "status": "completed", + "blocks": [ + "157" + ], + "blockedBy": [], + "metadata": { + "priority": "high", + "complexity": "M", + "source_section": "5.6 Embedded Agent Rules", + "spec_path": "internal/specs/execute-tasks-hardening-SPEC.md", + "feature_name": "Embedded Agent Rules", + "task_uid": "internal/specs/execute-tasks-hardening-SPEC.md:embedded-rules:agent:001", + "task_group": "execute-tasks-hardening" + } +} \ No newline at end of file diff --git 
a/.claude/sessions/exec-session-20260222-180300/tasks/157.json b/.claude/sessions/exec-session-20260222-180300/tasks/157.json new file mode 100644 index 0000000..213a719 --- /dev/null +++ b/.claude/sessions/exec-session-20260222-180300/tasks/157.json @@ -0,0 +1,25 @@ +{ + "id": "157", + "subject": "Update execution-workflow.md for structured context and embedded rules", + "description": "Update `execution-workflow.md` reference to align with the new structured context schema and embedded agent rules pattern. This file transitions from being agent-loaded to documentation-only.\n\n**What to implement:**\n\n1. Read current `claude/sdd-tools/references/execution-workflow.md` (~318 lines)\n2. Update Phase 1 (context reading) to describe structured 6-section schema\n3. Update Phase 4 (context writing) to describe writing entries under appropriate structured sections\n4. Add note that this file is now documentation-only (not loaded by agents at startup)\n5. Ensure consistency with embedded rules in task-executor.md (Task 2)\n6. 
Ensure consistency with structured context schema (Task 1)\n\n**Files to modify:**\n- `claude/sdd-tools/references/execution-workflow.md` (~318 lines)\n\n**Acceptance Criteria:**\n\n_Functional:_\n- [ ] Phase 1 references structured 6-section context reading\n- [ ] Phase 4 references structured context writing (entries under section headers, empty sections omitted)\n- [ ] File header notes this is documentation-only, not agent-loaded\n- [ ] Content is consistent with task-executor.md embedded rules\n- [ ] Content is consistent with orchestration.md structured context merge procedures\n\n_Edge Cases:_\n- [ ] No stale references to old free-form context writing pattern\n\n**Testing Requirements:**\n• Manual: Verify consistency across execution-workflow.md, task-executor.md, and orchestration.md\n• Integration: Validated during end-to-end execution session (Task 15)\n\nSource: internal/specs/execute-tasks-hardening-SPEC.md Sections 5.6, 5.7", + "activeForm": "Updating execution-workflow.md", + "status": "completed", + "blocks": [ + "163", + "166", + "168" + ], + "blockedBy": [ + "155", + "156" + ], + "metadata": { + "priority": "high", + "complexity": "S", + "source_section": "5.6/5.7", + "spec_path": "internal/specs/execute-tasks-hardening-SPEC.md", + "feature_name": "Execution Workflow Update", + "task_uid": "internal/specs/execute-tasks-hardening-SPEC.md:execution-workflow:reference:001", + "task_group": "execute-tasks-hardening" + } +} \ No newline at end of file diff --git a/.claude/sessions/exec-session-20260222-180300/tasks/158.json b/.claude/sessions/exec-session-20260222-180300/tasks/158.json new file mode 100644 index 0000000..16160c1 --- /dev/null +++ b/.claude/sessions/exec-session-20260222-180300/tasks/158.json @@ -0,0 +1,23 @@ +{ + "id": "158", + "subject": "Create result validation hook (validate-result.sh + hooks.json)", + "description": "Create a PostToolUse hook that validates result files written by task-executor agents. 
Register the hook in hooks.json.\n\n**What to implement:**\n\n1. Create `claude/sdd-tools/hooks/validate-result.sh` (~50 lines):\n - Trigger on Write operations targeting `result-task-*.md` in session directory\n - Filter: only act when file path matches `*/result-task-*.md`\n - Validate first line matches `status: (PASS|PARTIAL|FAIL)`\n - Validate required sections present: `## Summary`, `## Files Modified`, `## Context Contribution`\n - Validate corresponding `context-task-{id}.md` exists (write-ordering invariant)\n - If context file missing: create stub `### Task [{id}]: No learnings captured`\n - If invalid: rename to `result-task-{id}.md.invalid` with error description appended\n - NEVER exit non-zero (defensive: trap on ERR, fall through on any error)\n - Debug logging to stderr via `AGENT_ALCHEMY_HOOK_DEBUG=1` env var\n - Follow patterns from existing `auto-approve-session.sh`\n\n2. Update `claude/sdd-tools/hooks/hooks.json`:\n - Add PostToolUse hook entry for validate-result.sh\n - Pattern: `Write`\n - Command: `${CLAUDE_PLUGIN_ROOT}/hooks/validate-result.sh`\n - Timeout: 5000ms\n\n**Reference for hook patterns:**\n- Read `claude/sdd-tools/hooks/auto-approve-session.sh` (~75 lines) as model\n\n**Files to create:**\n- `claude/sdd-tools/hooks/validate-result.sh` (new, ~50 lines)\n\n**Files to modify:**\n- `claude/sdd-tools/hooks/hooks.json` (~20 lines)\n\n**Acceptance Criteria:**\n\n_Functional:_\n- [ ] Hook triggers on Write to result-task-*.md files\n- [ ] Valid result files (PASS/PARTIAL/FAIL status, all sections) are accepted unchanged\n- [ ] Missing status line causes file to be renamed to .invalid\n- [ ] Invalid status value causes file to be renamed to .invalid\n- [ ] Missing required section (Summary, Files Modified, Context Contribution) causes .invalid rename\n- [ ] Missing context-task-{id}.md triggers stub creation, result file still accepted\n- [ ] Hook registered in hooks.json with PostToolUse event, Write pattern, 5s timeout\n\n_Edge 
Cases:_\n- [ ] Non-session file writes (unrelated paths) are ignored by hook\n- [ ] Result files with >25 lines are accepted (first sections validated, extra content allowed with warning)\n- [ ] Agent writes result before context: stub created, result accepted\n\n_Error Handling:_\n- [ ] Hook NEVER exits non-zero (trap on ERR falls through)\n- [ ] Malformed input to hook is caught by trap, exits 0\n- [ ] Debug logging only when AGENT_ALCHEMY_HOOK_DEBUG=1\n\n_Performance:_\n- [ ] Hook execution < 100ms per invocation\n\n**Testing Requirements:**\n• Shell (bats): Valid PASS/FAIL/PARTIAL result accepted\n• Shell (bats): Missing status line → .invalid rename\n• Shell (bats): Invalid status value → .invalid rename\n• Shell (bats): Missing section → .invalid rename\n• Shell (bats): Missing context file → stub created\n• Shell (bats): Non-session file → ignored\n• Shell (bats): Hook error → trap catches, exit 0\n\nSource: internal/specs/execute-tasks-hardening-SPEC.md Section 5.2", + "activeForm": "Creating result validation hook", + "status": "completed", + "blocks": [ + "162", + "164" + ], + "blockedBy": [ + "155" + ], + "metadata": { + "priority": "critical", + "complexity": "M", + "source_section": "5.2 Result File Validation Hook", + "spec_path": "internal/specs/execute-tasks-hardening-SPEC.md", + "feature_name": "Result File Validation Hook", + "task_uid": "internal/specs/execute-tasks-hardening-SPEC.md:result-validation:hook:001", + "task_group": "execute-tasks-hardening" + } +} \ No newline at end of file diff --git a/.claude/sessions/exec-session-20260222-180300/tasks/159.json b/.claude/sessions/exec-session-20260222-180300/tasks/159.json new file mode 100644 index 0000000..b835230 --- /dev/null +++ b/.claude/sessions/exec-session-20260222-180300/tasks/159.json @@ -0,0 +1,21 @@ +{ + "id": "159", + "subject": "Create filesystem watch script (watch-for-results.sh)", + "description": "Create a new shell script that uses filesystem events (`fswatch` on macOS, 
`inotifywait` on Linux) to detect result files immediately when agents complete, replacing the fixed-interval polling as the primary completion mechanism.\n\n**What to implement:**\n\n1. Create `claude/sdd-tools/scripts/watch-for-results.sh` (~60 lines):\n - Usage: `watch-for-results.sh <session_dir> <expected_count> [task_ids...]`\n - Check tool availability at startup: `command -v fswatch` / `command -v inotifywait`\n - If neither available: exit with code 2 (not-available signal)\n - Detect pre-existing result files before starting watch (count toward expected)\n - Watch for `Created` events only (not Modified) to avoid duplicates\n - On macOS: use `fswatch --event Created`\n - On Linux: use `inotifywait -m -e create`\n - Output one line per detection: `RESULT_FOUND: result-task-{id}.md ({found}/{expected})`\n - Output `ALL_DONE` when all expected result files found\n - Configurable timeout via `WATCH_TIMEOUT` env var (default: 2700 seconds = 45 min)\n - Exit codes: 0 (all found), 1 (timeout), 2 (tools unavailable)\n\n**Files to create:**\n- `claude/sdd-tools/scripts/watch-for-results.sh` (new, ~60 lines)\n\n**Acceptance Criteria:**\n\n_Functional:_\n- [ ] Script detects result files within 1 second of creation (zero-latency)\n- [ ] Emits `RESULT_FOUND: result-task-{id}.md (N/M)` for each detection\n- [ ] Emits `ALL_DONE` when expected count reached\n- [ ] Exits 0 when all results found\n- [ ] Pre-existing result files counted before watch starts\n- [ ] Accepts session_dir, expected_count, and optional task_ids as arguments\n\n_Edge Cases:_\n- [ ] Handles result files already present at watch start\n- [ ] Ignores non-result files created in session directory\n- [ ] Agent creates temp file then renames → watch triggers on final filename only\n- [ ] Partial completion at timeout → reports found results, exits code 1\n\n_Error Handling:_\n- [ ] Neither fswatch nor inotifywait available → exit code 2\n- [ ] fswatch exits unexpectedly → script exits code 1 (orchestrator falls back to polling)\n- [
] Timeout reached → exit code 1\n\n_Performance:_\n- [ ] Detection latency < 1 second from file creation\n\n**Testing Requirements:**\n• Shell (bats): All results found → ALL_DONE output, exit 0\n• Shell (bats): Timeout with no files → exit 1\n• Shell (bats): No fswatch available → exit 2\n• Shell (bats): Pre-existing results detected and counted\n• Shell (bats): Partial completion → reports found, exits 1\n\nSource: internal/specs/execute-tasks-hardening-SPEC.md Section 5.4", + "activeForm": "Creating filesystem watch script", + "status": "completed", + "blocks": [ + "161", + "162" + ], + "blockedBy": [], + "metadata": { + "priority": "high", + "complexity": "M", + "source_section": "5.4 Event-Driven Completion Detection", + "spec_path": "internal/specs/execute-tasks-hardening-SPEC.md", + "feature_name": "Event-Driven Completion Detection", + "task_uid": "internal/specs/execute-tasks-hardening-SPEC.md:event-completion:script:001", + "task_group": "execute-tasks-hardening" + } +} \ No newline at end of file diff --git a/.claude/sessions/exec-session-20260222-180300/tasks/160.json b/.claude/sessions/exec-session-20260222-180300/tasks/160.json new file mode 100644 index 0000000..c164592 --- /dev/null +++ b/.claude/sessions/exec-session-20260222-180300/tasks/160.json @@ -0,0 +1,21 @@ +{ + "id": "160", + "subject": "Implement adaptive polling in poll-for-results.sh", + "description": "Modify the existing `poll-for-results.sh` script to use adaptive polling intervals instead of fixed 15-second intervals. This serves as the fallback when `watch-for-results.sh` reports tools unavailable (exit code 2).\n\n**What to implement:**\n\n1. Read current `claude/sdd-tools/scripts/poll-for-results.sh` (~61 lines)\n2. 
Modify polling logic:\n - Start at 5-second interval (configurable via `POLL_START_INTERVAL`, default: 5)\n - Increase by 5 seconds after each poll round with no new results\n - Cap at 30 seconds (configurable via `POLL_MAX_INTERVAL`, default: 30)\n - Reset to start interval when a new result is found\n - Cumulative timeout via `POLL_TIMEOUT` env var (default: 2700 = 45 min)\n - Interval progression: 5s → 10s → 15s → 20s → 25s → 30s → 30s → ...\n3. Maintain same output format as watch-for-results.sh:\n - `RESULT_FOUND: result-task-{id}.md (N/M)`\n - `ALL_DONE`\n4. Maintain same exit codes: 0 (all found), 1 (timeout)\n5. Maintain same argument interface: `<session_dir> <expected_count> [task_ids...]`\n\n**Files to modify:**\n- `claude/sdd-tools/scripts/poll-for-results.sh` (~61 lines)\n\n**Acceptance Criteria:**\n\n_Functional:_\n- [ ] Polling starts at 5-second intervals (or POLL_START_INTERVAL)\n- [ ] Interval increases by 5s after each poll with no new results\n- [ ] Maximum interval caps at 30s (or POLL_MAX_INTERVAL)\n- [ ] Interval resets to start value when new result found\n- [ ] Cumulative timeout at 45 min (or POLL_TIMEOUT)\n- [ ] Output format matches watch-for-results.sh (RESULT_FOUND / ALL_DONE)\n\n_Edge Cases:_\n- [ ] POLL_START_INTERVAL=10 correctly starts at 10s\n- [ ] POLL_MAX_INTERVAL=15 correctly caps at 15s\n- [ ] All results found on first poll → immediate ALL_DONE\n\n_Error Handling:_\n- [ ] Timeout → exit code 1\n- [ ] Invalid environment variable values → use defaults\n\n**Testing Requirements:**\n• Shell (bats): Adaptive interval increase (5s, 10s, 15s, 20s, 25s, 30s)\n• Shell (bats): Interval reset on new result found\n• Shell (bats): Max interval cap at 30s\n• Shell (bats): Environment variable override (POLL_START_INTERVAL=10)\n• Shell (bats): Timeout → exit 1\n\nSource: internal/specs/execute-tasks-hardening-SPEC.md Section 5.8", + "activeForm": "Implementing adaptive polling", + "status": "completed", + "blocks": [ + "161", + "162" + ], + "blockedBy": [], +
"metadata": { + "priority": "medium", + "complexity": "S", + "source_section": "5.8 Adaptive Polling (Fallback)", + "spec_path": "internal/specs/execute-tasks-hardening-SPEC.md", + "feature_name": "Adaptive Polling", + "task_uid": "internal/specs/execute-tasks-hardening-SPEC.md:adaptive-polling:script:001", + "task_group": "execute-tasks-hardening" + } +} \ No newline at end of file diff --git a/.claude/sessions/exec-session-20260222-180300/tasks/161.json b/.claude/sessions/exec-session-20260222-180300/tasks/161.json new file mode 100644 index 0000000..c55c7f0 --- /dev/null +++ b/.claude/sessions/exec-session-20260222-180300/tasks/161.json @@ -0,0 +1,23 @@ +{ + "id": "161", + "subject": "Update orchestration.md for event-driven completion detection", + "description": "Update the orchestration reference to replace fixed polling with event-driven completion detection using `watch-for-results.sh` as primary and `poll-for-results.sh` (adaptive) as fallback.\n\n**What to implement:**\n\n1. Read current `claude/sdd-tools/references/orchestration.md` (~611 lines)\n2. Update completion detection procedures to implement watch → poll fallback:\n - Primary: Launch `watch-for-results.sh` with session dir, expected count, task IDs\n - If exit code 0: all results found, proceed to processing\n - If exit code 2: tools unavailable, fall back to `poll-for-results.sh` (adaptive)\n - If exit code 1: timeout, handle as wave timeout\n - If watch process exits unexpectedly: fall back to polling\n3. Update orchestrator instructions for reading watch/poll output:\n - Parse `RESULT_FOUND:` lines for incremental detection\n - Wait for `ALL_DONE` or timeout\n4. 
Ensure compatibility with existing result processing logic\n\n**Files to modify:**\n- `claude/sdd-tools/references/orchestration.md` (~611 lines)\n\n**Acceptance Criteria:**\n\n_Functional:_\n- [ ] Orchestrator launches watch-for-results.sh as primary completion detection\n- [ ] Fallback to poll-for-results.sh when watch exits with code 2\n- [ ] Timeout handling (exit code 1) triggers wave timeout procedures\n- [ ] Incremental result detection via RESULT_FOUND parsing\n- [ ] Completion via ALL_DONE signal\n\n_Edge Cases:_\n- [ ] Watch process killed mid-execution → fallback to polling\n- [ ] All results pre-exist before watch starts → immediate ALL_DONE\n- [ ] Mixed detection: some results found by watch, then watch fails, polling finds rest\n\n_Error Handling:_\n- [ ] Unexpected watch script behavior → log warning, fall back to polling\n- [ ] Polling also times out → wave timeout handling\n\n**Testing Requirements:**\n• Manual: Verify orchestration.md instructions are clear and unambiguous\n• Integration: Validated during end-to-end execution session (Task 15)\n\nSource: internal/specs/execute-tasks-hardening-SPEC.md Section 5.4", + "activeForm": "Updating orchestration.md for completion detection", + "status": "completed", + "blocks": [ + "167" + ], + "blockedBy": [ + "159", + "160" + ], + "metadata": { + "priority": "high", + "complexity": "S", + "source_section": "5.4 Event-Driven Completion Detection", + "spec_path": "internal/specs/execute-tasks-hardening-SPEC.md", + "feature_name": "Event-Driven Completion Detection", + "task_uid": "internal/specs/execute-tasks-hardening-SPEC.md:event-completion:integration:001", + "task_group": "execute-tasks-hardening" + } +} \ No newline at end of file diff --git a/.claude/sessions/exec-session-20260222-180300/tasks/162.json b/.claude/sessions/exec-session-20260222-180300/tasks/162.json new file mode 100644 index 0000000..2880ced --- /dev/null +++ b/.claude/sessions/exec-session-20260222-180300/tasks/162.json @@ -0,0 
+1,22 @@ +{ + "id": "162", + "subject": "Write bats tests for shell scripts", + "description": "Write comprehensive bats (Bash Automated Testing System) tests for all three shell scripts: `validate-result.sh`, `watch-for-results.sh`, and `poll-for-results.sh`.\n\n**What to implement:**\n\n1. Ensure bats is available (`brew install bats-core` if needed)\n2. Create test directory structure under `claude/sdd-tools/tests/`\n3. Write test fixtures (valid/invalid result files, context files)\n4. Write test suites:\n\n**validate-result.sh tests (8 scenarios):**\n- Valid PASS result → file preserved\n- Valid FAIL result → file preserved\n- Missing status line → file renamed to .invalid\n- Invalid status value (status: UNKNOWN) → .invalid\n- Missing required section (no ## Summary) → .invalid\n- Missing context file → stub context created, result accepted\n- Non-session file write → hook ignores\n- Hook error (malformed input) → trap catches, exit 0\n\n**watch-for-results.sh tests (5 scenarios):**\n- All results found → ALL_DONE output, exit 0\n- Timeout with no files → exit 1\n- No fswatch available → exit 2\n- Pre-existing results → detected and counted\n- Partial completion → reports found, exits 1\n\n**poll-for-results.sh tests (5 scenarios):**\n- Adaptive interval increase → 5s, 10s, 15s, 20s, 25s, 30s progression\n- Interval reset on new result → resets to 5s\n- Max interval cap → never exceeds 30s\n- Environment variable override → POLL_START_INTERVAL=10 starts at 10s\n- Timeout → exit 1\n\n**Files to create:**\n- `claude/sdd-tools/tests/validate-result.bats` (new)\n- `claude/sdd-tools/tests/watch-for-results.bats` (new)\n- `claude/sdd-tools/tests/poll-for-results.bats` (new)\n- `claude/sdd-tools/tests/fixtures/` — test fixture files (new directory)\n\n**Acceptance Criteria:**\n\n_Functional:_\n- [ ] All 8 validate-result.sh test scenarios pass\n- [ ] All 5 watch-for-results.sh test scenarios pass\n- [ ] All 5 poll-for-results.sh test scenarios pass\n- [ ] Test 
fixtures include valid and invalid result files\n- [ ] Tests use temp directories for isolation (cleanup in teardown)\n\n_Edge Cases:_\n- [ ] Tests work on macOS (zsh/bash environment)\n- [ ] Tests mock fswatch/inotifywait availability for watch-for-results.sh tests\n- [ ] Tests use short timeouts for fast execution\n\n_Error Handling:_\n- [ ] Each test cleans up temp files in teardown\n- [ ] Failed tests report clear error messages\n\n**Testing Requirements:**\n• Self-validating: `bats claude/sdd-tools/tests/` runs all tests\n• All 18 test scenarios pass\n\nSource: internal/specs/execute-tasks-hardening-SPEC.md Section 10", + "activeForm": "Writing bats tests for shell scripts", + "status": "completed", + "blocks": [], + "blockedBy": [ + "158", + "159", + "160" + ], + "metadata": { + "priority": "high", + "complexity": "L", + "source_section": "10 Testing Strategy", + "spec_path": "internal/specs/execute-tasks-hardening-SPEC.md", + "feature_name": "Shell Script Tests", + "task_uid": "internal/specs/execute-tasks-hardening-SPEC.md:testing:bats:001", + "task_group": "execute-tasks-hardening" + } +} \ No newline at end of file diff --git a/.claude/sessions/exec-session-20260222-180300/tasks/163.json b/.claude/sessions/exec-session-20260222-180300/tasks/163.json new file mode 100644 index 0000000..48de395 --- /dev/null +++ b/.claude/sessions/exec-session-20260222-180300/tasks/163.json @@ -0,0 +1,22 @@ +{ + "id": "163", + "subject": "Add file conflict detection to orchestration and SKILL.md", + "description": "Add pre-wave file conflict detection to prevent concurrent agents from editing the same files within a wave. Implement in both orchestration.md (detailed procedure) and SKILL.md (workflow step).\n\n**What to implement:**\n\n1. 
Add conflict detection procedure to `claude/sdd-tools/references/orchestration.md`:\n - Pre-wave scan: parse all wave tasks' `description` and `acceptance_criteria` for file path references\n - File path detection patterns:\n - Paths containing `/` (e.g., `src/api/handler.ts`)\n - Paths ending in known extensions: `.md`, `.ts`, `.js`, `.json`, `.sh`, `.py`\n - Glob patterns (e.g., `src/api/*.ts`)\n - Conflict flagging: two or more tasks reference the same file path\n - Resolution: lower ID task stays, higher ID tasks deferred to next wave (artificial dependency inserted)\n - Overlapping glob patterns treated as conflicts\n - Log results in `execution_plan.md` under \"Conflict Resolution\" section\n\n2. Update `claude/sdd-tools/skills/execute-tasks/SKILL.md` (~271 lines):\n - Add conflict scan step to wave planning workflow (between dependency sort and wave launch)\n - Reference orchestration.md procedure for details\n\n**Files to modify:**\n- `claude/sdd-tools/references/orchestration.md` (~611 lines)\n- `claude/sdd-tools/skills/execute-tasks/SKILL.md` (~271 lines)\n\n**Acceptance Criteria:**\n\n_Functional:_\n- [ ] Pre-wave scan detects file path references in task descriptions and acceptance criteria\n- [ ] Paths with `/`, known extensions, and glob patterns are detected\n- [ ] Two tasks referencing same file → conflict flagged\n- [ ] Lower ID task stays in wave, higher ID deferred\n- [ ] Conflict resolution logged in execution_plan.md\n- [ ] SKILL.md includes conflict scan step in workflow\n- [ ] No conflicts → wave proceeds unchanged with no overhead\n\n_Edge Cases:_\n- [ ] No file paths in descriptions → no conflicts detected\n- [ ] All tasks conflict on same file → sequentialized (one per sub-wave)\n- [ ] Glob patterns overlap → treated as conflict\n- [ ] File path only in acceptance criteria (not description) → still detected\n\n_Error Handling:_\n- [ ] Path pattern regex fails → log warning, proceed without detection for this wave\n\n**Testing 
Requirements:**\n• Manual: Verify conflict detection patterns match spec examples\n• Integration: Validated during end-to-end execution session (Task 15)\n\nSource: internal/specs/execute-tasks-hardening-SPEC.md Section 5.1", + "activeForm": "Adding file conflict detection", + "status": "completed", + "blocks": [ + "169" + ], + "blockedBy": [ + "157" + ], + "metadata": { + "priority": "critical", + "complexity": "M", + "source_section": "5.1 File Conflict Detection", + "spec_path": "internal/specs/execute-tasks-hardening-SPEC.md", + "feature_name": "File Conflict Detection", + "task_uid": "internal/specs/execute-tasks-hardening-SPEC.md:conflict-detection:orchestration:001", + "task_group": "execute-tasks-hardening" + } +} \ No newline at end of file diff --git a/.claude/sessions/exec-session-20260222-180300/tasks/164.json b/.claude/sessions/exec-session-20260222-180300/tasks/164.json new file mode 100644 index 0000000..34de4dc --- /dev/null +++ b/.claude/sessions/exec-session-20260222-180300/tasks/164.json @@ -0,0 +1,23 @@ +{ + "id": "164", + "subject": "Add produces_for prompt injection logic to orchestration and SKILL.md", + "description": "Implement the `produces_for` mechanism that injects producer task results directly into dependent task prompts for richer context than wave-granular merging alone.\n\n**What to implement:**\n\n1. Define `produces_for` field in task JSON schema (in orchestration.md):\n - Array of task IDs that consume this task's output\n - Optional field: tasks without it use wave-granular context only\n\n2. 
Add injection logic to `claude/sdd-tools/references/orchestration.md`:\n - When launching a dependent task, check if any of its blockedBy tasks have `produces_for` pointing to it\n - Read the producer's result file content\n - Inject into task prompt: `## UPSTREAM TASK OUTPUT (Task #{id}: {name})\\n{result file content}\\n---`\n - Multiple producers injected in task ID order\n - Injection after execution context loading, before codebase exploration\n - If producer result file missing (task failed): inject `## UPSTREAM TASK #{id} FAILED\\n{failure summary from task_log.md}`\n\n3. Update `claude/sdd-tools/skills/execute-tasks/SKILL.md`:\n - Add prompt injection step to agent launch procedure\n - Reference orchestration.md for detailed injection format\n\n**Task JSON schema extension:**\n```json\n{\n \"produces_for\": [\"8\", \"12\"]\n}\n```\n\n**Pattern reference:** execute-tdd-tasks' `PAIRED TEST TASK OUTPUT` mechanism.\n\n**Files to modify:**\n- `claude/sdd-tools/references/orchestration.md` (~611 lines)\n- `claude/sdd-tools/skills/execute-tasks/SKILL.md` (~271 lines)\n\n**Acceptance Criteria:**\n\n_Functional:_\n- [ ] `produces_for` field defined in task JSON schema documentation\n- [ ] Producer result files injected into dependent task prompts\n- [ ] Injection format: `## UPSTREAM TASK OUTPUT (Task #{id}: {name})\\n{content}\\n---`\n- [ ] Multiple producers injected in task ID order\n- [ ] Injection happens after context loading, before exploration\n- [ ] Tasks without produces_for behave unchanged (wave-granular context only)\n\n_Edge Cases:_\n- [ ] Producer task failed → inject failure notice instead of result\n- [ ] Producer result file missing entirely → inject failure notice\n- [ ] No produces_for relationships in task set → no injection overhead\n\n_Error Handling:_\n- [ ] Missing producer result file → log warning, inject failure notice, continue\n\n**Testing Requirements:**\n• Manual: Verify injection format in orchestration.md\n• Integration: Validated 
during end-to-end execution session (Task 15)\n\nSource: internal/specs/execute-tasks-hardening-SPEC.md Section 5.5", + "activeForm": "Adding prompt injection logic", + "status": "completed", + "blocks": [ + "165", + "169" + ], + "blockedBy": [ + "158" + ], + "metadata": { + "priority": "high", + "complexity": "M", + "source_section": "5.5 Generalized Prompt Injection", + "spec_path": "internal/specs/execute-tasks-hardening-SPEC.md", + "feature_name": "Prompt Injection", + "task_uid": "internal/specs/execute-tasks-hardening-SPEC.md:prompt-injection:orchestration:001", + "task_group": "execute-tasks-hardening" + } +} \ No newline at end of file diff --git a/.claude/sessions/exec-session-20260222-180300/tasks/165.json b/.claude/sessions/exec-session-20260222-180300/tasks/165.json new file mode 100644 index 0000000..0ef3050 --- /dev/null +++ b/.claude/sessions/exec-session-20260222-180300/tasks/165.json @@ -0,0 +1,22 @@ +{ + "id": "165", + "subject": "Update create-tasks skill for produces_for field emission", + "description": "Update the `create-tasks` skill to detect producer-consumer relationships between tasks and emit the `produces_for` field in generated task JSON.\n\n**What to implement:**\n\n1. Read current `claude/sdd-tools/skills/create-tasks/SKILL.md` (~653 lines)\n2. Add `produces_for` detection logic:\n - When decomposing tasks, identify relationships where one task's output is directly consumed by another\n - Common patterns:\n - Data model tasks → API tasks that use the model\n - Schema definition tasks → Implementation tasks that implement the schema\n - Configuration tasks → Tasks that consume the configuration\n - Foundation tasks → Tasks that build on the foundation\n - Detect from spec's `blockedBy` relationships: if task B is blocked by task A AND task A's deliverable is directly referenced in task B's description, add `produces_for: [B_id]` to task A\n3. Emit `produces_for` field in task JSON (array of task IDs)\n4. 
Update the task structure documentation in SKILL.md to include `produces_for`\n5. This is a clean break: new task JSON format (old format not supported)\n\n**Files to modify:**\n- `claude/sdd-tools/skills/create-tasks/SKILL.md` (~653 lines)\n\n**Acceptance Criteria:**\n\n_Functional:_\n- [ ] create-tasks detects producer-consumer relationships\n- [ ] `produces_for` field emitted in task JSON for identified relationships\n- [ ] `produces_for` is optional — tasks without relationships omit the field\n- [ ] Task structure documentation updated with produces_for field\n- [ ] Detection covers: model→API, schema→impl, config→consumer, foundation→builder patterns\n\n_Edge Cases:_\n- [ ] No producer-consumer relationships detected → no produces_for fields emitted\n- [ ] Task produces for multiple consumers → array contains all consumer IDs\n- [ ] Circular production relationships → skip (dependency inference already prevents circular blockedBy)\n\n_Error Handling:_\n- [ ] Uncertain relationships → omit produces_for (conservative approach)\n\n**Testing Requirements:**\n• Manual: Verify detection patterns cover common relationships\n• Integration: Run create-tasks on a test spec and verify produces_for output\n\nSource: internal/specs/execute-tasks-hardening-SPEC.md Section 5.5", + "activeForm": "Updating create-tasks for produces_for", + "status": "completed", + "blocks": [ + "169" + ], + "blockedBy": [ + "164" + ], + "metadata": { + "priority": "high", + "complexity": "M", + "source_section": "5.5 Generalized Prompt Injection", + "spec_path": "internal/specs/execute-tasks-hardening-SPEC.md", + "feature_name": "Prompt Injection (create-tasks)", + "task_uid": "internal/specs/execute-tasks-hardening-SPEC.md:prompt-injection:create-tasks:001", + "task_group": "execute-tasks-hardening" + } +} \ No newline at end of file diff --git a/.claude/sessions/exec-session-20260222-180300/tasks/166.json b/.claude/sessions/exec-session-20260222-180300/tasks/166.json new file mode 100644 
index 0000000..9397b05 --- /dev/null +++ b/.claude/sessions/exec-session-20260222-180300/tasks/166.json @@ -0,0 +1,22 @@ +{ + "id": "166", + "subject": "Add retry escalation logic to orchestration and SKILL.md", + "description": "Implement a 3-tier retry escalation strategy for failed tasks that progressively provides more help rather than repeating the same approach.\n\n**What to implement:**\n\n1. Add retry escalation procedures to `claude/sdd-tools/references/orchestration.md`:\n - **Retry #1 (Standard)**: Re-launch agent with failure context from previous attempt (existing behavior, enhanced)\n - **Retry #2 (Context Enrichment)**: Inject full `execution_context.md` content + result files from related tasks (same wave or shared dependencies) into retry prompt\n - **Retry #3 (User Escalation)**: Pause execution, present failure details to user via AskUserQuestion with 4 options:\n - \"Fix manually and continue\" — user fixes externally, execution resumes\n - \"Skip this task\" — mark as FAIL in task_log.md, continue\n - \"Provide guidance\" — capture user text, inject into final retry\n - \"Abort session\" — clean up, present partial summary\n - If user provides guidance: final retry with user guidance injected\n - If that also fails: present AskUserQuestion again with updated failure\n - Track retry escalation level in task_log.md per task\n - Retry count resets for each new task\n\n2. 
Update `claude/sdd-tools/skills/execute-tasks/SKILL.md`:\n - Add retry escalation step to failure handling workflow\n - Reference orchestration.md for detailed escalation procedures\n\n**Files to modify:**\n- `claude/sdd-tools/references/orchestration.md` (~611 lines)\n- `claude/sdd-tools/skills/execute-tasks/SKILL.md` (~271 lines)\n\n**Acceptance Criteria:**\n\n_Functional:_\n- [ ] Retry #1: Standard retry with failure context\n- [ ] Retry #2: Context enrichment with full execution_context.md + related results\n- [ ] Retry #3: User escalation via AskUserQuestion with 4 options\n- [ ] \"Fix manually and continue\" pauses and resumes\n- [ ] \"Skip this task\" marks FAIL in task_log.md, continues\n- [ ] \"Provide guidance\" captures text, injects into retry\n- [ ] \"Abort session\" cleans up, shows partial summary\n- [ ] Retry level tracked in task_log.md per task\n- [ ] Retry count resets per task (not cumulative)\n\n_Edge Cases:_\n- [ ] User provides guidance but retry still fails → re-present AskUserQuestion\n- [ ] Multiple tasks fail in same wave → each gets independent escalation\n- [ ] Retry #2 succeeds → task passes, no user escalation needed\n\n_Error Handling:_\n- [ ] All automated retries exhausted → user must choose action\n- [ ] User selects abort → graceful session cleanup\n\n**Testing Requirements:**\n• Manual: Verify escalation logic in orchestration.md is clear\n• Integration: Trigger retry with intentionally failing task during e2e (Task 15)\n\nSource: internal/specs/execute-tasks-hardening-SPEC.md Section 5.9", + "activeForm": "Adding retry escalation logic", + "status": "completed", + "blocks": [ + "169" + ], + "blockedBy": [ + "157" + ], + "metadata": { + "priority": "medium", + "complexity": "M", + "source_section": "5.9 Retry Escalation", + "spec_path": "internal/specs/execute-tasks-hardening-SPEC.md", + "feature_name": "Retry Escalation", + "task_uid": "internal/specs/execute-tasks-hardening-SPEC.md:retry-escalation:orchestration:001", + 
"task_group": "execute-tasks-hardening" + } +} \ No newline at end of file diff --git a/.claude/sessions/exec-session-20260222-180300/tasks/167.json b/.claude/sessions/exec-session-20260222-180300/tasks/167.json new file mode 100644 index 0000000..a6afe8f --- /dev/null +++ b/.claude/sessions/exec-session-20260222-180300/tasks/167.json @@ -0,0 +1,22 @@ +{ + "id": "167", + "subject": "Add progress streaming to orchestration and SKILL.md", + "description": "Add wave completion summaries that are emitted as text output visible to the human operator during execution, eliminating the current ~50 minutes of silence.\n\n**What to implement:**\n\n1. Add progress streaming procedures to `claude/sdd-tools/references/orchestration.md`:\n - After each wave completes, emit structured human-readable summary:\n ```\n Wave 2/6 complete: 3/3 tasks passed (2m 34s)\n [3] Create test-writer agent — PASS (1m 52s, 48K tokens)\n [5] Create tdd-workflow reference — PASS (2m 22s, 54K tokens)\n [7] Create test patterns reference — PASS (2m 34s, 61K tokens)\n ```\n - Before starting next wave: `Starting Wave {N}/{total}: {count} tasks...`\n - On session start: `Execution plan: {total_tasks} tasks across {total_waves} waves (max {max_parallel} parallel)`\n - Data sourced from result files and TaskOutput reaping (already collected)\n - Wave-level granularity only (no per-task streaming during a wave)\n\n2. 
Update `claude/sdd-tools/skills/execute-tasks/SKILL.md`:\n - Add progress output steps between wave processing\n - Include session start summary in initial setup\n\n**Files to modify:**\n- `claude/sdd-tools/references/orchestration.md` (~611 lines)\n- `claude/sdd-tools/skills/execute-tasks/SKILL.md` (~271 lines)\n\n**Acceptance Criteria:**\n\n_Functional:_\n- [ ] Wave completion summary emitted after each wave (wave N/total, pass/fail count, duration)\n- [ ] Per-task breakdown in summary: task ID, name, status, duration, token count\n- [ ] \"Starting Wave N/total: count tasks...\" emitted before each wave\n- [ ] Session start message: \"Execution plan: X tasks across Y waves (max Z parallel)\"\n- [ ] Summaries are structured but human-readable (not JSON)\n\n_Edge Cases:_\n- [ ] Wave with failures includes FAIL/PARTIAL status per task\n- [ ] Single-wave session still shows summary\n- [ ] Token count unavailable → omit from per-task line\n\n_Performance:_\n- [ ] No additional file I/O required (data from existing result processing)\n\n**Testing Requirements:**\n• Manual: Verify summary format matches spec example\n• Integration: Confirm summaries visible during end-to-end execution (Task 15)\n\nSource: internal/specs/execute-tasks-hardening-SPEC.md Section 5.3", + "activeForm": "Adding progress streaming", + "status": "completed", + "blocks": [ + "169" + ], + "blockedBy": [ + "161" + ], + "metadata": { + "priority": "critical", + "complexity": "S", + "source_section": "5.3 Progress Streaming", + "spec_path": "internal/specs/execute-tasks-hardening-SPEC.md", + "feature_name": "Progress Streaming", + "task_uid": "internal/specs/execute-tasks-hardening-SPEC.md:progress-streaming:orchestration:001", + "task_group": "execute-tasks-hardening" + } +} \ No newline at end of file diff --git a/.claude/sessions/exec-session-20260222-180300/tasks/168.json b/.claude/sessions/exec-session-20260222-180300/tasks/168.json new file mode 100644 index 0000000..5ced9e5 --- /dev/null +++ 
b/.claude/sessions/exec-session-20260222-180300/tasks/168.json @@ -0,0 +1,22 @@ +{ + "id": "168", + "subject": "Add post-wave merge validation to orchestration.md", + "description": "Add validation of `execution_context.md` after each context merge to catch corruption or unbounded growth early.\n\n**What to implement:**\n\n1. Add merge validation procedure to `claude/sdd-tools/references/orchestration.md`:\n - After merging per-task context files, validate:\n a. All 6 section headers present (`## Project Setup`, `## File Patterns`, `## Conventions`, `## Key Decisions`, `## Known Issues`, `## Task History`)\n b. Total file size: warn if >500 lines, error if >1000 lines\n c. No content outside of any section header (malformed sections)\n - If validation fails:\n - Log warning in task_log.md\n - Attempt auto-repair: re-insert missing headers\n - If size exceeds 1000 lines:\n - Force compaction of all sections before proceeding\n - Include validation results in wave completion summary (progress streaming)\n\n2. Runs after context merge, before next wave launch\n3. 
Leverages structured schema (Task 1) for reliable header detection\n\n**Files to modify:**\n- `claude/sdd-tools/references/orchestration.md` (~611 lines)\n\n**Acceptance Criteria:**\n\n_Functional:_\n- [ ] All 6 section headers validated after each merge\n- [ ] Warn if >500 lines\n- [ ] Error if >1000 lines → force compaction\n- [ ] Missing headers → auto-repair (re-insert)\n- [ ] Content outside section headers → flagged as malformed\n- [ ] Validation results included in wave completion summary\n\n_Edge Cases:_\n- [ ] All headers present, size normal → no action needed\n- [ ] One header missing after merge → auto-repair re-inserts it\n- [ ] Compaction triggered → older entries summarized per-section\n\n_Error Handling:_\n- [ ] Auto-repair fails → log error in task_log.md, continue with best-effort context\n\n**Testing Requirements:**\n• Manual: Verify validation logic is clear in orchestration.md\n• Integration: Validate during end-to-end execution (Task 15)\n\nSource: internal/specs/execute-tasks-hardening-SPEC.md Section 5.10", + "activeForm": "Adding post-wave merge validation", + "status": "completed", + "blocks": [ + "169" + ], + "blockedBy": [ + "157" + ], + "metadata": { + "priority": "medium", + "complexity": "S", + "source_section": "5.10 Post-Wave Merge Validation", + "spec_path": "internal/specs/execute-tasks-hardening-SPEC.md", + "feature_name": "Post-Wave Merge Validation", + "task_uid": "internal/specs/execute-tasks-hardening-SPEC.md:merge-validation:orchestration:001", + "task_group": "execute-tasks-hardening" + } +} \ No newline at end of file diff --git a/.claude/sessions/exec-session-20260222-180300/tasks/169.json b/.claude/sessions/exec-session-20260222-180300/tasks/169.json new file mode 100644 index 0000000..2a4bdd8 --- /dev/null +++ b/.claude/sessions/exec-session-20260222-180300/tasks/169.json @@ -0,0 +1,27 @@ +{ + "id": "169", + "subject": "Run end-to-end validation session", + "description": "Execute a real task session with all hardening 
features active to validate the complete pipeline works end-to-end.\n\n**What to implement:**\n\n1. Generate tasks from an existing spec using `/create-tasks` (with `produces_for` field)\n2. Run `/execute-tasks` with all hardening features active\n3. Verify each feature works in integration:\n - **File conflict detection**: Confirm conflicts logged in execution_plan.md, tasks rearranged\n - **Result validation hook**: Confirm validate-result.sh fires on result file writes, invalid files caught\n - **Progress streaming**: Confirm wave completion summaries visible to user\n - **Event-driven completion**: Confirm fswatch used (or polling fallback on systems without fswatch)\n - **Structured context**: Confirm execution_context.md has 6-section structure after merge\n - **Embedded agent rules**: Confirm agents don't read reference files explicitly\n - **Prompt injection**: Confirm produces_for results injected into dependent task prompts\n - **Adaptive polling**: Confirm fallback polling uses progressive intervals (if triggered)\n - **Retry escalation**: Trigger with intentionally failing task if feasible, or verify logic paths\n - **Merge validation**: Confirm post-merge validation runs without errors\n4. 
Document results: pass/fail per feature, any issues discovered\n\n**Files to read/verify (not modify):**\n- Session directory files (execution_plan.md, execution_context.md, task_log.md, result files, context files)\n- Hook debug logs (if AGENT_ALCHEMY_HOOK_DEBUG=1)\n\n**Acceptance Criteria:**\n\n_Functional:_\n- [ ] Full execution session completes successfully\n- [ ] All 10 hardening features verified as operational\n- [ ] No regressions in existing behavior (wave execution, result processing, context merging)\n- [ ] Session summary generated at end\n\n_Edge Cases:_\n- [ ] At least one wave with multiple parallel tasks to test conflict detection\n- [ ] At least one produces_for relationship to test prompt injection\n\n_Error Handling:_\n- [ ] Any feature failure → document issue, note for follow-up fix\n\n**Testing Requirements:**\n• Integration: This IS the integration test\n• Coverage: All 10 features validated in a single session\n\nSource: internal/specs/execute-tasks-hardening-SPEC.md Section 10.3", + "activeForm": "Running end-to-end validation", + "status": "completed", + "blocks": [ + "170" + ], + "blockedBy": [ + "163", + "164", + "165", + "166", + "167", + "168" + ], + "metadata": { + "priority": "high", + "complexity": "L", + "source_section": "10.3 Integration Validation", + "spec_path": "internal/specs/execute-tasks-hardening-SPEC.md", + "feature_name": "End-to-End Validation", + "task_uid": "internal/specs/execute-tasks-hardening-SPEC.md:integration:e2e:001", + "task_group": "execute-tasks-hardening" + } +} \ No newline at end of file diff --git a/CHANGELOG.md b/CHANGELOG.md index 60ce5ae..cff749d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -18,3 +18,5 @@ and this project adheres to [Semantic Versioning](https://semver.org/). 
- Bump dev-tools from 0.2.0 to 0.3.0 - Bump core-tools from 0.2.0 to 0.2.1 and dev-tools from 0.3.0 to 0.3.1 - Bump sdd-tools from 0.2.0 to 0.2.1 +- Bump sdd-tools from 0.2.1 to 0.3.0 +- Bump sdd-tools from 0.3.0 to 0.3.1 diff --git a/CLAUDE.md b/CLAUDE.md index 149484e..537fc13 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -12,7 +12,7 @@ agent-alchemy/ │ ├── .claude-plugin/ # Plugin marketplace registry │ ├── core-tools/ # Codebase analysis, deep exploration, language patterns (includes hooks/) │ ├── dev-tools/ # Feature dev, debugging, code review, docs, changelog -│ ├── sdd-tools/ # Spec-Driven Development pipeline +│ ├── sdd-tools/ # Spec-Driven Development pipeline (includes hooks/, tests/fixtures/, scripts/) │ ├── tdd-tools/ # TDD workflows: test generation, RED-GREEN-REFACTOR, coverage │ ├── git-tools/ # Git commit automation │ └── plugin-tools/ # Plugin porting, adapter validation, ported plugin maintenance, ecosystem health @@ -48,7 +48,7 @@ pnpm lint # Lint all packages - **Skills** are defined in `SKILL.md` with YAML frontmatter and markdown body - **Agents** are defined in `{name}.md` with YAML frontmatter (model, tools, skills) -- **Hooks** are JSON configs in `hooks/hooks.json` for lifecycle events +- **Hooks** are JSON configs in `hooks/hooks.json` for lifecycle events (PreToolUse, PostToolUse); sdd-tools includes `auto-approve-session.sh` (PreToolUse) and `validate-result.sh` (PostToolUse for result file format validation) - Skills compose by loading other skills: `Read ${CLAUDE_PLUGIN_ROOT}/skills/{name}/SKILL.md` - Complex skills use `references/` subdirectories for supporting materials @@ -65,9 +65,15 @@ pnpm lint # Lint all packages - **Artifact chain**: `/create-spec` → spec markdown → `/create-tasks` → task JSON → `/execute-tasks` → code + session logs - **Wave-based execution**: Tasks grouped by topological sort level; N agents per wave, configurable via `--max-parallel` -- **Result file protocol**: Each task-executor writes a compact 
`result-task-{id}.md` (~18 lines) as its last action; orchestrator polls for these instead of consuming full agent output (79% context reduction per wave) -- **Per-task context isolation**: Each agent writes to `context-task-{id}.md`; orchestrator merges into shared `execution_context.md` between waves — eliminates write contention -- **Merge mode**: `/create-tasks` uses `task_uid` composite keys for idempotent re-runs — completed tasks preserved, pending tasks updated, new tasks created +- **File conflict detection**: Pre-wave scan extracts file paths from task descriptions and detects overlapping edits; conflicting tasks are moved to separate waves to prevent concurrent modification +- **Producer-consumer injection**: `/create-tasks` detects `produces_for` relationships between tasks; orchestrator injects completed producer task results into dependent task prompts via `CONTEXT FROM COMPLETED DEPENDENCIES` header +- **Result file protocol**: Each task-executor writes a compact `result-task-{id}.md` (~18 lines) as its last action; a PostToolUse hook (`validate-result.sh`) validates format on write and renames malformed files to `.invalid` +- **Event-driven completion detection**: Orchestrator uses `watch-for-results.sh` (fswatch-based, <1s latency) with automatic fallback to `poll-for-results.sh` (adaptive intervals: 5s→30s) when fswatch is unavailable +- **Per-task context isolation**: Each agent writes to `context-task-{id}.md` using a structured 6-section schema; orchestrator merges into shared `execution_context.md` between waves — eliminates write contention +- **Post-wave merge validation**: After merging context files, orchestrator validates `execution_context.md` structure (OK/WARN/ERROR); auto-repairs missing headers; forces compaction at >1000 lines +- **Retry escalation**: 3-tier strategy — Standard retry (attempt 2) → Context Enrichment with extra guidance (attempt 3) → User Escalation via AskUserQuestion (after all retries exhausted) +- **Progress 
streaming**: Orchestrator emits session start banner, wave start announcements, and wave completion summaries with per-task status, duration, and token usage between waves +- **Merge mode**: `/create-tasks` uses `task_uid` composite keys for idempotent re-runs — completed tasks preserved, pending tasks updated, new tasks created; Phase 6 detects producer-consumer relationships for `produces_for` field - **Session management**: Single-session invariant via `.lock` file; interrupted sessions auto-recovered with in_progress tasks reset to pending ### Session Directory Layout @@ -75,15 +81,27 @@ pnpm lint # Lint all packages ``` .claude/sessions/__live_session__/ # Active execution session ├── execution_plan.md # Wave plan from orchestrator -├── execution_context.md # Shared learnings across tasks +├── execution_context.md # Shared learnings (structured 6-section schema) ├── task_log.md # Per-task status, duration, tokens ├── progress.md # Real-time progress tracking ├── tasks/ # Archived completed task JSONs -├── context-task-{id}.md # Per-task context (ephemeral) -├── result-task-{id}.md # Per-task result (ephemeral) +├── context-task-{id}.md # Per-task context (structured, ephemeral) +├── result-task-{id}.md # Per-task result (validated by hook, ephemeral) +├── result-task-{id}.md.invalid # Renamed by validate-result hook if malformed └── .lock # Concurrency guard ``` +**Structured Context Schema** (`execution_context.md` and `context-task-{id}.md`): + +Both files follow a 6-section schema. The orchestrator initializes `execution_context.md` with these sections and merges per-task context files after each wave: + +1. **Project Setup** — Tech stack, build commands, environment details +2. **File Patterns** — File naming, directory structure, import conventions +3. **Conventions** — Coding style, error handling, logging patterns +4. **Key Decisions** — Architecture choices made during execution +5. **Known Issues** — Problems encountered, workarounds applied +6. 
**Task History** — Per-task outcomes with files modified and learnings (compacted at 10+ entries) + ### Cross-Plugin Dependencies `deep-analysis` (core-tools) is the keystone skill, loaded by 3 skills across 2 plugin groups: @@ -160,7 +178,7 @@ docs-manager -> docs-writer -> technical-diagrams (auto-loaded via skills: front |-------|--------|--------|---------| | core-tools | deep-analysis, codebase-analysis, language-patterns, project-conventions, technical-diagrams | code-explorer, code-synthesizer, code-architect | 0.2.1 | | dev-tools | feature-dev, bug-killer, architecture-patterns, code-quality, project-learnings, changelog-format, docs-manager, release-python-package, document-changes | code-reviewer, bug-investigator, changelog-manager, docs-writer | 0.3.1 | -| sdd-tools | create-spec, analyze-spec, create-tasks, execute-tasks | codebase-explorer, researcher, spec-analyzer, task-executor | 0.2.1 | +| sdd-tools | create-spec, analyze-spec, create-tasks, execute-tasks | codebase-explorer, researcher, spec-analyzer, task-executor | 0.3.1 | | tdd-tools | generate-tests, tdd-cycle, analyze-coverage, create-tdd-tasks, execute-tdd-tasks | test-writer, tdd-executor, test-reviewer | 0.2.0 | | git-tools | git-commit | — | 0.1.0 | | plugin-tools | port-plugin, validate-adapter, update-ported-plugin, dependency-checker, bump-plugin-version | researcher, port-converter | 0.1.1 | @@ -174,8 +192,10 @@ docs-manager -> docs-writer -> technical-diagrams (auto-loaded via skills: front | `claude/plugin-tools/skills/validate-adapter/SKILL.md` | 625 | Adapter validation against live platform docs (4 phases) | | `claude/plugin-tools/skills/update-ported-plugin/SKILL.md` | 793 | Incremental ported plugin updates with dual-track change detection (5 phases) | | `claude/sdd-tools/skills/create-spec/SKILL.md` | ~722 | Adaptive interview with context input, complexity detection, and depth-aware questioning | -| `claude/sdd-tools/skills/create-tasks/SKILL.md` | 653 | Spec-to-task 
decomposition with `task_uid` merge mode | -| `claude/sdd-tools/skills/execute-tasks/SKILL.md` | 262 | Wave-based parallel execution with session management | +| `claude/sdd-tools/skills/create-tasks/SKILL.md` | ~738 | Spec-to-task decomposition with `task_uid` merge mode and `produces_for` detection (9 phases) | +| `claude/sdd-tools/skills/execute-tasks/SKILL.md` | 273 | Wave-based parallel execution with session management | +| `claude/sdd-tools/skills/execute-tasks/references/orchestration.md` | ~1223 | 10-step orchestration loop with conflict detection, retry escalation, progress streaming, and merge validation | +| `claude/sdd-tools/agents/task-executor.md` | 414 | Task executor agent with embedded verification rules | | `claude/dev-tools/skills/feature-dev/SKILL.md` | 273 | 7-phase lifecycle spawning architect + reviewer agent teams | | `claude/dev-tools/skills/bug-killer/SKILL.md` | ~480 | Hypothesis-driven debugging — triage-based quick/deep track with agent investigation | | `claude/tdd-tools/skills/tdd-cycle/SKILL.md` | 727 | 7-phase RED-GREEN-REFACTOR TDD workflow | diff --git a/claude/.claude-plugin/marketplace.json b/claude/.claude-plugin/marketplace.json index e6350e8..a58eb33 100644 --- a/claude/.claude-plugin/marketplace.json +++ b/claude/.claude-plugin/marketplace.json @@ -37,7 +37,7 @@ }, { "name": "agent-alchemy-sdd-tools", - "version": "0.2.1", + "version": "0.3.1", "description": "Agent Alchemy SDD Tools — Spec Driven Development tools for AI agents", "source": "./sdd-tools", "homepage": "https://github.com/sequenzia/agent-alchemy/tree/main/claude/sdd-tools", diff --git a/claude/sdd-tools/agents/task-executor.md b/claude/sdd-tools/agents/task-executor.md index 14dc565..60d62c8 100644 --- a/claude/sdd-tools/agents/task-executor.md +++ b/claude/sdd-tools/agents/task-executor.md @@ -30,42 +30,45 @@ You have been launched by the `agent-alchemy-sdd:execute-tasks` skill with: - **Execution Context Path**: Path to 
`.claude/sessions/__live_session__/execution_context.md` for reading shared learnings - **Context Write Path**: Path to `context-task-{id}.md` for writing learnings (never write directly to `execution_context.md`) - **Result Write Path**: Path to `result-task-{id}.md` for writing the compact result file (completion signal for the orchestrator) +- **Upstream Task Output**: (if applicable) Result data from producer tasks injected as `## UPSTREAM TASK OUTPUT` blocks ## Process Overview Execute these 4 phases in order: -1. **Understand** - Load knowledge, read context, classify task, explore codebase +1. **Understand** - Read context, classify task, explore codebase 2. **Implement** - Read target files, make changes, write tests 3. **Verify** - Check acceptance criteria, run tests, determine status -4. **Complete** - Update task status, append learnings, write result file, return minimal status +4. **Complete** - Update task status, write context and result files, return status --- ## Phase 1: Understand -### Step 1: Load Knowledge +### Step 1: Read Execution Context -Read the execute-tasks skill and reference files: +Read `.claude/sessions/__live_session__/execution_context.md` if it exists. Review: +- Project Setup (package manager, runtime, frameworks, build tools) +- File Patterns (test patterns, component patterns, API route patterns) +- Conventions (import style, error handling, naming) +- Key Decisions (architecture choices from earlier tasks) +- Known Issues (problems to avoid, workarounds) +- Task History (what earlier tasks accomplished) -``` -Read: skills/execute-tasks/SKILL.md -Read: skills/execute-tasks/references/execution-workflow.md -Read: skills/execute-tasks/references/verification-patterns.md -``` +**Large context handling**: +- **200+ lines**: Read top sections in full (Project Setup through Known Issues). Keep last 5 Task History entries; summarize older entries briefly. +- **500+ lines**: Read top sections in full. 
Read only last 5 Task History entries; skip older entries entirely. -### Step 2: Read Execution Context +**Retry context**: If this is a retry, check Task History for the previous attempt's learnings. Run linter and tests to assess codebase state before adding changes. Decide whether to build on partial work or revert and try differently. -Read `.claude/sessions/__live_session__/execution_context.md` if it exists. Review: -- Project patterns and conventions from earlier tasks -- Key decisions already made -- Known issues and workarounds -- File map of important files -- Task history with outcomes +### Step 2: Read Upstream Task Output -If this is a retry attempt, pay special attention to the Task History entry for this task's previous attempt. +If `## UPSTREAM TASK OUTPUT` blocks are present in your prompt, these contain result data from producer tasks (via `produces_for`). Read them for: +- Files created or modified by upstream tasks +- Key decisions or conventions established upstream +- Context that informs your implementation approach -**Large context handling**: If `execution_context.md` is large (200+ lines), prioritize reading: Project Patterns, Key Decisions, Known Issues, File Map, and the last 5 Task History entries. Skim or skip older Task History entries to conserve context window. +Multiple upstream blocks appear in task ID order. If an upstream block shows `## UPSTREAM TASK #{id} FAILED`, note the failure and work around missing dependencies. ### Step 3: Load Task Details @@ -76,24 +79,29 @@ Use `TaskGet` with the provided task ID to get full details: ### Step 4: Classify Task -Determine the task type using this algorithm: +Determine the task type: -1. Check for `**Acceptance Criteria:**` in description → Spec-generated -2. Check for `metadata.spec_path` → Spec-generated -3. Check for `Source:` reference → Spec-generated -4. None found → General task +1. Check for `**Acceptance Criteria:**` in description -> Spec-generated +2. 
Check for `metadata.spec_path` -> Spec-generated +3. Check for `Source:` reference -> Spec-generated +4. None found -> General task ### Step 5: Parse Requirements **Spec-generated tasks:** -- Extract each acceptance criterion by category (Functional, Edge Cases, Error Handling, Performance) +- Extract each criterion under `_Functional:_`, `_Edge Cases:_`, `_Error Handling:_`, `_Performance:_` +- Each `- [ ]` line under a category header is one criterion - Extract Testing Requirements section - Note the source spec section **General tasks:** - Parse subject for intent ("Fix X", "Add X", "Refactor X", etc.) -- Extract "should...", "when...", "must..." statements -- Infer completion criteria +- Extract implicit criteria from description: + - "should..." / "must..." -> functional requirements + - "when..." -> scenarios to test + - "can..." -> capabilities to confirm + - "handle..." -> error scenarios to check +- Infer completion criteria from subject + description ### Step 6: Explore Codebase @@ -115,7 +123,7 @@ Before writing code, have a clear plan: ## Phase 2: Implement -Do NOT update `progress.md` — the orchestrator manages progress tracking. +Do NOT update `progress.md` -- the orchestrator manages progress tracking. ### Pre-Implementation @@ -136,16 +144,17 @@ Follow dependency order: - Match existing coding style and naming conventions - Follow `CLAUDE.md` project-specific rules -- Make only changes the task requires +- Make only changes the task requires; do not refactor surrounding code - Use clear naming; comment only when "why" isn't obvious - Handle errors at appropriate boundaries +- Follow the project's type conventions (TypeScript strict mode, Python type hints, etc.) ### Mid-Implementation Checks After core implementation, before tests: -1. Run linter if available -2. Run existing tests to check for regressions -3. Fix any issues before writing new tests +1. Run linter if available (`npm run lint`, `ruff check`, etc.) +2. 
Run existing tests to check for regressions (`npm test`, `pytest`, etc.) +3. Fix any issues before proceeding to write new tests ### Test Writing @@ -153,51 +162,97 @@ If testing requirements are specified: 1. Follow existing test framework and patterns 2. Write tests covering acceptance criteria behaviors 3. Include edge case tests from criteria -4. Use descriptive test names +4. Ensure tests are independent and can run in any order +5. Use descriptive test names that explain expected behavior --- ## Phase 3: Verify -Do NOT update `progress.md` — the orchestrator manages progress tracking. +Do NOT update `progress.md` -- the orchestrator manages progress tracking. -### Spec-Generated Tasks +### Spec-Generated Task Verification -Walk through each acceptance criteria category: +Walk through each acceptance criteria category systematically: -**Functional** (ALL must pass): -- Locate the code satisfying each criterion -- Run relevant tests -- Record PASS/FAIL per criterion +**Functional** (ALL must pass -- any failure means FAIL): +1. Locate the code satisfying each criterion +2. Verify correctness by reading the code +3. Run relevant tests that exercise the behavior +4. Record PASS/FAIL per criterion -**Edge Cases** (flagged but don't block): -- Check guard clauses and boundary handling -- Verify edge case tests -- Record results +**Edge Cases** (flagged but don't block -- failures mean PARTIAL): +1. Check guard clauses, boundary checks, null guards, validation +2. Find tests that exercise the edge case +3. Verify the edge case produces correct results +4. Record PASS/FAIL/SKIP per criterion -**Error Handling** (flagged but don't block): -- Check error paths and messages -- Verify error recovery -- Record results +**Error Handling** (flagged but don't block -- failures mean PARTIAL): +1. Check error paths (try/catch, error returns, validation errors) +2. Verify error messages are clear and informative +3. Confirm the system recovers gracefully +4. 
Record PASS/FAIL per criterion -**Performance** (flagged but don't block): -- Inspect approach efficiency -- Check for obvious issues (N+1 queries, unbounded loops) -- Record results +**Performance** (flagged but don't block -- failures mean PARTIAL): +1. Check that the implementation uses an efficient approach +2. Look for obvious issues: N+1 queries, unbounded loops, missing indexes +3. Run benchmarks if test infrastructure supports it +4. Record PASS/FAIL per criterion **Testing Requirements**: -- Run full test suite -- Verify all tests pass -- Check for regressions +- Parse the `**Testing Requirements:**` section from description +- For each test requirement, find or create the corresponding test +- Run full test suite; verify all tests pass; check for regressions + +#### Evidence by Category + +| Category | How to Verify | Evidence | +|----------|--------------|----------| +| Functional | Code inspection + test execution | File exists, function works, test passes | +| Edge Cases | Code inspection + targeted tests | Boundary handled, test covers scenario | +| Error Handling | Code inspection + error tests | Error caught, message returned, test confirms | +| Performance | Benchmark or code inspection | Efficient approach, no obvious bottlenecks | + +### General Task Verification + +Infer verification from the task subject and description: -### General Tasks +| Subject Pattern | Verification Approach | +|----------------|----------------------| +| "Fix {X}" | Bug no longer reproduces; regression tests pass | +| "Add {X}" / "Create {X}" | X exists and works; integrates with existing code | +| "Implement {X}" | X works end-to-end; tests cover core behavior | +| "Update {X}" | X reflects changes; nothing else broke | +| "Remove {X}" | X fully removed; no dead references | +| "Refactor {X}" | Behavior unchanged; tests still pass | + +Additional checks for all general tasks: +1. Run existing test suite -- no regressions +2. Run linter -- no new violations +3. 
Confirm no dead code left behind + +### Pass Threshold Rules + +**Spec-generated tasks:** + +| Category | Requirement | Failure Impact | +|----------|-------------|----------------| +| Functional | ALL must pass | Any failure -> FAIL | +| Edge Cases | Flagged, don't block | PARTIAL if Functional passes | +| Error Handling | Flagged, don't block | PARTIAL if Functional passes | +| Performance | Flagged, don't block | PARTIAL if Functional passes | +| Tests | ALL must pass | Any failure -> FAIL | + +**General tasks:** -1. Verify core change is implemented and works -2. Run existing test suite - no regressions -3. Run linter - no new violations -4. Confirm no dead code left behind +| Check | Requirement | Failure Impact | +|-------|-------------|----------------| +| Core change | Must be implemented | Missing -> FAIL | +| Tests pass | Existing tests must pass | Test failure -> FAIL | +| Linter | No new violations | New violations -> PARTIAL | +| No regressions | Nothing else broken | Regression -> FAIL | -### Status Determination +**Status determination:** | Condition | Status | |-----------|--------| @@ -207,6 +262,18 @@ Walk through each acceptance criteria category: | Any test failure | **FAIL** | | Core change missing (general task) | **FAIL** | +### Verification Reporting + +When recording criterion results, use these symbols: + +| Symbol | Meaning | +|--------|---------| +| `pass` | Criterion satisfied | +| `fail` | Criterion not satisfied (include reason) | +| `skip` | Criterion not applicable to implementation | + +In the result file's `## Verification` section, summarize counts and list any failures with reasons. + --- ## Phase 4: Complete @@ -221,39 +288,54 @@ TaskUpdate: taskId={id}, status=completed **If PARTIAL or FAIL:** Leave task as `in_progress`. Do NOT mark as completed. 
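The pass-threshold and status-determination rules above can be sketched as a small shell helper. This is an illustrative sketch only, not part of the plugin; the function name and argument layout are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the status rules for a spec-generated task:
# any Functional or test failure -> FAIL; failures only in flagged
# categories (Edge Cases, Error Handling, Performance) -> PARTIAL;
# otherwise -> PASS.
determine_status() {
  local functional_failures="$1" test_failures="$2" flagged_failures="$3"
  if [ "$functional_failures" -gt 0 ] || [ "$test_failures" -gt 0 ]; then
    echo "FAIL"
  elif [ "$flagged_failures" -gt 0 ]; then
    echo "PARTIAL"
  else
    echo "PASS"
  fi
}

determine_status 0 0 0   # prints: PASS
determine_status 0 0 2   # prints: PARTIAL
determine_status 1 0 0   # prints: FAIL
```

The ordering of the checks encodes the precedence: FAIL conditions are evaluated before the non-blocking flagged categories.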
-### Append to Execution Context
+### Write Context File

-Write learnings to your per-task context file at the `Context Write Path` specified in your prompt (e.g., `.claude/sessions/__live_session__/context-task-{id}.md`). Do NOT write to `execution_context.md` directly — the orchestrator merges per-task files after each wave.
+Write structured learnings to your per-task context file at the `Context Write Path`. Use the 6-section schema below. Only include sections where you have content to contribute -- omit empty sections.

```markdown
-### Task [{id}]: {subject} - {PASS/PARTIAL/FAIL}
-- Files modified: {list of files created or changed}
-- Key learnings: {patterns discovered, conventions noted, useful file locations}
-- Issues encountered: {problems hit, workarounds applied, things that didn't work}
+## Project Setup
+- {discovery about package manager, runtime, frameworks, build tools}
+
+## File Patterns
+- {discovered test file patterns, component patterns, API route patterns}
+
+## Conventions
+- {discovered import style, error handling, state management, naming}
+
+## Key Decisions
+- [Task #{id}] {decision made and rationale}
+
+## Known Issues
+- {issues encountered, workarounds applied}
```

-Include updates to Project Patterns, Key Decisions, Known Issues, and File Map sections as relevant — the orchestrator will merge these into the shared context after the wave completes.
+Do NOT write to `execution_context.md` directly -- the orchestrator merges per-task files after each wave.
+
+**Note**: Task History is managed by the orchestrator from result files. Do not include a Task History section in the context file.
+
+**Error resilience**: If the context file write fails, do not crash. Note a `WARNING: Failed to write learnings to context file` line in the result file's `## Verification` section and include learnings in the fallback report.
### Write Result File -As your **VERY LAST action** (after writing the context file), write a compact result file to the `Result Write Path` specified in your prompt (e.g., `.claude/sessions/__live_session__/result-task-{id}.md`): +As your **VERY LAST action** (after writing the context file), write a compact result file to the `Result Write Path`: ```markdown -# Task Result: [{id}] {subject} status: PASS|PARTIAL|FAIL -attempt: {n}/{max} +task_id: {id} +duration: {Xm Ys} -## Verification -- Functional: {n}/{total} -- Edge Cases: {n}/{total} -- Error Handling: {n}/{total} -- Tests: {passed}/{total} ({failed} failures) +## Summary +{1-3 sentence summary of what was done} ## Files Modified -- {path}: {brief description} +- {file path 1} -- {what changed} +- {file path 2} -- {what changed} -## Issues -{None or brief descriptions} +## Context Contribution +{Key learnings for downstream tasks: conventions discovered, patterns established, decisions made} + +## Verification +{What was checked and the result: criteria counts, test results, issues found} ``` **Ordering**: Context file FIRST, result file LAST. The result file's existence signals completion to the orchestrator. 
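The write ordering above can be sketched as a minimal shell sequence. Paths and file contents here are illustrative placeholders, not the agent's actual output:

```shell
#!/usr/bin/env bash
# Sketch of the completion write ordering: context file FIRST, result file
# LAST -- the result file's existence is the orchestrator's completion signal.
session_dir="$(mktemp -d)"   # stands in for .claude/sessions/__live_session__
task_id=42

# 1. Per-task context file (6-section schema; only non-empty sections).
cat > "$session_dir/context-task-$task_id.md" <<'EOF'
## Key Decisions
- [Task #42] Illustrative decision recorded for downstream tasks
EOF

# 2. Compact result file: status line first, then the required sections.
cat > "$session_dir/result-task-$task_id.md" <<'EOF'
status: PASS
task_id: 42
duration: 1m 0s

## Summary
Illustrative summary of the work done.

## Files Modified
- src/example.ts -- illustrative entry

## Context Contribution
Illustrative learnings for downstream tasks.

## Verification
Functional: 1/1, Tests: 1/1 (0 failures)
EOF

head -n1 "$session_dir/result-task-$task_id.md"   # prints: status: PASS
```

Writing the result file last means a watcher or poller that sees it can safely assume the matching context file already exists (the `validate-result.sh` hook stubs it if not).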
@@ -266,7 +348,7 @@ After writing the result file, return ONLY a single minimal status line: DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL} ``` -**Fallback**: If the result file write fails, return the full structured report instead so the orchestrator can parse it from `TaskOutput`: +**Fallback**: If the result file write fails, return the full structured report so the orchestrator can parse it from `TaskOutput`: ``` TASK RESULT: {PASS|PARTIAL|FAIL} @@ -284,11 +366,19 @@ ISSUES: FILES MODIFIED: - {file path}: {brief description} -{If context append also failed:} +CONTEXT CONTRIBUTION: + - {key learnings for downstream tasks} + +{If context file write also failed:} LEARNINGS: - - Files modified: {list} - - Key learnings: {patterns, conventions, file locations} - - Issues encountered: {problems, workarounds} + ## Project Setup + - {discoveries} + ## Conventions + - {discoveries} + ## Key Decisions + - [Task #{id}] {decision} + ## Known Issues + - {issues} ``` --- @@ -318,7 +408,7 @@ Use this information to: - **No sub-agents**: Do not use the Task tool; you handle everything directly - **Read before write**: Always read files before modifying them - **Honest reporting**: Report PARTIAL or FAIL accurately; never mark complete if verification fails -- **Share learnings**: Always append to execution context, even on failure +- **Share learnings**: Always write context file, even on failure - **Minimal changes**: Only modify what the task requires -- **Session directory is auto-approved**: Freely create and modify any files within `.claude/sessions/` (including `__live_session__/` and archival folders) — these writes are auto-approved by the `auto-approve-session.sh` PreToolUse hook (execution_context.md, task logs, archived tasks, etc.). Do not ask for permission for these writes. 
+- **Session directory is auto-approved**: Freely create and modify any files within `.claude/sessions/` (including `__live_session__/` and archival folders) -- these writes are auto-approved by the `auto-approve-session.sh` PreToolUse hook (execution_context.md, task logs, archived tasks, etc.). Do not ask for permission for these writes. - **Per-task context and result files are auto-approved**: `context-task-{id}.md` and `result-task-{id}.md` files within `.claude/sessions/` are auto-approved by the `auto-approve-session.sh` PreToolUse hook, same as `execution_context.md`. diff --git a/claude/sdd-tools/hooks/hooks.json b/claude/sdd-tools/hooks/hooks.json index 1052a57..d5b81ea 100644 --- a/claude/sdd-tools/hooks/hooks.json +++ b/claude/sdd-tools/hooks/hooks.json @@ -1,5 +1,5 @@ { - "description": "Auto-approve file operations for execute-tasks session management", + "description": "Session management hooks: auto-approve file operations and validate result files", "hooks": { "PreToolUse": [ { @@ -12,6 +12,18 @@ } ] } + ], + "PostToolUse": [ + { + "matcher": "Write", + "hooks": [ + { + "type": "command", + "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/validate-result.sh", + "timeout": 5 + } + ] + } ] } } diff --git a/claude/sdd-tools/hooks/tests/validate-result.bats b/claude/sdd-tools/hooks/tests/validate-result.bats new file mode 100644 index 0000000..9c44891 --- /dev/null +++ b/claude/sdd-tools/hooks/tests/validate-result.bats @@ -0,0 +1,399 @@ +#!/usr/bin/env bats +# Tests for validate-result.sh PostToolUse hook + +SCRIPT_DIR="$(cd "$(dirname "${BATS_TEST_FILENAME}")/.." 
&& pwd)" +HOOK_SCRIPT="$SCRIPT_DIR/validate-result.sh" +FIXTURES_DIR="$(cd "$(dirname "${BATS_TEST_FILENAME}")/../../tests/fixtures" && pwd)" + +setup() { + TEST_DIR="$(mktemp -d)" + SESSION_DIR="$TEST_DIR/project/.claude/sessions/__live_session__" + mkdir -p "$SESSION_DIR" + unset AGENT_ALCHEMY_HOOK_DEBUG 2>/dev/null || true +} + +teardown() { + rm -rf "$TEST_DIR" +} + +# Helper: create a valid result file +create_valid_result() { + local task_id="${1:-42}" + local file="$SESSION_DIR/result-task-${task_id}.md" + cat > "$file" <<'RESULT' +status: PASS +task_id: 42 +duration: 1m 30s + +## Summary +Implemented the feature successfully. + +## Files Modified +- src/foo.ts -- added new function + +## Context Contribution +Discovered that the project uses ESM imports. + +## Verification +Functional: 3/3, Edge Cases: 2/2, Tests: 5/5 (0 failures) +RESULT + echo "$file" +} + +# Helper: build hook JSON input +build_input() { + local tool_name="$1" + local file_path="$2" + jq -n --arg tool "$tool_name" --arg path "$file_path" \ + '{"tool_name": $tool, "tool_input": {"file_path": $path}}' +} + +# --- Valid result files accepted --- + +@test "valid PASS result file is accepted unchanged" { + local file + file=$(create_valid_result 42) + # Create context file (write-ordering invariant) + echo "### Task [42]: learnings" > "$SESSION_DIR/context-task-42.md" + + run bash -c "echo '$(build_input Write "$file")' | bash '$HOOK_SCRIPT'" + + [ "$status" -eq 0 ] + # File should still exist (not renamed) + [ -f "$file" ] + [ ! -f "${file}.invalid" ] +} + +@test "valid FAIL result file is accepted unchanged" { + local file="$SESSION_DIR/result-task-10.md" + cat > "$file" <<'RESULT' +status: FAIL +task_id: 10 +duration: 0m 45s + +## Summary +Failed to implement due to missing dependency. + +## Files Modified +- none + +## Context Contribution +None. 
+ +## Verification +Functional: 1/3 +RESULT + echo "### Task [10]: learnings" > "$SESSION_DIR/context-task-10.md" + + run bash -c "echo '$(build_input Write "$file")' | bash '$HOOK_SCRIPT'" + + [ "$status" -eq 0 ] + [ -f "$file" ] + [ ! -f "${file}.invalid" ] +} + +@test "valid PARTIAL result file is accepted unchanged" { + local file="$SESSION_DIR/result-task-7.md" + cat > "$file" <<'RESULT' +status: PARTIAL +task_id: 7 +duration: 2m 10s + +## Summary +Partial implementation. + +## Files Modified +- src/bar.ts -- partial changes + +## Context Contribution +None. + +## Verification +Functional: 3/3, Edge: 1/2 +RESULT + echo "### Task [7]: learnings" > "$SESSION_DIR/context-task-7.md" + + run bash -c "echo '$(build_input Write "$file")' | bash '$HOOK_SCRIPT'" + + [ "$status" -eq 0 ] + [ -f "$file" ] + [ ! -f "${file}.invalid" ] +} + +# --- Missing status line --- + +@test "missing status line causes .invalid rename" { + local file="$SESSION_DIR/result-task-5.md" + cat > "$file" <<'RESULT' +# Task Result: [5] Some task + +## Summary +Did stuff. + +## Files Modified +- none + +## Context Contribution +None. +RESULT + echo "### Task [5]: learnings" > "$SESSION_DIR/context-task-5.md" + + run bash -c "echo '$(build_input Write "$file")' | bash '$HOOK_SCRIPT'" + + [ "$status" -eq 0 ] + [ ! -f "$file" ] + [ -f "${file}.invalid" ] +} + +# --- Invalid status value --- + +@test "invalid status value causes .invalid rename" { + local file="$SESSION_DIR/result-task-8.md" + cat > "$file" <<'RESULT' +status: SUCCESS +task_id: 8 + +## Summary +Done. + +## Files Modified +- none + +## Context Contribution +None. +RESULT + echo "### Task [8]: learnings" > "$SESSION_DIR/context-task-8.md" + + run bash -c "echo '$(build_input Write "$file")' | bash '$HOOK_SCRIPT'" + + [ "$status" -eq 0 ] + [ ! 
-f "$file" ] + [ -f "${file}.invalid" ] + # Error description should be appended + grep -q "VALIDATION ERRORS" "${file}.invalid" + grep -q "Invalid or missing status line" "${file}.invalid" +} + +@test "invalid status value UNKNOWN causes .invalid rename" { + local file="$SESSION_DIR/result-task-20.md" + cp "$FIXTURES_DIR/invalid-result-unknown-status.md" "$file" + echo "### Task [20]: learnings" > "$SESSION_DIR/context-task-20.md" + + run bash -c "echo '$(build_input Write "$file")' | bash '$HOOK_SCRIPT'" + + [ "$status" -eq 0 ] + [ ! -f "$file" ] + [ -f "${file}.invalid" ] + grep -q "VALIDATION ERRORS" "${file}.invalid" + grep -q "Invalid or missing status line" "${file}.invalid" +} + +# --- Fixture-based valid result --- + +@test "fixture: valid PASS result file from fixtures directory is accepted" { + local file="$SESSION_DIR/result-task-100.md" + cp "$FIXTURES_DIR/valid-result-pass.md" "$file" + echo "### Task [100]: learnings" > "$SESSION_DIR/context-task-100.md" + + run bash -c "echo '$(build_input Write "$file")' | bash '$HOOK_SCRIPT'" + + [ "$status" -eq 0 ] + [ -f "$file" ] + [ ! -f "${file}.invalid" ] +} + +@test "fixture: invalid result without summary from fixtures directory is rejected" { + local file="$SESSION_DIR/result-task-101.md" + cp "$FIXTURES_DIR/invalid-result-no-summary.md" "$file" + echo "### Task [101]: learnings" > "$SESSION_DIR/context-task-101.md" + + run bash -c "echo '$(build_input Write "$file")' | bash '$HOOK_SCRIPT'" + + [ "$status" -eq 0 ] + [ ! -f "$file" ] + [ -f "${file}.invalid" ] + grep -q "Missing required section: ## Summary" "${file}.invalid" +} + +# --- Missing required section --- + +@test "missing Summary section causes .invalid rename" { + local file="$SESSION_DIR/result-task-11.md" + cat > "$file" <<'RESULT' +status: PASS +task_id: 11 + +## Files Modified +- none + +## Context Contribution +None. 
+RESULT + echo "### Task [11]: learnings" > "$SESSION_DIR/context-task-11.md" + + run bash -c "echo '$(build_input Write "$file")' | bash '$HOOK_SCRIPT'" + + [ "$status" -eq 0 ] + [ ! -f "$file" ] + [ -f "${file}.invalid" ] + grep -q "Missing required section: ## Summary" "${file}.invalid" +} + +@test "missing Files Modified section causes .invalid rename" { + local file="$SESSION_DIR/result-task-12.md" + cat > "$file" <<'RESULT' +status: PASS +task_id: 12 + +## Summary +Done. + +## Context Contribution +None. +RESULT + echo "### Task [12]: learnings" > "$SESSION_DIR/context-task-12.md" + + run bash -c "echo '$(build_input Write "$file")' | bash '$HOOK_SCRIPT'" + + [ "$status" -eq 0 ] + [ ! -f "$file" ] + [ -f "${file}.invalid" ] + grep -q "Missing required section: ## Files Modified" "${file}.invalid" +} + +@test "missing Context Contribution section causes .invalid rename" { + local file="$SESSION_DIR/result-task-13.md" + cat > "$file" <<'RESULT' +status: PASS +task_id: 13 + +## Summary +Done. + +## Files Modified +- none +RESULT + echo "### Task [13]: learnings" > "$SESSION_DIR/context-task-13.md" + + run bash -c "echo '$(build_input Write "$file")' | bash '$HOOK_SCRIPT'" + + [ "$status" -eq 0 ] + [ ! -f "$file" ] + [ -f "${file}.invalid" ] + grep -q "Missing required section: ## Context Contribution" "${file}.invalid" +} + +# --- Missing context file --- + +@test "missing context file triggers stub creation, result still accepted" { + local file + file=$(create_valid_result 99) + # Deliberately do NOT create context-task-99.md + + run bash -c "echo '$(build_input Write "$file")' | bash '$HOOK_SCRIPT'" + + [ "$status" -eq 0 ] + # Result file should still be accepted + [ -f "$file" ] + [ ! 
-f "${file}.invalid" ] + # Context stub should be created + [ -f "$SESSION_DIR/context-task-99.md" ] + grep -q "No learnings captured" "$SESSION_DIR/context-task-99.md" +} + +# --- Non-session file writes ignored --- + +@test "non-session file write is ignored" { + local file="$TEST_DIR/some-other/result-task-1.md" + mkdir -p "$(dirname "$file")" + echo "status: INVALID" > "$file" + + run bash -c "echo '$(build_input Write "$file")' | bash '$HOOK_SCRIPT'" + + [ "$status" -eq 0 ] + # File should be untouched (not renamed, hook skipped) + [ -f "$file" ] + [ ! -f "${file}.invalid" ] +} + +@test "non-result file in session directory is ignored" { + local file="$SESSION_DIR/execution_context.md" + echo "some content" > "$file" + + run bash -c "echo '$(build_input Write "$file")' | bash '$HOOK_SCRIPT'" + + [ "$status" -eq 0 ] + [ -f "$file" ] +} + +# --- Result files >25 lines accepted --- + +@test "result file with >25 lines is accepted with extra content" { + local file="$SESSION_DIR/result-task-50.md" + { + echo "status: PASS" + echo "task_id: 50" + echo "duration: 3m 0s" + echo "" + echo "## Summary" + echo "Lots of work done." + echo "" + echo "## Files Modified" + echo "- file1.ts -- change1" + echo "- file2.ts -- change2" + echo "" + echo "## Context Contribution" + echo "Many learnings." + echo "" + echo "## Verification" + echo "All passed." + # Add extra lines to exceed 25 + for i in $(seq 1 20); do + echo "Extra detail line $i" + done + } > "$file" + echo "### Task [50]: learnings" > "$SESSION_DIR/context-task-50.md" + + run bash -c "echo '$(build_input Write "$file")' | bash '$HOOK_SCRIPT'" + + [ "$status" -eq 0 ] + [ -f "$file" ] + [ ! 
-f "${file}.invalid" ] +} + +# --- Hook error handling --- + +@test "hook error: trap catches malformed JSON input, exits 0" { + run bash -c "echo 'not json at all' | bash '$HOOK_SCRIPT'" + + [ "$status" -eq 0 ] +} + +@test "hook error: empty input exits 0" { + run bash -c "echo '' | bash '$HOOK_SCRIPT'" + + [ "$status" -eq 0 ] +} + +@test "hook error: non-Write tool exits 0" { + run bash -c "echo '$(build_input Read "/some/file")' | bash '$HOOK_SCRIPT'" + + [ "$status" -eq 0 ] +} + +# --- Debug logging --- + +@test "debug logging only when AGENT_ALCHEMY_HOOK_DEBUG=1" { + local file + file=$(create_valid_result 42) + echo "### Task [42]: learnings" > "$SESSION_DIR/context-task-42.md" + + # Without debug: no stderr output + local stderr_output + stderr_output=$(echo "$(build_input Write "$file")" | AGENT_ALCHEMY_HOOK_DEBUG=0 bash "$HOOK_SCRIPT" 2>&1 1>/dev/null) + [ -z "$stderr_output" ] + + # With debug: stderr should have output + stderr_output=$(echo "$(build_input Write "$file")" | AGENT_ALCHEMY_HOOK_DEBUG=1 bash "$HOOK_SCRIPT" 2>&1 1>/dev/null) + [[ "$stderr_output" == *"[validate-result]"* ]] +} diff --git a/claude/sdd-tools/hooks/validate-result.sh b/claude/sdd-tools/hooks/validate-result.sh new file mode 100755 index 0000000..ba92b82 --- /dev/null +++ b/claude/sdd-tools/hooks/validate-result.sh @@ -0,0 +1,100 @@ +#!/bin/bash +# Validate result files written by task-executor agents (PostToolUse hook). +# +# Triggers on Write operations targeting result-task-*.md in session directories. +# Validates: status line, required sections, context file write-ordering invariant. +# +# If invalid: renames to result-task-{id}.md.invalid with error appended. +# If context file missing: creates a stub, result still accepted. +# +# IMPORTANT: This hook must NEVER exit non-zero. A non-zero exit in PostToolUse +# hooks causes unexpected behavior. Trap on ERR falls through cleanly. 
+ +trap 'exit 0' ERR + +# Optional debug logging: set AGENT_ALCHEMY_HOOK_DEBUG=1 to enable +debug() { + if [ "${AGENT_ALCHEMY_HOOK_DEBUG:-}" = "1" ]; then + echo "[validate-result] $*" >&2 + fi +} + +input=$(cat 2>/dev/null) || input="" + +debug "Input received: ${input:0:200}" + +# Extract tool name and file path from hook input +tool_name=$(echo "$input" | jq -r '.tool_name // empty' 2>/dev/null) || tool_name="" + +# Only act on Write operations +[ "$tool_name" = "Write" ] || { debug "Not a Write operation ($tool_name), skipping"; exit 0; } + +file_path=$(echo "$input" | jq -r '.tool_input.file_path // empty' 2>/dev/null) || file_path="" +[ -n "$file_path" ] || { debug "No file_path found"; exit 0; } + +debug "File path: $file_path" + +# Only act on result-task-*.md files in session directories +case "$file_path" in + */.claude/sessions/*/result-task-*.md) ;; + *) debug "Not a session result file, skipping"; exit 0 ;; +esac + +# Extract task ID from filename: result-task-{id}.md +basename_file=$(basename "$file_path") +task_id="${basename_file#result-task-}" +task_id="${task_id%.md}" + +debug "Validating result file for task $task_id" + +# Check the file exists and is readable +[ -f "$file_path" ] || { debug "File does not exist yet: $file_path"; exit 0; } + +content=$(cat "$file_path" 2>/dev/null) || { debug "Cannot read file"; exit 0; } + +errors="" + +# Validate first line: must match status: (PASS|PARTIAL|FAIL) +first_line=$(echo "$content" | head -n1) +if ! echo "$first_line" | grep -qE '^status: (PASS|PARTIAL|FAIL)$'; then + errors="${errors}Invalid or missing status line (expected 'status: PASS|PARTIAL|FAIL', got '${first_line}')\n" +fi + +# Validate required sections +for section in "## Summary" "## Files Modified" "## Context Contribution"; do + if ! 
echo "$content" | grep -qF "$section"; then + errors="${errors}Missing required section: $section\n" + fi +done + +# Warn on large result files (>25 lines) but don't reject +line_count=$(echo "$content" | wc -l | tr -d ' ') +if [ "$line_count" -gt 25 ]; then + debug "WARNING: Result file has $line_count lines (expected ~18), extra content present" +fi + +# If validation errors found, rename to .invalid +if [ -n "$errors" ]; then + debug "Validation failed: $(echo -e "$errors")" + { + cat "$file_path" + echo "" + echo "--- VALIDATION ERRORS ---" + echo -e "$errors" + } > "${file_path}.invalid" 2>/dev/null + rm -f "$file_path" 2>/dev/null + debug "Renamed to ${file_path}.invalid" + exit 0 +fi + +# Check write-ordering invariant: context-task-{id}.md should exist +session_dir=$(dirname "$file_path") +context_file="${session_dir}/context-task-${task_id}.md" + +if [ ! -f "$context_file" ]; then + debug "Context file missing, creating stub: $context_file" + echo "### Task [${task_id}]: No learnings captured" > "$context_file" 2>/dev/null +fi + +debug "Validation passed for task $task_id" +exit 0 diff --git a/claude/sdd-tools/skills/create-tasks/SKILL.md b/claude/sdd-tools/skills/create-tasks/SKILL.md index b7fad98..d23bbd3 100644 --- a/claude/sdd-tools/skills/create-tasks/SKILL.md +++ b/claude/sdd-tools/skills/create-tasks/SKILL.md @@ -1,7 +1,7 @@ --- name: create-tasks description: Generate Claude Code native Tasks from an existing spec. Use when user says "create tasks", "generate tasks from spec", "spec to tasks", "task generation", or wants to decompose a spec into implementation tasks. 
-argument-hint: "[spec-path]" +argument-hint: "[spec-path] [--phase ]" user-invocable: true disable-model-invocation: false allowed-tools: AskUserQuestion, Read, Glob, Grep, TaskCreate, TaskUpdate, TaskList, TaskGet @@ -9,6 +9,9 @@ arguments: - name: spec-path description: Path to the spec file to analyze for task generation required: true + - name: phase + description: Comma-separated phase numbers to generate tasks for (e.g., "1,2"). Omit to select interactively or generate all. + required: false --- # Spec to Tasks - Create Tasks Skill @@ -43,21 +46,31 @@ The tasks are planning artifacts themselves — generating them IS the planning ## Workflow Overview -This workflow has eight phases: +This workflow has ten phases: -1. **Validate & Load** — Validate spec file, read content, check settings, load reference files -2. **Detect Depth & Check Existing** — Detect spec depth level, check for existing tasks -3. **Analyze Spec** — Extract features, requirements, and structure from spec -4. **Decompose Tasks** — Break features into atomic tasks with acceptance criteria -5. **Infer Dependencies** — Map blocking relationships between tasks -6. **Preview & Confirm** — Show summary, get user approval before creating -7. **Create Tasks** — Create tasks via TaskCreate/TaskUpdate (fresh or merge mode) -8. **Error Handling** — Handle spec parsing issues, circular deps, missing info +1. **Validate & Load** — Validate spec file, parse `--phase` argument, read content, check settings, load reference files +2. **Detect Depth & Check Existing** — Detect spec depth level, check for existing tasks with phase metadata +3. **Analyze Spec** — Extract features, requirements, structure, and implementation phases from spec +4. **Select Phases** — Interactive or CLI-driven phase selection for incremental generation +5. **Decompose Tasks** — Phase-filtered hybrid decomposition from features and deliverables +6. 
**Infer Dependencies** — Phase-aware blocking relationships with cross-phase handling +7. **Detect Producer-Consumer Relationships** — Identify `produces_for` relationships between tasks +8. **Preview & Confirm** — Show phase-annotated summary, get user approval before creating +9. **Create Tasks** — Create tasks via TaskCreate/TaskUpdate with `spec_phase` metadata (fresh or merge mode) +10. **Error Handling** — Handle spec parsing issues, circular deps, missing info, phase-related errors --- ## Phase 1: Validate & Load +### Parse Arguments + +Before validating the spec file, parse the provided arguments: + +1. **Extract spec path**: The first positional argument is the spec file path +2. **Check for `--phase` flag**: If `--phase` is present, parse the comma-separated integers that follow (e.g., `--phase 1,2` → `[1, 2]`) +3. Store as `selected_phases_cli` (empty list if `--phase` not provided) + ### Validate Spec File Verify the spec file exists at the provided path. @@ -141,7 +154,8 @@ Look for tasks with `metadata.spec_path` matching the spec path. If existing tasks found: - Count them by status (pending, in_progress, completed) - Note their task_uids for merge mode -- Inform user about merge behavior +- Extract `spec_phase` metadata from existing tasks to build `existing_phases_map`: `{phase_number → {pending, in_progress, completed, total, phase_name}}` +- Inform user about merge behavior with phase-aware detail Report to user: ``` @@ -150,6 +164,11 @@ Found {n} existing tasks for this spec: • {in_progress} in progress • {completed} completed +{If existing tasks have spec_phase metadata:} +Previously generated phases: +• Phase {N}: {phase_name} — {total} tasks ({completed} completed, {pending} pending) +• Phase {M}: {phase_name} — {total} tasks ({completed} completed, {pending} pending) + New tasks will be merged. Completed tasks will be preserved. 
``` @@ -180,7 +199,7 @@ Extract information from each spec section: | **7.3 Data Models** (Full-Tech) | Entity definitions → data model tasks | | **7.4 API Specifications** (Full-Tech) | Endpoints → API tasks | | **8.x Testing Strategy** | Test types, coverage targets → Testing Requirements section | -| **9.x Implementation Plan** | Phases → task grouping | +| **9.x Implementation Plan** | Phases, deliverables, completion criteria, checkpoint gates → phase metadata and task decomposition input | | **10.x Dependencies** | Explicit dependencies → blockedBy relationships | ### Feature Extraction @@ -224,9 +243,118 @@ Adjust task granularity based on depth level: - Technical decomposition - Example: "Create User model", "Implement POST /auth/login", "Add auth middleware" +### Phase Extraction + +Extract implementation phases from Section 9 if present: + +1. **Detect Section 9**: Look for `## 9. Implementation Plan` or `## Implementation Phases` +2. **Extract phase headers**: Pattern `### 9.N Phase N: {Name}` (detailed/full-tech) or `### Phase N: {Name}` (high-level) +3. **For each phase, extract**: + - `number` — Phase number (integer from `9.N` or `Phase N`) + - `name` — Phase name (text after `Phase N: `) + - `completion_criteria` — Text after `**Completion Criteria**:` + - `deliverables` — Parsed table rows from the deliverable table (columns: Deliverable, Description, Dependencies; optionally Technical Tasks) + - `checkpoint_gate` — Items after `**Checkpoint Gate**:` (prose or checkbox list `- [ ]`) +4. **Cross-reference deliverables to Section 5 features**: Scan deliverable descriptions and technical tasks for feature name references. Build mapping: `{phase_number → [feature_names]}` +5. If no Section 9 found, set `spec_phases = []` + +Store the extracted phases as `spec_phases` for use in Phase 4 (Select Phases) and Phase 5 (Decompose Tasks). + --- -## Phase 4: Decompose Tasks +## Phase 4: Select Phases + +Select which implementation phases to generate tasks for. 
Five paths based on context: + +### Path A — `--phase` argument provided + +Skip interactive selection. Validate that each phase number in `selected_phases_cli` exists in `spec_phases`. If any phase number is invalid, report the valid range and stop. + +### Path B — No `--phase`, spec has phases (2-3 phases) + +Use a single AskUserQuestion with multiSelect: + +```yaml +questions: + - header: "Phases" + question: "Which implementation phases should I generate tasks for?" + options: + - label: "All phases (Recommended)" + description: "Generate tasks for all {N} phases at once" + - label: "Phase 1: {name}" + description: "{deliverable_count} deliverables — {completion_criteria_brief}" + - label: "Phase 2: {name}" + description: "{deliverable_count} deliverables — {completion_criteria_brief}" + - label: "Phase 3: {name}" + description: "{deliverable_count} deliverables — {completion_criteria_brief}" + multiSelect: true +``` + +If the user selects "All phases", generate for all. Otherwise generate only for the selected phase(s). + +### Path C — No `--phase`, spec has 4+ phases + +Two-step selection: + +1. First ask "All phases or select specific?": + ```yaml + questions: + - header: "Phases" + question: "This spec has {N} implementation phases. Generate tasks for all or select specific phases?" + options: + - label: "All phases (Recommended)" + description: "Generate tasks for all {N} phases" + - label: "Select specific phases" + description: "Choose which phases to generate tasks for" + multiSelect: false + ``` + +2. If "Select specific phases", show multiSelect with individual phases (up to 4 per AskUserQuestion, paginate if needed). + +### Path D — No Section 9 / no phases + +Skip selection entirely. Log: "No implementation phases found in spec. Generating tasks from features only." + +Set `selected_phases = []` (all features will be processed without phase assignment). 
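The Phase 1 `--phase` parsing that Path A relies on, plus its validation against `spec_phases`, can be sketched in shell. All values below (`spec_phases`, `args`) are illustrative stand-ins, not the skill's actual data model:

```shell
# Hypothetical sketch of "--phase" parsing plus Path A validation.
spec_phases="1 2 3"                      # phase numbers extracted from Section 9
args="specs/SPEC-Auth.md --phase 1,2"    # raw invocation arguments
selected=""
set -- $args
while [ $# -gt 0 ]; do
  if [ "$1" = "--phase" ]; then
    selected=$(echo "$2" | tr ',' ' ')   # "1,2" -> "1 2"
    shift
  fi
  shift
done
valid=yes
for p in $selected; do
  case " $spec_phases " in
    *" $p "*) ;;                          # phase exists in the spec
    *) valid=no; echo "Invalid phase number: $p (valid: $spec_phases)" ;;
  esac
done
echo "selected_phases_cli: ${selected:-<none>}  valid: $valid"
```

With `--phase 1,2` this yields `selected_phases_cli: 1 2` and a `yes` validity flag; an out-of-range number would instead report the valid phase numbers, matching the "report the valid range and stop" rule above.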
+ +### Path E — Merge mode with existing phases + +When existing tasks with `spec_phase` metadata were found in Phase 2, show a specialized prompt: + +```yaml +questions: + - header: "Phases" + question: "Previously generated phases detected. Which phases should I generate tasks for?" + options: + - label: "Remaining phases only (Recommended)" + description: "Generate tasks for phases not yet created: {list of remaining phase names}" + - label: "All phases (merge)" + description: "Re-generate all phases, merging with existing tasks" + - label: "Select specific phases" + description: "Choose which phases to generate tasks for" + multiSelect: false +``` + +If "Select specific phases", follow Path B/C selection flow. + +--- + +## Phase 5: Decompose Tasks + +### Phase-Aware Feature Mapping + +When `spec_phases` is non-empty and phases were selected in Phase 4: + +1. **Map features to phases** using the cross-reference from Phase Extraction: + - Features explicitly referenced in phase deliverables → map to that phase + - Features not referenced in any phase deliverable → assign to the earliest plausible phase (based on dependency layer: data models → Phase 1, UI → last phase) +2. **Filter to selected phases**: Only decompose features mapping to selected phases +3. **Deliverables as additional input**: For each selected phase, check if deliverables have technical tasks not covered by Section 5 feature decomposition. Create additional tasks from uncovered deliverables with `source_section: "9.{N}"` +4. **Assign phase metadata**: Every task gets `spec_phase` (integer) and `spec_phase_name` (string) + +When `spec_phases = []` (no Section 9 in spec): Current behavior unchanged — decompose all features without phase assignment. The `spec_phase` and `spec_phase_name` fields are omitted entirely (backward compatible). 
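The phase filter in step 2 above can be sketched with a hypothetical `feature:phase` mapping (in the real workflow the mapping comes from Section 9 deliverable cross-references):

```shell
# Hypothetical feature -> phase map; only features in selected phases
# are decomposed, the rest are skipped for this generation run.
feature_phases="auth-model:1 auth-api:1 profile-ui:2 billing:3"
selected="1 2"
for fp in $feature_phases; do
  feature=${fp%%:*}
  phase=${fp##*:}
  case " $selected " in
    *" $phase "*) echo "decompose: $feature (spec_phase=$phase)" ;;
    *)            echo "skip: $feature (phase $phase not selected)" ;;
  esac
done
```

Here `billing` (Phase 3) is skipped while the Phase 1 and 2 features are decomposed with their `spec_phase` metadata attached.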
+ +### Standard Layer Pattern For each feature, apply the standard layer pattern: @@ -281,6 +409,7 @@ description: | Source: {spec_path} Section {number} activeForm: "Creating User data model" # Present continuous +produces_for: ["{consumer_task_id}", ...] # Optional — IDs of tasks that consume this task's output metadata: priority: critical|high|medium|low # Mapped from P0-P3 complexity: XS|S|M|L|XL # Estimated size @@ -289,8 +418,18 @@ metadata: feature_name: "User Authentication" # Parent feature task_uid: "{spec_path}:{feature}:{type}:{seq}" # Unique ID task_group: "{spec-name}" # REQUIRED — Group from spec title + spec_phase: 1 # Phase number from Section 9 (omit if no phases) + spec_phase_name: "Foundation" # Phase name from Section 9 (omit if no phases) ``` +**`produces_for` Field:** + +The `produces_for` field is an **optional** array of task IDs identifying tasks that directly consume this task's output. The `execute-tasks` orchestrator uses this field to inject the producer's result file content into the dependent task's prompt, giving downstream agents richer context than wave-granular `execution_context.md` merging alone provides. + +- **Omit** if the task has no direct producer-consumer relationship with other tasks +- **Include** when the task's deliverable (model, schema, config, etc.) is directly referenced in another task's description +- Values are task IDs (set during Phase 9 after all tasks are created and IDs are known) + ### Acceptance Criteria Categories Group acceptance criteria into these categories: @@ -358,7 +497,7 @@ Examples: --- -## Phase 5: Infer Dependencies +## Phase 6: Infer Dependencies Apply automatic dependency rules: @@ -372,11 +511,17 @@ Data Model → API → UI → Tests - UI tasks depend on their APIs - Tests depend on their implementations +Within-phase layer dependencies work unchanged regardless of phase selection. 
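The within-phase layer rule can be sketched as a rank lookup, where each task is blocked by the tasks exactly one layer below it. The `id:type` records are illustrative, not the real task schema:

```shell
# Hypothetical sketch of layer-based blockedBy inference:
# model -> api -> ui -> test within one feature.
layer_rank() {
  case "$1" in
    model) echo 0 ;;
    api)   echo 1 ;;
    ui)    echo 2 ;;
    test)  echo 3 ;;
  esac
}
tasks="t1:model t2:api t3:ui t4:test"
for t in $tasks; do
  id=${t%%:*}
  rank=$(layer_rank "${t##*:}")
  blockers=""
  for u in $tasks; do
    # block on every task exactly one layer below this one
    if [ "$(layer_rank "${u##*:}")" -eq $((rank - 1)) ]; then
      blockers="$blockers ${u%%:*}"
    fi
  done
  echo "${id} blockedBy:${blockers:- none}"
done
```

For the illustrative list this leaves `t1` unblocked and chains `t2`, `t3`, `t4` each onto the task one layer below.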
+ ### Phase Dependencies -If spec has implementation phases: -- Phase 2 tasks blocked by Phase 1 completion -- Phase 3 tasks blocked by Phase 2 completion +When tasks have `spec_phase` metadata, apply cross-phase blocking based on three scenarios: + +1. **Phase N-1 tasks exist in current generation**: Normal `blockedBy` — tasks in Phase N are blocked by Phase N-1 tasks +2. **Phase N-1 tasks exist from prior generation (merge mode)**: Create `blockedBy` relationships to existing Phase N-1 task IDs (found via `existing_phases_map` from Phase 2) +3. **Phase N-1 was NOT selected and no existing tasks found**: Do NOT add `blockedBy` to non-existent tasks. Instead: + - Add a "Prerequisites" note to task descriptions listing assumed-complete deliverables from the missing phase + - Emit a one-time warning: "Phase {N} tasks generated without Phase {N-1} predecessor tasks. Phase {N-1} deliverables are assumed complete." ### Explicit Spec Dependencies @@ -393,7 +538,69 @@ If features share: --- -## Phase 6: Preview & Confirm +## Phase 7: Detect Producer-Consumer Relationships + +After inferring `blockedBy` dependencies, identify which tasks produce output that is directly consumed by other tasks. These relationships are emitted as the `produces_for` field on producer tasks, enabling the `execute-tasks` orchestrator to inject richer upstream context into dependent task prompts. + +### Detection Approach + +Analyze the decomposed tasks and their `blockedBy` relationships to find producer-consumer pairs. A producer-consumer relationship exists when: + +1. Task B is blocked by Task A (`blockedBy`), AND +2. Task A's deliverable is **directly referenced** in Task B's description — Task B cannot be implemented without the specific artifact Task A produces + +**Conservative principle**: When uncertain whether a relationship is truly producer-consumer, omit `produces_for`. 
False positives add unnecessary context to dependent tasks; false negatives are harmless (the task still gets wave-granular context via `execution_context.md`). + +### Producer-Consumer Patterns + +Detect these common patterns: + +| Producer Task Type | Consumer Task Type | Signal | +|---|---|---| +| **Data Model** | API/Service that uses the model | Consumer description references entity name, fields, or schema defined by producer | +| **Schema/Type Definition** | Implementation that implements the schema | Consumer implements interfaces, types, or contracts defined by producer | +| **Configuration/Infrastructure** | Tasks that consume the config | Consumer reads config values, connects to services, or uses infrastructure set up by producer | +| **Foundation/Framework** | Tasks that build on the foundation | Consumer extends base classes, uses utilities, or follows patterns established by producer | +| **API Endpoint** | UI/Frontend that calls the endpoint | Consumer calls specific endpoints or uses response formats defined by producer | +| **Migration/Setup** | Tasks that require the setup | Consumer reads from tables, uses resources, or depends on state created by producer | + +### Detection Algorithm + +For each pair of tasks where Task B has Task A in its `blockedBy` list: + +1. **Check deliverable reference**: Does Task B's description explicitly reference an artifact that Task A creates? + - Entity/model names: "using User model", "User schema", "User table" + - Endpoint paths: "calls POST /auth/login", "uses /api/users response" + - Config keys: "reads database config", "uses JWT secret" + - File/module names: "imports from auth-middleware", "extends BaseService" + +2. **Check layer relationship**: Is the dependency a direct layer-to-layer producer-consumer? 
+ - Data Model → API endpoint for that model (YES — API needs the model definition) + - Data Model → Unrelated API endpoint (NO — just a layer ordering) + - Config → Service using that config (YES — service consumes the config) + - Auth setup → Feature behind auth (NO — auth is a gate, not a consumed output) + +3. **Assign produces_for**: If the relationship is a direct producer-consumer, add Task B's ID to Task A's `produces_for` array + +### Multi-Consumer Tasks + +A single producer may have multiple consumers. For example, a "Create User data model" task may produce for both "Implement registration endpoint" and "Implement login endpoint". In this case, `produces_for` contains all consumer IDs: + +``` +produces_for: ["{registration_task_id}", "{login_task_id}"] +``` + +### Circular Production Prevention + +`produces_for` follows the same acyclicity as `blockedBy`. Since `produces_for` is derived from `blockedBy` relationships (which are already validated for circular dependencies in Phase 6), circular production relationships cannot occur. If a `produces_for` relationship is detected outside of a `blockedBy` pair, skip it — the dependency inference already prevents circular `blockedBy`. + +### Output + +After detection, annotate each producer task in the internal task list with its `produces_for` array. Tasks with no producer-consumer relationships have no `produces_for` field (the field is omitted, not set to an empty array). 
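The deliverable-reference check in step 1 of the detection algorithm can be sketched for a single `blockedBy` pair. The deliverable name and descriptions are illustrative:

```shell
# Hypothetical deliverable-reference check: mark produces_for only when
# the consumer's description names the producer's deliverable.
producer_deliverable="User model"
consumer_desc="Implement POST /auth/register using the User model for persistence"
if echo "$consumer_desc" | grep -qi "$producer_deliverable"; then
  relation="produces_for"   # direct consumer: inject producer's result file
else
  relation="none"           # layer ordering only: omit produces_for (conservative)
fi
echo "relation: $relation"
```

A description that merely shares a wave or layer with the producer, without naming its artifact, falls through to the `none` branch, which matches the conservative principle above.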
+ +--- + +## Phase 8: Preview & Confirm Before creating tasks, present a summary: @@ -403,27 +610,41 @@ TASK GENERATION PREVIEW ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Spec: {spec_name} Depth: {depth_level} +{If phases selected:} +Phases: {selected_count} of {total_count} SUMMARY: • Total tasks: {count} • By priority: {critical} critical, {high} high, {medium} medium, {low} low • By complexity: {XS} XS, {S} S, {M} M, {L} L, {XL} XL +{If phases selected:} +PHASES: +• Phase {N}: {phase_name} — {n} tasks +• Phase {M}: {phase_name} — {n} tasks + +{If partial phases and predecessor phases not generated:} +PREREQUISITES: +• Phase {N-1}: {phase_name} — assumed complete (not in this generation) + FEATURES: -• {Feature 1} → {n} tasks -• {Feature 2} → {n} tasks +• {Feature 1} (Phase {N}) → {n} tasks +• {Feature 2} (Phase {M}) → {n} tasks ... DEPENDENCIES: • {n} dependency relationships inferred +• {m} producer-consumer relationships detected • Longest chain: {n} tasks FIRST TASKS (no blockers): -• {Task 1 subject} ({priority}) -• {Task 2 subject} ({priority}) +• {Task 1 subject} ({priority}, Phase {N}) +• {Task 2 subject} ({priority}, Phase {M}) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ``` +When no phases are present, the `Phases:`, `PHASES:`, `PREREQUISITES:` sections and phase annotations on feature/task lines are omitted. + Then use AskUserQuestion to confirm: ```yaml @@ -448,7 +669,7 @@ If user selects "Show task details": --- -## Phase 7: Create Tasks +## Phase 9: Create Tasks ### Fresh Mode (No Existing Tasks) @@ -498,13 +719,17 @@ TaskCreate: feature_name: "User Authentication" task_uid: "specs/SPEC-Auth.md:user-auth:model:001" task_group: "user-authentication" + spec_phase: 1 + spec_phase_name: "Foundation" ``` **Important**: Track the mapping between task_uid and returned task ID for dependency setup. -#### Step 2: Set Dependencies +**Phase metadata**: Include `spec_phase` and `spec_phase_name` on every task when the spec has implementation phases. 
Omit both fields entirely when no phases exist (backward compatible with phase-unaware tasks). -After all tasks are created, use TaskUpdate to set dependencies: +#### Step 2: Set Dependencies and produces_for + +After all tasks are created, use TaskUpdate to set `blockedBy` dependencies and `produces_for` relationships using the task_uid-to-ID mapping: ``` TaskUpdate: @@ -512,6 +737,16 @@ TaskUpdate: addBlockedBy: ["{model_task_id}"] ``` +For tasks identified as producers in Phase 7, set `produces_for` via TaskUpdate: + +``` +TaskUpdate: + taskId: "{model_task_id}" + produces_for: ["{api_task_id}", "{service_task_id}"] +``` + +**Note**: Only set `produces_for` on tasks that were identified as producers in Phase 7. Tasks without producer-consumer relationships should not have `produces_for` set. + #### Step 3: Report Completion ``` @@ -520,6 +755,7 @@ TASK CREATION COMPLETE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Created {n} tasks from {spec_name} Set {m} dependency relationships +Set {p} producer-consumer relationships (produces_for) Use TaskList to view all tasks. @@ -592,7 +828,7 @@ Total tasks: {total} --- -## Error Handling +## Phase 10: Error Handling ### Spec Parsing Issues @@ -615,6 +851,17 @@ If required information missing from spec: 2. Add `incomplete: true` to metadata 3. Note what's missing in description +### Phase-Related Errors + +**`--phase` provided but spec has no Section 9:** +Inform user: "The `--phase` argument was provided but this spec has no Implementation Plan (Section 9). Generating tasks from all features without phase filtering." Proceed without phase selection. + +**`--phase` references non-existent phase numbers:** +Report valid phase numbers and stop: "Invalid phase number(s): {invalid}. This spec has phases: {list of valid phase numbers with names}." + +**Section 9 format doesn't match expected patterns:** +Degrade gracefully — if phase headers can't be parsed, log a warning: "Section 9 found but phase structure could not be parsed. 
Generating tasks from features only." Set `spec_phases = []` and continue. + --- ## Example Usage @@ -629,6 +876,16 @@ If required information missing from spec: /agent-alchemy-sdd:create-tasks SPEC-Payments.md ``` +### Generate tasks for a specific phase +``` +/agent-alchemy-sdd:create-tasks specs/SPEC-Auth.md --phase 1 +``` + +### Generate tasks for multiple phases +``` +/agent-alchemy-sdd:create-tasks specs/SPEC-Auth.md --phase 1,2 +``` + ### Re-running (Merge Mode) ``` /agent-alchemy-sdd:create-tasks specs/SPEC-User-Authentication.md diff --git a/claude/sdd-tools/skills/create-tasks/references/dependency-inference.md b/claude/sdd-tools/skills/create-tasks/references/dependency-inference.md index dade56a..27beea5 100644 --- a/claude/sdd-tools/skills/create-tasks/references/dependency-inference.md +++ b/claude/sdd-tools/skills/create-tasks/references/dependency-inference.md @@ -104,13 +104,22 @@ Add {Integration} tests ### Section 9 (Implementation Plan) Mapping -When spec has implementation phases: +When spec has implementation phases and tasks include `spec_phase` metadata, apply cross-phase dependencies based on three scenarios: +**Scenario 1 — Phase N-1 tasks exist in current generation:** ``` Phase 1 tasks ← Phase 2 tasks ← Phase 3 tasks ``` +All tasks in Phase N are blocked by completion of Phase N-1. Standard `blockedBy` relationships. -All tasks in Phase N are blocked by completion of Phase N-1. +**Scenario 2 — Phase N-1 tasks exist from prior generation (merge mode):** +When `create-tasks` runs with `--phase 2` and Phase 1 tasks already exist from a prior run, create `blockedBy` relationships to the existing Phase 1 task IDs (identified via `metadata.spec_phase` on existing tasks). 
+ +**Scenario 3 — Phase N-1 not generated and no existing tasks:** +When `create-tasks` runs with `--phase 2` but no Phase 1 tasks exist: +- Do NOT add `blockedBy` to non-existent tasks +- Add a "Prerequisites" note to Phase 2 task descriptions listing assumed-complete deliverables from Phase 1 +- Emit a one-time warning about missing predecessor phases ### Section 10 (Dependencies) Mapping diff --git a/claude/sdd-tools/skills/execute-tasks/SKILL.md b/claude/sdd-tools/skills/execute-tasks/SKILL.md index 0eec423..a420ce3 100644 --- a/claude/sdd-tools/skills/execute-tasks/SKILL.md +++ b/claude/sdd-tools/skills/execute-tasks/SKILL.md @@ -1,7 +1,7 @@ --- name: execute-tasks description: Execute pending Claude Code Tasks in dependency order with wave-based concurrent execution and adaptive verification. Supports task group filtering and configurable parallelism. Use when user says "execute tasks", "run tasks", "start execution", "work on tasks", or wants to execute generated tasks autonomously. -argument-hint: "[task-id] [--task-group ] [--retries ] [--max-parallel ]" +argument-hint: "[task-id] [--task-group ] [--phase ] [--retries ] [--max-parallel ]" user-invocable: true disable-model-invocation: false allowed-tools: @@ -24,6 +24,9 @@ arguments: - name: task-group description: Optional task group name to filter tasks. Only tasks with matching metadata.task_group will be executed. required: false + - name: phase + description: Comma-separated phase numbers to filter tasks by spec_phase metadata (e.g., "1,2"). Only tasks with matching metadata.spec_phase will be executed. + required: false - name: retries description: Number of retry attempts for failed/partial tasks before moving on. Default is 3. required: false @@ -75,7 +78,7 @@ Produce accurate verification results: This skill orchestrates task execution through a 10-step loop. See `references/orchestration.md` for the full detailed procedure. ### Step 1: Load Task List -Retrieve all tasks via `TaskList`. 
If a `--task-group` argument was provided, filter tasks to only those with matching `metadata.task_group`. If a specific `task-id` argument was provided, validate it exists. +Retrieve all tasks via `TaskList`. If a `--task-group` argument was provided, filter tasks to only those with matching `metadata.task_group`. If a `--phase` argument was provided, further filter the task list to only tasks where `metadata.spec_phase` matches one of the specified phase numbers. If no tasks match the phase filter, inform the user: "No tasks found for phase(s) {N}. Available phases in current task set: {sorted list of distinct `metadata.spec_phase` values}." and stop. When both `--task-group` and `--phase` are provided, both filters apply (intersection). Tasks without `spec_phase` metadata are excluded when `--phase` is active. If a specific `task-id` argument was provided, validate it exists. ### Step 2: Validate State Handle edge cases: empty list, all completed, specific task blocked, no unblocked tasks, circular dependencies. @@ -87,7 +90,7 @@ Resolve `max_parallel` setting using precedence: `--max-parallel` CLI arg > `.cl Read `.claude/agent-alchemy.local.md` if it exists for execution preferences, including `max_parallel` setting. CLI `--max-parallel` argument takes precedence over the settings file value. ### Step 5: Initialize Execution Directory -Generate a `task_execution_id` using three-tier resolution: (1) if `--task-group` provided → `{task_group}-{YYYYMMDD}-{HHMMSS}`, (2) else if all open tasks share the same `metadata.task_group` → `{task_group}-{YYYYMMDD}-{HHMMSS}`, (3) else → `exec-session-{YYYYMMDD}-{HHMMSS}`. Clean any stale `__live_session__/` files by archiving them to `.claude/sessions/interrupted-{YYYYMMDD}-{HHMMSS}/`, resetting any `in_progress` tasks from the interrupted session back to `pending`. Check for and enforce the concurrency guard via `.lock` file. 
Create `.claude/sessions/__live_session__/` directory containing: +Generate a `task_execution_id` using multi-tier resolution: (1) if `--task-group` AND `--phase` provided → `{task_group}-phase{N}-{YYYYMMDD}-{HHMMSS}`, (2) if `--task-group` provided (no phase) → `{task_group}-{YYYYMMDD}-{HHMMSS}`, (3) if `--phase` provided (no group) AND all filtered tasks share same group → `{task_group}-phase{N}-{YYYYMMDD}-{HHMMSS}`, else `phase{N}-{YYYYMMDD}-{HHMMSS}`, (4) else if all open tasks share the same `metadata.task_group` → `{task_group}-{YYYYMMDD}-{HHMMSS}`, (5) else → `exec-session-{YYYYMMDD}-{HHMMSS}`. Where `{N}` is the phase number (or `{N}-{M}` for multiple phases, e.g., `phase1-2`). Clean any stale `__live_session__/` files by archiving them to `.claude/sessions/interrupted-{YYYYMMDD}-{HHMMSS}/`, resetting any `in_progress` tasks from the interrupted session back to `pending`. Check for and enforce the concurrency guard via `.lock` file. Create `.claude/sessions/__live_session__/` directory containing: - `execution_plan.md` — saved execution plan from Step 5 - `execution_context.md` — initialized with standard template - `task_log.md` — initialized with table headers (Task ID, Subject, Status, Attempts, Duration, Token Usage) @@ -104,7 +107,7 @@ Then ask the user to confirm before proceeding with execution. If the user cance Read `.claude/sessions/__live_session__/execution_context.md` (created in Step 5). If a prior execution context exists, look in `.claude/sessions/` for the most recent timestamped subfolder and merge relevant learnings into the new one. ### Step 8: Execute Loop -Execute tasks in waves. For each wave: snapshot `execution_context.md`, mark wave tasks `in_progress`, update `progress.md` with all active tasks, launch up to `max_parallel` background agents simultaneously via **parallel Task tool calls with `run_in_background: true`**, recording the `{task_list_id → background_task_id}` mapping from each Task tool response. 
Each agent writes to `context-task-{id}.md` and a compact `result-task-{id}.md` (completion signal). The orchestrator polls for result files via the `poll-for-results.sh` script in multi-round Bash invocations (each with `timeout: 480000`), logging progress between rounds, then batch-reads results to process outcomes — avoiding full agent output in context. After polling, the orchestrator calls `TaskOutput` on each background task_id to reap the process and extract per-task `duration_ms` and `total_tokens` usage metadata for the task log. If `TaskOutput` times out (agent stuck), `TaskStop` is called to force-terminate the agent. Failed tasks with retries remaining are re-launched as background agents (same TaskOutput reaping after retry polling). After all wave agents complete: merge per-task context files into `execution_context.md`, clean up result and context files, archive completed task JSONs, refresh TaskList for newly unblocked tasks, form next wave, repeat. +Before entering the wave loop, emit a session start summary: `Execution plan: {total_tasks} tasks across {total_waves} waves (max {max_parallel} parallel)`. Execute tasks in waves. Before each wave, run a **pre-wave file conflict scan**: parse all wave tasks' descriptions and acceptance criteria for file path references (paths with `/`, known extensions like `.md`/`.ts`/`.js`/`.json`/`.sh`/`.py`, and glob patterns); if two or more tasks reference the same file, defer the higher-ID task(s) to the next wave (lowest ID stays) and log conflicts to `execution_plan.md`. See `references/orchestration.md` Step 7a.5 for the full procedure. Then emit: `Starting Wave {N}/{total}: {count} tasks...`. Before launching agents, build **upstream injection blocks** for tasks with `produces_for` dependencies: read producer result files and inject as `CONTEXT FROM COMPLETED DEPENDENCIES` in the agent prompt (see `references/orchestration.md` for the injection procedure). 
For each wave: snapshot `execution_context.md`, mark wave tasks `in_progress`, update `progress.md` with all active tasks, launch up to `max_parallel` background agents simultaneously via **parallel Task tool calls with `run_in_background: true`**, recording the `{task_list_id → background_task_id}` mapping from each Task tool response. Each agent writes to `context-task-{id}.md` and a compact `result-task-{id}.md` (completion signal). The orchestrator detects completion via `watch-for-results.sh` (event-driven, primary) with fallback to `poll-for-results.sh` (adaptive polling) if filesystem watch tools are unavailable, then batch-reads results to process outcomes — avoiding full agent output in context. After detection, the orchestrator calls `TaskOutput` on each background task_id to reap the process and extract per-task `duration_ms` and `total_tokens` usage metadata for the task log. If `TaskOutput` times out (agent stuck), `TaskStop` is called to force-terminate the agent. After processing results, emit a wave completion summary showing pass/fail count, wave duration, and per-task breakdown (ID, name, status, duration, token count). Failed tasks are retried using a **3-tier escalation strategy**: Tier 1 (Standard) re-launches with failure context, Tier 2 (Context Enrichment) adds full execution context + related results, Tier 3 (User Escalation) pauses for user input via AskUserQuestion with options to fix manually, skip, provide guidance, or abort. See `references/orchestration.md` Step 7e for details. After all wave agents complete: merge per-task context files into `execution_context.md`, clean up result and context files, archive completed task JSONs, refresh TaskList for newly unblocked tasks, form next wave, repeat. 
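The progress messages emitted at session and wave boundaries might be rendered along these lines. The quoted message templates come from the skill text; the function names and the summary layout are assumptions.

```python
def session_start(total_tasks, total_waves, max_parallel):
    """Session start summary, using the template quoted in the skill."""
    return (f"Execution plan: {total_tasks} tasks across {total_waves} waves "
            f"(max {max_parallel} parallel)")

def wave_start(n, total, count):
    """Wave launch message, using the template quoted in the skill."""
    return f"Starting Wave {n}/{total}: {count} tasks..."

def wave_summary(n, results):
    """Wave completion summary with per-task breakdown.

    results: list of (task_id, name, status, duration_s, tokens) tuples.
    """
    passed = sum(1 for r in results if r[2] == "PASS")
    total_s = sum(r[3] for r in results)
    lines = [f"Wave {n} complete: {passed}/{len(results)} passed in {total_s}s"]
    for task_id, name, status, duration_s, tokens in results:
        lines.append(f"  [{task_id}] {name}: {status} ({duration_s}s, {tokens} tokens)")
    return "\n".join(lines)
```

Per the "wave-level granularity" rule, nothing is emitted between these boundaries while agents run.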
### Step 9: Session Summary Display execution results with pass/fail counts, total execution time (sum of per-task `duration_ms`), failed task list, newly unblocked tasks, and total token usage (sum of per-task `total_tokens` captured via `TaskOutput`). Save `session_summary.md` to `.claude/sessions/__live_session__/`. Archive the session by moving all contents from `__live_session__/` to `.claude/sessions/{task_execution_id}/`, leaving `__live_session__/` as an empty directory. `execution_pointer.md` stays pointing to `__live_session__/`. @@ -136,6 +139,7 @@ Load context and understand scope before writing code. - Read the execute-tasks skill and references - Read `.claude/sessions/__live_session__/execution_context.md` for learnings from prior tasks +- Read upstream task output if `## UPSTREAM TASK OUTPUT` blocks are present in the prompt (injected via `produces_for` — see `references/orchestration.md`) - Load task details via `TaskGet` - Classify the task (spec-generated vs general) - Parse acceptance criteria or infer requirements from description @@ -196,20 +200,23 @@ This enables later tasks to benefit from earlier discoveries and retry attempts ## Key Behaviors -- **Autonomous execution loop**: After the user confirms the execution plan, no further prompts occur between tasks. The loop runs without interruption once started. +- **Autonomous execution loop**: After the user confirms the execution plan, no further prompts occur between tasks — except during Tier 3 retry escalation, when persistent failures are presented to the user via AskUserQuestion. The loop runs without interruption once started. +- **Progress streaming**: The orchestrator emits human-readable progress summaries at wave boundaries: a session start message with task/wave counts, "Starting Wave N" before each wave launch, and a wave completion summary with per-task breakdown (status, duration, token count) after each wave completes. 
Wave-level granularity only — no per-task streaming during a wave. - **Background agent execution**: Agents run as background tasks (`run_in_background: true`), returning ~3 lines (task_id + output_file) instead of ~100+ lines of full output. This reduces orchestrator context consumption by ~79% per wave. -- **Agent process reaping**: After polling confirms result files exist, the orchestrator calls `TaskOutput` on each background task_id to reap the process and extract per-task `duration_ms` and `total_tokens` usage metadata. If `TaskOutput` times out, `TaskStop` force-terminates the stuck agent. This prevents lingering background processes. -- **Result file protocol**: Each agent writes a compact `result-task-{id}.md` (~18 lines) as its very last action. The orchestrator polls for these files via `poll-for-results.sh` in multi-round Bash invocations (each with `timeout: 480000`), with progress output between rounds, then batch-reads them for processing. The result file doubles as a completion signal. +- **Agent process reaping**: After completion detection confirms result files exist, the orchestrator calls `TaskOutput` on each background task_id to reap the process and extract per-task `duration_ms` and `total_tokens` usage metadata. If `TaskOutput` times out, `TaskStop` force-terminates the stuck agent. This prevents lingering background processes. +- **Event-driven completion detection**: Each agent writes a compact `result-task-{id}.md` (~18 lines) as its very last action. The orchestrator detects these files via `watch-for-results.sh` (fswatch/inotifywait, <1s latency) as primary, falling back to `poll-for-results.sh` (adaptive 5s-30s intervals) if watch tools are unavailable (exit code 2). Both scripts emit `RESULT_FOUND:` lines for incremental progress and `ALL_DONE` on completion. The result file doubles as a completion signal. 
- **Batched session file updates**: Instead of per-task read-modify-write on `task_log.md` and `progress.md`, all updates are batched into a single read-modify-write cycle per file per wave. +- **Upstream prompt injection (produces_for)**: Tasks with `produces_for` fields declare which downstream tasks consume their output. When launching a dependent task, the orchestrator reads the producer's result file and injects it into the agent prompt as `CONTEXT FROM COMPLETED DEPENDENCIES`. Multiple producers are injected in task ID order. Failed producers inject a failure notice instead. Tasks without `produces_for` behave unchanged (wave-granular context only). See `references/orchestration.md` for the full injection procedure. +- **File conflict detection**: Before each wave launch, task descriptions and acceptance criteria are scanned for file path references. If two tasks reference the same file, the lower-ID task stays and higher-ID tasks are deferred to the next wave, preventing concurrent edit conflicts. Conflicts are logged in `execution_plan.md`. No overhead when no conflicts are found. - **Wave-based parallelism**: Tasks at the same dependency level run simultaneously, up to `max_parallel` concurrent agents per wave. Tasks in later waves wait until their dependencies in earlier waves complete. - **One agent per task, multiple per wave**: Each task gets a fresh agent invocation with isolated context, but multiple agents run concurrently within a wave. - **Per-task context isolation**: Each agent writes to `context-task-{id}.md` regardless of `max_parallel` setting. The orchestrator merges these after each wave. This eliminates write contention and fragile Edit operations on shared files. -- **Within-wave retry**: Failed tasks with retries remaining are re-launched as background agents. The orchestrator enters a new multi-round polling loop for retry result files (same `poll-for-results.sh` pattern as initial wave polling). 
+- **3-tier retry escalation**: Failed tasks progress through escalating retry strategies: Tier 1 (Standard) re-launches with failure context, Tier 2 (Context Enrichment) injects full `execution_context.md` + related task results, and Tier 3 (User Escalation) pauses to ask the user via AskUserQuestion with 4 options: "Fix manually and continue", "Skip this task", "Provide guidance", or "Abort session". If the user provides guidance, a guided retry is launched; if that also fails, the user is re-prompted. Each task has an independent escalation path; Tier 1/2 retries are batched per wave, while Tier 3 is sequential. Escalation level is tracked in `task_log.md` per task. See `references/orchestration.md` Step 7e for the full procedure. +- **Phase-based filtering**: When `--phase N` is provided, only tasks with `metadata.spec_phase` matching the specified phase number(s) are included. This combines with `--task-group` filtering (both are ANDed). Tasks without `spec_phase` metadata are excluded when `--phase` is active. - **Configurable parallelism**: Default 5 concurrent tasks, configurable via `--max-parallel` argument or `.claude/agent-alchemy.local.md` settings. Set to 1 for sequential execution. -- **Configurable retries**: Default 3 attempts per task, configurable via `retries` argument. -- **Retry with context**: Each retry includes failure details from the previous attempt's result file so the agent can try a different approach. +- **Configurable retries**: Default 3 attempts per task, configurable via `retries` argument. Each retry tier maps to one attempt. - **Dynamic unblocking**: After each wave completes, the dependency graph is refreshed and newly unblocked tasks are added to the next wave. -- **Honest failure handling**: After retries exhausted, tasks stay `in_progress` (not completed), and execution continues. +- **Honest failure handling**: After all automated retries are exhausted, the user must choose an action at Tier 3.
Tasks left unresolved stay `in_progress` (not completed). - **Circular dependency detection**: If all remaining tasks are blocked by each other, break at the weakest link (task with fewest blockers) and log a warning. - **Shared context**: Agents read the snapshot of `execution_context.md` and write learnings to per-task context files. The orchestrator appends per-task content to the Task History section between waves. - **Resilient context sharing**: If a task-executor fails to write its context or result file, the orchestrator falls back to `TaskOutput` to capture diagnostic output. @@ -248,6 +255,21 @@ This enables later tasks to benefit from earlier discoveries and retry attempts /agent-alchemy-sdd:execute-tasks --task-group payments --retries 1 ``` +### Execute tasks for a specific phase +``` +/agent-alchemy-sdd:execute-tasks --phase 1 +``` + +### Execute specific phase within a task group +``` +/agent-alchemy-sdd:execute-tasks --task-group user-authentication --phase 2 +``` + +### Execute multiple phases +``` +/agent-alchemy-sdd:execute-tasks --phase 1,2 +``` + ### Execute with limited parallelism ``` /agent-alchemy-sdd:execute-tasks --max-parallel 2 diff --git a/claude/sdd-tools/skills/execute-tasks/references/execution-workflow.md b/claude/sdd-tools/skills/execute-tasks/references/execution-workflow.md index 0fcd6c9..019b188 100644 --- a/claude/sdd-tools/skills/execute-tasks/references/execution-workflow.md +++ b/claude/sdd-tools/skills/execute-tasks/references/execution-workflow.md @@ -1,33 +1,30 @@ # Execution Workflow Reference +> **Documentation-only**: This file documents the 4-phase task execution workflow for reference purposes. Task-executor agents no longer load this file at startup -- essential rules are embedded directly in `task-executor.md`. This file is retained for onboarding, debugging, and spec traceability. + This reference provides the detailed 4-phase workflow for executing a single Claude Code Task. 
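The group and phase filter semantics described in these behaviors can be sketched as a small function. This is illustrative only: the `filter_tasks` name and the task-dict shape are assumptions; the ANDing, comma-separated phase parsing, and exclusion of tasks without `spec_phase` come from the document.

```python
def filter_tasks(tasks, task_group=None, phase=None):
    """Apply --task-group and --phase filters (ANDed when both are present).

    phase is the raw CLI value, e.g. "1" or "1,2". Tasks lacking
    metadata.spec_phase are excluded whenever a phase filter is active.
    """
    if task_group is not None:
        tasks = [t for t in tasks
                 if t.get("metadata", {}).get("task_group") == task_group]
    if phase is not None:
        wanted = {int(p) for p in phase.split(",")}
        tasks = [t for t in tasks
                 if t.get("metadata", {}).get("spec_phase") in wanted]
    return tasks
```

If the resulting list is empty under a phase filter, the orchestrator reports the available phases and stops rather than proceeding.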
Each phase has specific procedures depending on whether the task is spec-generated or a general task. ## Phase 1: Understand Load context and understand the task scope before writing any code. -### Step 1: Load Knowledge - -Read the execute-tasks skill and its references: -``` -Read: skills/execute-tasks/SKILL.md -Read: skills/execute-tasks/references/execution-workflow.md -Read: skills/execute-tasks/references/verification-patterns.md -``` - -### Step 2: Read Execution Context +### Step 1: Read Execution Context Check for shared execution context from prior tasks in this session: ``` Read: .claude/sessions/__live_session__/execution_context.md ``` -If the file exists, review: -- **Project Patterns** - Coding conventions, tech stack details discovered by earlier tasks -- **Key Decisions** - Architecture choices already made -- **Known Issues** - Problems to avoid, workarounds in place -- **File Map** - Important files and their purposes -- **Task History** - What earlier tasks accomplished and any issues encountered +The execution context uses a structured 6-section schema. If the file exists, review each section: + +| Section | What to Look For | +|---------|-----------------| +| `## Project Setup` | Package manager, runtime, frameworks, build tools | +| `## File Patterns` | Test file patterns, component patterns, API route patterns | +| `## Conventions` | Import style, error handling, state management, naming | +| `## Key Decisions` | Architecture choices made by earlier tasks | +| `## Known Issues` | Problems to avoid, workarounds in place | +| `## Task History` | What earlier tasks accomplished and any issues encountered | Use this context to inform your approach. If the file does not exist, proceed without it. @@ -35,22 +32,31 @@ Use this context to inform your approach. If the file does not exist, proceed wi If `execution_context.md` has grown large: -- **200+ lines (~8KB)**: Keep the last 5 Task History entries in full. 
Summarize older Task History entries into a brief paragraph. Keep Project Patterns, Key Decisions, Known Issues, and File Map sections in full. -- **500+ lines (~20KB)**: Read selectively — always read the top sections (Project Patterns, Key Decisions, Known Issues, File Map) and the last 5 Task History entries. Skip older Task History entries entirely. +- **200+ lines**: Read top sections in full (Project Setup through Known Issues). Keep the last 5 Task History entries in full. Summarize older Task History entries into a brief paragraph. +- **500+ lines**: Read top sections in full (Project Setup through Known Issues). Read only the last 5 Task History entries. Skip older Task History entries entirely. #### Retry Context Check If this is a retry attempt: -1. Read the previous attempt's learnings from `execution_context.md` +1. Check Task History in `execution_context.md` for the previous attempt's learnings 2. Assess the current codebase state: run linter and tests to understand what the previous attempt left behind 3. Decide approach: build on the previous attempt's partial work, or revert and try a different strategy +### Step 2: Read Upstream Task Output + +If `## UPSTREAM TASK OUTPUT` blocks are present in the agent prompt, these contain result data from producer tasks (via `produces_for`). Read them for: +- Files created or modified by upstream tasks +- Key decisions or conventions established upstream +- Context that informs the implementation approach + +Multiple upstream blocks appear in task ID order. If an upstream block shows `## UPSTREAM TASK #{id} FAILED`, note the failure and work around missing dependencies. 
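A rough sketch of how the orchestrator could assemble these upstream blocks before launching the dependent agent. The two block formats are the ones quoted in `references/orchestration.md`; the `build_injection_blocks` function and everything beyond the `result-task-{id}.md` naming are assumptions.

```python
from pathlib import Path

def build_injection_blocks(producers, session_dir):
    """Build the upstream-output markdown injected into a dependent task's prompt.

    producers: completed tasks whose produces_for lists the dependent task,
    each a dict with "id" and "subject". Producers are ordered by task ID.
    """
    blocks = []
    for p in sorted(producers, key=lambda t: int(t["id"])):
        result = Path(session_dir) / f"result-task-{p['id']}.md"
        if result.exists():
            # Producer succeeded: inject its result file verbatim.
            blocks.append(
                f"## UPSTREAM TASK OUTPUT (Task #{p['id']}: {p['subject']})\n"
                f"{result.read_text()}\n---"
            )
        else:
            # Result file missing or producer failed: inject a failure notice.
            blocks.append(
                f"## UPSTREAM TASK #{p['id']} FAILED\n"
                f"Task: {p['subject']}\nStatus: FAIL\n"
                f"No failure details available.\n---"
            )
    return "\n".join(blocks)
```

The assembled text is inserted into the agent prompt after the task description and before the concurrent-execution section.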
+ ### Step 3: Load Task Details Use `TaskGet` to retrieve the full task details including: - Subject and description -- Metadata (priority, complexity, source_section, spec_path, feature_name) +- Metadata (priority, complexity, source_section, spec_path, feature_name, task_group) - Dependencies (blockedBy, blocks) ### Step 4: Classify Task @@ -67,13 +73,18 @@ If any of the above are present, classify as **spec-generated**. Otherwise, clas **For spec-generated tasks:** - Extract each acceptance criterion by category (Functional, Edge Cases, Error Handling, Performance) +- Each `- [ ]` line under a category header is one criterion - Extract Testing Requirements section - Note the source spec section for context - Read the source spec section if referenced for additional context **For general tasks:** - Parse the subject line for intent: "Fix X" = bug fix, "Add X" = new feature, "Refactor X" = restructuring, "Update X" = modification -- Extract any "should..." or "when..." statements from description +- Extract implicit criteria from description: + - "should..." / "must..." -> functional requirements + - "when..." -> scenarios to test + - "can..." -> capabilities to confirm + - "handle..." -> error scenarios to check - Infer completion criteria from the description ### Step 6: Explore Codebase @@ -86,19 +97,19 @@ Understand the affected code before making changes: 4. Read the key files that will be modified 5. Identify test file locations and patterns -### Step 7: Summarize Scope +### Step 7: Plan Implementation -Before proceeding to implementation, have a clear understanding of: -- What files need to be created or modified +Before proceeding to implementation, have a clear plan: +- Which files to create or modify - What the expected behavior change is -- What tests need to be written or updated +- What tests to write or update - What project conventions to follow --- ## Phase 2: Implement -Do NOT update `progress.md` — the orchestrator manages progress tracking. 
+Do NOT update `progress.md` -- the orchestrator manages progress tracking. Execute the implementation following project patterns and best practices. @@ -153,7 +164,7 @@ If the task specifies testing requirements or the project has test patterns: ## Phase 3: Verify -Do NOT update `progress.md` — the orchestrator manages progress tracking. +Do NOT update `progress.md` -- the orchestrator manages progress tracking. Verify the implementation against task requirements. The verification approach is adaptive based on task classification. @@ -161,29 +172,34 @@ Verify the implementation against task requirements. The verification approach i Walk through each acceptance criteria category systematically: -**Functional Criteria:** -- For each criterion, verify the implementation satisfies it -- Run relevant tests to confirm behavior -- Check that the code path exists and is reachable - -**Edge Cases:** -- Verify boundary conditions are handled -- Check that edge case scenarios produce correct results -- Run edge case tests if written - -**Error Handling:** -- Verify error scenarios are handled gracefully -- Check that error messages are clear and actionable -- Confirm error recovery behavior works - -**Performance:** (if applicable) -- Run performance-related tests if specified -- Verify resource usage is within bounds +**Functional** (ALL must pass -- any failure means FAIL): +- For each criterion, locate the code that satisfies it +- Verify correctness by reading the code +- Run relevant tests that exercise the behavior +- Record PASS/FAIL per criterion + +**Edge Cases** (flagged but don't block -- failures mean PARTIAL): +- Check guard clauses, boundary checks, null guards, validation +- Find tests that exercise the edge case +- Verify the edge case produces correct results +- Record PASS/FAIL/SKIP per criterion + +**Error Handling** (flagged but don't block -- failures mean PARTIAL): +- Check error paths (try/catch, error returns, validation errors) +- Verify error 
messages are clear and informative +- Confirm the system recovers gracefully +- Record PASS/FAIL per criterion + +**Performance** (flagged but don't block -- failures mean PARTIAL): +- Check that the implementation uses an efficient approach +- Look for obvious issues: N+1 queries, unbounded loops, missing indexes +- Run benchmarks if test infrastructure supports it +- Record PASS/FAIL per criterion **Testing Requirements:** -- Run all tests: `npm test`, `pytest`, or project-specific command -- Verify test count matches expectations -- Check for test failures +- Parse the `**Testing Requirements:**` section from description +- For each test requirement, find or create the corresponding test +- Run full test suite; verify all tests pass; check for regressions ### General Task Verification @@ -197,7 +213,34 @@ For tasks without structured acceptance criteria: ### Pass Threshold Rules -See `verification-patterns.md` for detailed pass/fail criteria. +**Spec-generated tasks:** + +| Category | Requirement | Failure Impact | +|----------|-------------|----------------| +| Functional | ALL must pass | Any failure -> FAIL | +| Edge Cases | Flagged, don't block | PARTIAL if Functional passes | +| Error Handling | Flagged, don't block | PARTIAL if Functional passes | +| Performance | Flagged, don't block | PARTIAL if Functional passes | +| Tests | ALL must pass | Any failure -> FAIL | + +**General tasks:** + +| Check | Requirement | Failure Impact | +|-------|-------------|----------------| +| Core change | Must be implemented | Missing -> FAIL | +| Tests pass | Existing tests must pass | Test failure -> FAIL | +| Linter | No new violations | New violations -> PARTIAL | +| No regressions | Nothing else broken | Regression -> FAIL | + +**Status determination:** + +| Condition | Status | +|-----------|--------| +| All Functional pass + Tests pass | **PASS** | +| All Functional pass + Tests pass + Edge/Error/Perf issues | **PARTIAL** | +| Any Functional fail | **FAIL** | +| Any 
test failure | **FAIL** | +| Core change missing (general task) | **FAIL** | --- @@ -225,48 +268,66 @@ TaskUpdate: taskId={id}, status=completed **If PARTIAL or FAIL:** Leave task as `in_progress`. Do NOT mark as completed. The orchestrating skill will decide whether to retry. -### Append to Execution Context +### Write Context File + +Write structured learnings to your per-task context file at the `Context Write Path` specified in the agent prompt (e.g., `.claude/sessions/__live_session__/context-task-{id}.md`). Do NOT write to `execution_context.md` directly -- the orchestrator merges per-task files after each wave. -Write learnings to your per-task context file at the `Context Write Path` specified in your prompt (e.g., `.claude/sessions/__live_session__/context-task-{id}.md`). Do NOT write to `execution_context.md` directly — the orchestrator merges per-task files after each wave. +Use the 6-section structured context schema. Only include sections where there is content to contribute -- omit empty sections: ```markdown -### Task [{id}]: {subject} - {PASS/PARTIAL/FAIL} -- Files modified: {list of files created or changed} -- Key learnings: {patterns discovered, conventions noted, useful file locations} -- Issues encountered: {problems hit, workarounds applied, things that didn't work} +## Project Setup +- {discovery about package manager, runtime, frameworks, build tools} + +## File Patterns +- {discovered test file patterns, component patterns, API route patterns} + +## Conventions +- {discovered import style, error handling, state management, naming} + +## Key Decisions +- [Task #{id}] {decision made and rationale} + +## Known Issues +- {issues encountered, workarounds applied} ``` -Include updates to Project Patterns, Key Decisions, Known Issues, and File Map sections as relevant — the orchestrator will merge these into the shared context after the wave. 
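A minimal sketch of assembling a per-task context file under this schema. The `render_context_file` helper is hypothetical; only the section names, the omit-empty-sections rule, and the `[Task #{id}]` attribution convention come from the document.

```python
def render_context_file(task_id, sections):
    """Render a per-task context file, omitting sections with no entries.

    sections: dict mapping section name (e.g. "Key Decisions") to a list of
    one-line entries. Key Decisions entries get the [Task #{id}] prefix.
    """
    order = ["Project Setup", "File Patterns", "Conventions",
             "Key Decisions", "Known Issues"]
    parts = []
    for name in order:
        entries = sections.get(name, [])
        if name == "Key Decisions":
            entries = [f"[Task #{task_id}] {e}" for e in entries]
        if entries:  # omit empty sections to keep the file compact
            parts.append(f"## {name}\n" + "\n".join(f"- {e}" for e in entries))
    return "\n\n".join(parts) + "\n"
```

Task History is deliberately absent: the orchestrator derives it from result files.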
+**Entry format conventions**: +- Each entry is a single bullet point (`- `) on one line +- Key Decisions entries start with `[Task #{id}]` to attribute the decision +- Entries should be factual and concise (one line per entry) + +**Note**: Task History is managed by the orchestrator from result files. Do not include a Task History section in the context file. #### Error Resilience If the write to the per-task context file fails: -1. **Do not crash** — continue the workflow normally +1. **Do not crash** -- continue the workflow normally 2. Log a `WARNING: Failed to write learnings to context file` line in the result file Issues section 3. Include the learnings in the result file Issues section as fallback 4. The orchestrator will pick up the fallback learnings from the result file ### Write Result File -As your **VERY LAST action** (after writing the context file), write a compact result file to the `Result Write Path` specified in your prompt (e.g., `.claude/sessions/__live_session__/result-task-{id}.md`): +As your **VERY LAST action** (after writing the context file), write a compact result file to the `Result Write Path` specified in the agent prompt (e.g., `.claude/sessions/__live_session__/result-task-{id}.md`): ```markdown -# Task Result: [{id}] {subject} status: PASS|PARTIAL|FAIL -attempt: {n}/{max} +task_id: {id} +duration: {Xm Ys} -## Verification -- Functional: {n}/{total} -- Edge Cases: {n}/{total} -- Error Handling: {n}/{total} -- Tests: {passed}/{total} ({failed} failures) +## Summary +{1-3 sentence summary of what was done} ## Files Modified -- {path}: {brief description} +- {file path 1} -- {what changed} +- {file path 2} -- {what changed} + +## Context Contribution +{Key learnings for downstream tasks: conventions discovered, patterns established, decisions made} -## Issues -{None or brief descriptions} +## Verification +{What was checked and the result: criteria counts, test results, issues found} ``` **Ordering**: Context file FIRST, result file 
LAST. The result file's existence signals completion to the orchestrator. @@ -295,23 +356,25 @@ VERIFICATION: Functional: {n}/{total} passed Edge Cases: {n}/{total} passed Error Handling: {n}/{total} passed - Performance: {n}/{total} passed (or N/A) Tests: {passed}/{total} ({failed} failures) -{If PARTIAL or FAIL:} ISSUES: - - {criterion that failed}: {what went wrong} - - {criterion that failed}: {what went wrong} - -RECOMMENDATIONS: - - {suggestion for fixing or completing} + - {criterion}: {what went wrong} FILES MODIFIED: - - {file path}: {brief description of change} + - {file path}: {brief description} + +CONTEXT CONTRIBUTION: + - {key learnings for downstream tasks} -{If context append also failed:} +{If context file write also failed:} LEARNINGS: - - Files modified: {list} - - Key learnings: {patterns, conventions, file locations} - - Issues encountered: {problems, workarounds} + ## Project Setup + - {discoveries} + ## Conventions + - {discoveries} + ## Key Decisions + - [Task #{id}] {decision} + ## Known Issues + - {issues} ``` diff --git a/claude/sdd-tools/skills/execute-tasks/references/orchestration.md b/claude/sdd-tools/skills/execute-tasks/references/orchestration.md index d41fc9c..fcbff54 100644 --- a/claude/sdd-tools/skills/execute-tasks/references/orchestration.md +++ b/claude/sdd-tools/skills/execute-tasks/references/orchestration.md @@ -52,12 +52,143 @@ The result file's existence serves as the completion signal. If it exists, the c If an agent crashes before writing its result file, the orchestrator falls back to `TaskOutput` (blocking read of the background task's output) to diagnose the failure. This is the last-resort path and produces the same context pressure as the old foreground approach, but only for crashed agents. +## Structured Context Schema + +### Purpose + +Provide a consistent, section-based format for `execution_context.md` and per-task `context-task-{id}.md` files. 
Section headers serve as merge anchors during the post-wave merge step, enabling reliable appending, deduplication, and compaction. + +### Section Headers (6 Fixed Sections) + +All execution context files use these 6 section headers in this order: + +| Section | Purpose | Example Entries | +|---------|---------|-----------------| +| `## Project Setup` | Package manager, runtime, frameworks, build tools | `- Runtime: Node.js 22 with pnpm` | +| `## File Patterns` | Test file patterns, component patterns, API route patterns | `- Tests: __tests__/{name}.test.ts alongside source` | +| `## Conventions` | Import style, error handling, state management, naming | `- Imports: Named exports preferred, barrel files for public API` | +| `## Key Decisions` | Choices made during execution with task references | `- [Task #5] Used Zod for runtime validation over io-ts` | +| `## Known Issues` | Problems encountered, workarounds, gotchas | `- Vitest mock.calls array resets between tests in same suite` | +| `## Task History` | Compact log: task ID, name, status, key contribution | `- [12] Create API handler — PASS: added /api/users endpoint` | + +### Per-Task Context File Format + +Each task-executor agent writes `context-task-{id}.md` using the same 6 section headers. Agents **omit sections that have no content** (only include sections with actual entries). This keeps per-task files compact. + +```markdown +## Project Setup +- Runtime: Python 3.12, uv for package management + +## Conventions +- Error handling: raise specific exception subclasses, catch at API boundary + +## Key Decisions +- [Task #7] Chose SQLAlchemy 2.0 async API over raw asyncpg + +## Task History +- [7] Implement database layer — PASS: created models, migrations, session factory +``` + +In this example, `## File Patterns` and `## Known Issues` are omitted because the agent had nothing to report for those sections.
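Treating the fixed headers as merge anchors, the post-wave merge could be sketched as follows. This is an illustrative sketch: the parsing and dedup details are assumptions, including routing unheadered content to Key Decisions per the malformed-file rule.

```python
SECTIONS = ["## Project Setup", "## File Patterns", "## Conventions",
            "## Key Decisions", "## Known Issues", "## Task History"]

def parse_sections(text):
    """Split a context file into {header: [entry lines]}.

    Content before the first recognized header falls back to Key Decisions,
    mirroring the error-handling rule for malformed context files.
    """
    out = {h: [] for h in SECTIONS}
    current = "## Key Decisions"
    for line in text.splitlines():
        if line.strip() in SECTIONS:
            current = line.strip()
        elif line.strip():
            out[current].append(line.rstrip())
    return out

def merge_context(shared_text, per_task_text):
    """Append a per-task context file into the shared execution context,
    deduplicating identical entries within each section."""
    shared = parse_sections(shared_text)
    for header, entries in parse_sections(per_task_text).items():
        for entry in entries:
            if entry not in shared[header]:
                shared[header].append(entry)
    return "\n\n".join(f"{h}\n" + "\n".join(shared[h]) for h in SECTIONS)
```

Because every file uses the same anchors, the merge is a pure append-and-dedup per section and never needs to reconcile free-form prose.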
+ +**Entry format conventions**: +- Each entry is a single bullet point (`- `) on one line +- Key Decisions entries start with `[Task #{id}]` to attribute the decision +- Task History entries follow the format: `- [{id}] {subject} — {PASS|PARTIAL|FAIL}: {brief contribution}` +- Entries should be factual and concise (one line per entry) + +### Merge Procedure + +The orchestrator merges per-task context files into `execution_context.md` in Step 7f using section headers as merge anchors. See Step 7f for the detailed procedure. + +### Error Handling for Malformed Context Files + +If a per-task context file is missing all `## ` section headers (completely unstructured content): +1. Log a warning: `WARNING: context-task-{id}.md has no section headers — placing content under ## Key Decisions` +2. Treat the entire file content as entries under `## Key Decisions` +3. Proceed with the merge normally + +If a per-task context file has some recognized sections and some content outside any section header: +1. Content before the first `## ` header is placed under `## Key Decisions` +2. Content under recognized headers is merged normally + +## Upstream Prompt Injection (produces_for) + +### Purpose + +Enable producer tasks to pass richer context directly to specific dependent tasks beyond what wave-granular `execution_context.md` merging provides. A producer task declares which downstream tasks consume its output via the `produces_for` field, and the orchestrator injects the producer's result file content into the dependent task's prompt at launch time. + +### Task JSON Schema Extension + +Tasks may include an optional `produces_for` field — an array of task IDs that consume this task's output: + +```json +{ + "id": "5", + "subject": "Implement API handler", + "description": "...", + "produces_for": ["8", "12"], + "blockedBy": ["3"] +} +``` + +- `produces_for` is **optional**. Tasks without it behave as before (wave-granular context only via `execution_context.md`). 
+- Values are string task IDs referencing tasks that depend on this task's output. +- A task may appear in multiple producers' `produces_for` arrays (receiving output from multiple upstream tasks). +- `produces_for` is independent of `blockedBy` — a task can produce for a dependent without being in its `blockedBy` list, though they typically overlap. + +### Injection Procedure + +When launching a dependent task (Step 7c), the orchestrator checks if any **completed** tasks in prior waves have a `produces_for` array that includes the dependent task's ID. If so: + +1. **Collect producers**: Find all completed tasks whose `produces_for` contains the current task's ID. Sort by task ID (ascending numeric order). + +2. **Read producer result files**: For each producer task, read `.claude/sessions/__live_session__/result-task-{producer_id}.md` (or from the archived session tasks if already cleaned up). + +3. **Build injection blocks**: For each producer: + + **If result file exists (producer succeeded):** + ```markdown + ## UPSTREAM TASK OUTPUT (Task #{producer_id}: {producer_subject}) + {result file content} + --- + ``` + + **If result file is missing or producer failed:** + ```markdown + ## UPSTREAM TASK #{producer_id} FAILED + Task: {producer_subject} + Status: FAIL + {failure summary from task_log.md if available, otherwise "No failure details available."} + --- + ``` + +4. **Inject into prompt**: Insert all injection blocks into the dependent task's prompt **after** the task description/metadata section and **before** the `CONCURRENT EXECUTION MODE` section. This ensures the agent reads upstream context after understanding the task but before beginning execution. + +5. 
**Log injection**: For each injection, log: `Injecting upstream output from task #{producer_id} into task #{dependent_id}` + +### Result File Retention for produces_for + +Producer task result files that have `produces_for` entries pointing to **not-yet-completed** tasks must be **retained** during the 7f cleanup step (same retention rule as FAIL result files). The orchestrator deletes these retained result files only after all tasks listed in the producer's `produces_for` have completed. + +### No-op When Absent + +If no tasks in the task set have `produces_for` fields, the injection procedure is skipped entirely — no overhead. The orchestrator simply launches agents with the standard prompt template. + +--- + ## Step 1: Load Task List Use `TaskList` to get all tasks and their current state. If a `--task-group` argument was provided, filter the task list to only tasks where `metadata.task_group` matches the specified group. If no tasks match the group, inform the user and stop. +If a `--phase` argument was provided, further filter the task list to only tasks where `metadata.spec_phase` matches one of the specified phase numbers (parsed as comma-separated integers). This filter is applied after `--task-group` filtering (if both are present). + +If no tasks match the phase filter after all filters are applied, inform the user: "No tasks found for phase(s) {N}. Available phases in current task set: {sorted list of distinct `metadata.spec_phase` values}." and stop. + +Tasks without `metadata.spec_phase` (created before phase-aware `create-tasks`) are excluded when `--phase` filtering is active. + If a specific `task-id` argument was provided, validate it exists. If it doesn't exist, inform the user and stop. ## Step 2: Validate State @@ -168,10 +299,15 @@ If the user selects **"Cancel"**, report "Execution cancelled. No tasks were mod ## Step 5.5: Initialize Execution Directory -Generate a unique `task_execution_id` using three-tier resolution: -1. 
IF `--task-group` was provided → `{task_group}-{YYYYMMDD}-{HHMMSS}` (e.g., `user-auth-20260131-143022`) -2. ELSE IF all open tasks (pending + in_progress) share the same non-empty `metadata.task_group` → `{task_group}-{YYYYMMDD}-{HHMMSS}` -3. ELSE → `exec-session-{YYYYMMDD}-{HHMMSS}` (e.g., `exec-session-20260131-143022`) +Generate a unique `task_execution_id` using a multi-tier resolution that incorporates phase when specified: +1. IF `--task-group` was provided AND `--phase` was provided → `{task_group}-phase{N}-{YYYYMMDD}-{HHMMSS}` (e.g., `user-auth-phase1-20260131-143022`) +2. IF `--task-group` was provided (no phase) → `{task_group}-{YYYYMMDD}-{HHMMSS}` (e.g., `user-auth-20260131-143022`) +3. IF `--phase` was provided (no group) AND all filtered tasks share same `metadata.task_group` → `{task_group}-phase{N}-{YYYYMMDD}-{HHMMSS}` +4. IF `--phase` was provided (no group) → `phase{N}-{YYYYMMDD}-{HHMMSS}` (e.g., `phase1-20260131-143022`) +5. ELSE IF all open tasks (pending + in_progress) share the same non-empty `metadata.task_group` → `{task_group}-{YYYYMMDD}-{HHMMSS}` +6. ELSE → `exec-session-{YYYYMMDD}-{HHMMSS}` (e.g., `exec-session-20260131-143022`) + +Where `{N}` is the phase number (or `{N}-{M}` for multiple phases, e.g., `phase1-2`). ### Clean Stale Live Session @@ -221,24 +357,27 @@ This lock is automatically cleaned up in Step 8 when `__live_session__/` content Create `.claude/sessions/__live_session__/` (and `.claude/sessions/` parent if needed) with: 1. **`execution_plan.md`** - Save the execution plan displayed in Step 5 -2. **`execution_context.md`** - Initialize with standard template: +2. **`execution_context.md`** - Initialize with the 6-section structured template: ```markdown # Execution Context - ## Project Patterns - + ## Project Setup + + + ## File Patterns + + + ## Conventions + ## Key Decisions - + ## Known Issues - - - ## File Map - + ## Task History - + ``` 3. 
**`task_log.md`** - Initialize with table headers: ```markdown @@ -266,21 +405,36 @@ Create `.claude/sessions/__live_session__/` (and `.claude/sessions/` parent if n Read `.claude/sessions/__live_session__/execution_context.md` (created in Step 5.5). -If a prior execution session's context exists, look in `.claude/sessions/` for the most recent timestamped subfolder and merge relevant learnings (Project Patterns, Key Decisions, Known Issues, File Map) into the new execution context. +If a prior execution session's context exists, look in `.claude/sessions/` for the most recent timestamped subfolder and merge relevant learnings from the prior context's sections (Project Setup, File Patterns, Conventions, Key Decisions, Known Issues) into the corresponding sections of the new execution context. Do NOT merge prior Task History entries directly — they are handled by compaction below. + +### Cross-Session Context Compaction -### Context Compaction +After merging prior learnings, apply per-section compaction: -After merging prior learnings, check the Task History section. If it has 10 or more entries from merged sessions, compact older entries: +1. **For each of the first 5 sections** (Project Setup through Known Issues): if a section has 10 or more entries, summarize the older entries into a single paragraph at the top of that section, keeping the 5 most recent entries in full. -1. Keep the 5 most recent Task History entries in full -2. Summarize all older entries into a single "Prior Sessions Summary" paragraph at the top of the Task History section -3. Replace the old individual entries with this summary +2. **For Task History**: if the prior session's Task History has entries, summarize ALL prior session entries into a single "Prior Sessions Summary" paragraph at the top of the Task History section. Do not carry over individual entries from prior sessions. -This prevents the execution context from growing unbounded across multiple execution sessions. 
+This per-section compaction prevents any single section from growing unbounded across multiple execution sessions. ## Step 7: Execute Loop -Execute tasks in waves. No user interaction between waves. +Execute tasks in waves. No user interaction between waves except during Tier 3 retry escalation (see 7e.4). + +### 7-pre: Session Start Message + +Before entering the wave loop, emit a session start summary as text output visible to the human operator: + +``` +Execution plan: {total_tasks} tasks across {total_waves} waves (max {max_parallel} parallel) +``` + +Where: +- `{total_tasks}` = count of all tasks to be executed (pending, not blocked) +- `{total_waves}` = number of waves in the execution plan +- `{max_parallel}` = resolved max parallel value + +This gives the operator an immediate confirmation that execution has begun. ### 7a: Initialize Wave @@ -289,6 +443,66 @@ Execute tasks in waves. No user interaction between waves. 3. Take up to `max_parallel` tasks for this wave 4. If no unblocked tasks remain, exit the loop +### 7a.5: Pre-Wave File Conflict Detection + +Before launching agents, scan all wave tasks for file path conflicts to prevent concurrent agents from editing the same files. + +#### Purpose + +When two or more tasks in the same wave reference the same file, concurrent agents may overwrite each other's changes. This step detects such conflicts before launch and defers higher-ID conflicting tasks to the next wave. + +#### Procedure + +1. 
**Extract file references**: For each task in the wave, scan the task's `description` and acceptance criteria fields for file path references using these patterns: + - **Slash paths**: Any token containing `/` (e.g., `src/api/handler.ts`, `claude/sdd-tools/SKILL.md`) + - **Known extensions**: Tokens ending in `.md`, `.ts`, `.js`, `.json`, `.sh`, or `.py` (e.g., `SKILL.md`, `config.json`) + - **Glob patterns**: Tokens containing `*` or `?` with `/` or known extensions (e.g., `src/api/*.ts`, `tests/**/*.py`) + + When extracting, strip surrounding markdown formatting (backticks, bold markers, list prefixes) to get clean paths. + +2. **Normalize paths**: Convert all extracted paths to a consistent form: + - Remove leading `./` if present + - Collapse `//` to `/` + - Trim trailing whitespace + +3. **Detect conflicts**: Build a map of `{file_path → [task_ids]}`. A conflict exists when any file path maps to two or more task IDs. + + For glob pattern conflicts: + - Two glob patterns conflict if they could match the same file. Use conservative overlap detection: + - Globs sharing the same directory prefix and overlapping extensions conflict (e.g., `src/api/*.ts` and `src/api/handler.ts`) + - A concrete path conflicts with a glob if the path's directory starts with the glob's directory prefix (e.g., `src/api/handler.ts` conflicts with `src/api/*.ts`) + - When in doubt, treat ambiguous overlaps as conflicts (false positives are safer than false negatives) + +4. **Resolve conflicts**: For each conflicting file path: + - The task with the **lowest ID** stays in the current wave + - All higher-ID tasks referencing that file are **deferred** to the next wave by inserting an artificial dependency on the lowest-ID task + - If a task conflicts on multiple files, it is deferred if it loses on any of them + +5. **Handle all-conflict case**: If all tasks in the wave conflict on the same file(s), sequentialize them: keep only the lowest-ID task in this wave, defer all others. 
The deferred tasks will form subsequent sub-waves of one task each. + +6. **Log results** (only when conflicts were detected): Append a "Conflict Resolution" section to `execution_plan.md` using the read-modify-write pattern: + + ```markdown + ## Conflict Resolution — Wave {N} + + Detected {count} file conflict(s): + - `{file_path}`: Tasks [{id1}], [{id2}], [{id3}] + → [{id1}] stays (lowest ID), [{id2}] and [{id3}] deferred to next wave + ``` + +7. **No conflicts**: If no conflicts are detected, proceed immediately with no overhead. Skip step 6 entirely; do not log anything to `execution_plan.md` for clean waves. + +#### Error Handling + +If the file path pattern matching fails (e.g., unexpected description format causes a parsing error): +- Log a warning: `WARNING: File conflict detection failed for wave {N} — proceeding without detection` +- Proceed with the wave as-is (no tasks deferred) +- Do not block execution due to detection failures + ### 7b: Snapshot Execution Context Read `.claude/sessions/__live_session__/execution_context.md` and hold it as the baseline for this wave. All agents in this wave will read from this same snapshot. This prevents concurrent agents from seeing partial context writes from sibling tasks. @@ -297,7 +511,12 @@ Read `.claude/sessions/__live_session__/execution_context.md` and hold it as the 1. Mark all wave tasks as `in_progress` via `TaskUpdate` 2. Record `wave_start_time` -3. Write the complete `progress.md` using Write (read-modify-write pattern): +3. **Emit "Starting Wave" message** as text output visible to the human operator: + ``` + Starting Wave {current_wave}/{total_waves}: {count} tasks... + ``` + Where `{count}` is the number of tasks in this wave. This is emitted before launching any agents, giving the operator a real-time progress indicator. +4.
Write the complete `progress.md` using Write (read-modify-write pattern): ```markdown # Execution Progress Status: Executing @@ -313,7 +532,31 @@ Read `.claude/sessions/__live_session__/execution_context.md` and hold it as the ## Completed This Session {accumulated completed tasks from prior waves} ``` -4. Launch all wave agents simultaneously using **parallel Task tool calls in a single message turn** with `run_in_background: true`. +5. **Build upstream injection blocks (produces_for)**: Before launching agents, check if any completed tasks from prior waves have `produces_for` arrays that reference tasks in the current wave. For each wave task, follow the injection procedure defined in the "Upstream Prompt Injection (produces_for)" section above: + + a. Scan all completed tasks for `produces_for` entries containing this task's ID + b. If producers found, sort by task ID (ascending numeric order) + c. For each producer, build an injection block: + - **Producer succeeded** (result file exists): Read `result-task-{producer_id}.md` and format as: + ``` + ## UPSTREAM TASK OUTPUT (Task #{producer_id}: {producer_subject}) + {result file content} + --- + ``` + - **Producer failed** (result file missing or task status is FAIL): Format as: + ``` + ## UPSTREAM TASK #{producer_id} FAILED + Task: {producer_subject} + Status: FAIL + {failure summary from task_log.md if available, otherwise "No failure details available."} + --- + ``` + d. Log each injection: `Injecting upstream output from task #{producer_id} into task #{task_id}` + e. If no producers found for this task, skip injection (no overhead) + + Concatenate all injection blocks for a task into a single `{upstream_injection}` string. If empty, the `CONTEXT FROM COMPLETED DEPENDENCIES` section in the prompt template below is omitted entirely. + +6. Launch all wave agents simultaneously using **parallel Task tool calls in a single message turn** with `run_in_background: true`. 
**Record the background task_id mapping**: After the Task tool returns for each agent, record the mapping `{task_list_id → background_task_id}` from each response. The `background_task_id` (returned in the Task tool result when `run_in_background: true`) is needed later to call `TaskOutput` for process reaping and usage extraction. @@ -341,6 +584,10 @@ Task: - Spec Path: {spec_path} - Feature: {feature_name} + {If upstream_injection is non-empty:} + CONTEXT FROM COMPLETED DEPENDENCIES: + {upstream_injection} + CONCURRENT EXECUTION MODE Context Write Path: .claude/sessions/__live_session__/context-task-{id}.md Result Write Path: .claude/sessions/__live_session__/result-task-{id}.md @@ -400,42 +647,88 @@ Task: **Important**: Always include the `CONCURRENT EXECUTION MODE` and `RESULT FILE PROTOCOL` sections regardless of `max_parallel` value. All agents write to per-task context files (`context-task-{id}.md`) and result files (`result-task-{id}.md`), and the orchestrator always performs the merge step in 7f. This unified path eliminates fragile direct writes to `execution_context.md`. -5. **Poll for completion**: After launching all background agents, poll for result files using the `poll-for-results.sh` script in a **multi-round pattern**. Each round invokes the script once via Bash; the script checks for result files every 15 seconds for up to 7 minutes then exits with a progress report. The orchestrator loops across rounds until all results are found or the cumulative timeout is reached. +7. **Detect completion (watch → poll fallback)**: After launching all background agents, detect result file completion using `watch-for-results.sh` as the primary mechanism, falling back to `poll-for-results.sh` (adaptive) if filesystem watch tools are unavailable. + +**IMPORTANT**: Always specify `timeout: 480000` (8 minutes) on each Bash invocation. The default Bash timeout of 2 minutes is NOT enough for completion detection. 
Both scripts handle their own internal timeout (default 45 minutes via `WATCH_TIMEOUT` / `POLL_TIMEOUT` environment variables). + +#### Primary: Filesystem Watch + +Launch `watch-for-results.sh` as a single Bash invocation (with `timeout: 480000`): + + ```bash + bash ${CLAUDE_PLUGIN_ROOT}/skills/execute-tasks/scripts/watch-for-results.sh \ + .claude/sessions/__live_session__ {expected_count} {task_id_1} {task_id_2} {task_id_3} + ``` + + Replace `{expected_count}` with the number of tasks in this wave and `{task_id_N}` with their actual task IDs. + +#### Interpreting Watch Output + +The watch script emits incremental progress lines on stdout. Parse each line as it appears: -**IMPORTANT**: Always specify `timeout: 480000` (8 minutes) on each Bash invocation. The default Bash timeout of 2 minutes is NOT enough for polling. + - **`RESULT_FOUND: result-task-{id}.md (N/M)`** — One result file detected. Log incremental progress (e.g., "Wave 2: result 3/5 found (task {id})"). + - **`ALL_DONE`** — All expected result files found. Proceed to 7d. + - **`TIMEOUT: Found N/M results`** — The watch timed out (exit code 1). Handle as wave timeout (see below). + - **`WATCHER_EXIT: Found N/M results`** — The underlying watcher process (fswatch/inotifywait) exited unexpectedly before all results were found (exit code 1). Fall back to polling for the remaining results. 
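The progress lines above are line-oriented, so the orchestrator can classify each one as it streams in. A hedged sketch (the patterns mirror the formats quoted above; the task-ID shape in the regex is an assumption):

```python
import re

# Sketch: classify one stdout line from watch-for-results.sh.
# RESULT_FOUND lines carry the result file name plus an N/M progress counter.
RESULT_RE = re.compile(r"^RESULT_FOUND: result-task-(\S+)\.md \((\d+)/(\d+)\)$")

def classify_watch_line(line):
    m = RESULT_RE.match(line)
    if m:
        return ("result", m.group(1), int(m.group(2)), int(m.group(3)))
    if line == "ALL_DONE":
        return ("done", None, None, None)
    if line.startswith("TIMEOUT:"):
        return ("timeout", None, None, None)
    if line.startswith("WATCHER_EXIT:"):
        return ("watcher_exit", None, None, None)
    return ("other", None, None, None)
```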
-**Poll round invocation** (via Bash tool with `timeout: 480000`): +#### Handling Watch Exit Codes + +After the Bash invocation completes, check the exit code: + + | Exit Code | Meaning | Action | + |-----------|---------|--------| + | **0** | All results found (`ALL_DONE` emitted) | Proceed to 7d for batch processing | + | **1** | Timeout or unexpected watcher exit | Check last output line: if `TIMEOUT:` → handle as wave timeout; if `WATCHER_EXIT:` → fall back to polling for remaining results | + | **2** | Neither `fswatch` nor `inotifywait` available | Fall back to polling immediately | + | **Bash timeout** | Bash tool `timeout: 480000` reached before script exited | Fall back to polling (re-invoke with remaining task IDs) | + +#### Fallback: Adaptive Polling + +If the watch script exits with code 2 (tools unavailable), the watcher exits unexpectedly (`WATCHER_EXIT`), or the Bash tool times out, fall back to `poll-for-results.sh`: ```bash bash ${CLAUDE_PLUGIN_ROOT}/skills/execute-tasks/scripts/poll-for-results.sh \ - .claude/sessions/__live_session__ {task_id_1} {task_id_2} {task_id_3} + .claude/sessions/__live_session__ {remaining_count} {remaining_task_ids...} ``` - Replace `{task_id_N}` with the actual task IDs for this wave. + When falling back after a partial watch (some results already found via `RESULT_FOUND:` lines), pass only the **remaining** task IDs and adjust `{remaining_count}` accordingly. + + Log the transition: `"Watch unavailable/failed — falling back to adaptive polling for {remaining_count} remaining results"` + +#### Polling Output Parsing + +The poll script uses the same output format as the watch script: -**Multi-round orchestrator loop** (Claude logic, not Bash): + - **`RESULT_FOUND: result-task-{id}.md (N/M)`** — Incremental detection. Log progress. + - **`ALL_DONE`** — All results found. Proceed to 7d. + - Exit code **0** — All results found. + - Exit code **1** — Timeout reached. Handle as wave timeout. 
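Putting the watch and poll outcomes together, the fallback dispatch can be sketched as below (a sketch only; `None` stands in for the Bash tool timing out before the script itself exits, which yields no script exit code):

```python
# Sketch of the watch -> poll fallback decision described above.
# phase is "watch" or "poll"; exit_code None models a Bash tool timeout.
def next_action(phase, exit_code, last_line):
    if exit_code == 0:
        return "proceed_to_7d"            # ALL_DONE was emitted
    if phase == "watch":
        if exit_code == 2 or exit_code is None:
            return "fallback_to_polling"  # no fswatch/inotifywait, or Bash timeout
        if last_line.startswith("WATCHER_EXIT:"):
            return "fallback_to_polling"  # watcher process died early
        return "wave_timeout"             # TIMEOUT: from the watch script
    if exit_code is None:
        return "repoll_remaining"         # re-invoke poll with remaining task IDs
    return "wave_timeout"                 # poll script's own timeout (exit 1)
```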
-After launching background agents, repeat the following: -1. Run the poll script via Bash (with `timeout: 480000`), substituting this wave's task IDs -2. Parse the output: - - `POLL_RESULT: ALL_DONE` — all agents finished. Proceed to 7d. - - `POLL_RESULT: PENDING` — some agents still running. Log the progress line (e.g., "Wave 2 polling: 3/5 tasks complete, waiting on: 7 12"). Continue to the next poll round. - - Bash tool timeout error or no recognizable output — treat as incomplete round. Log "Poll round timed out, retrying..." and continue to the next poll round. -3. Track cumulative elapsed time across rounds. If cumulative time exceeds **45 minutes**, stop polling and report: +#### Multi-Round Fallback (Polling) + +If a single poll invocation times out (Bash tool timeout, not the script's internal timeout), re-invoke the poll script with only the remaining (undetected) task IDs. Track already-found results from `RESULT_FOUND:` lines across invocations to avoid counting duplicates. + + If cumulative elapsed time across all detection attempts (watch + poll rounds) exceeds **45 minutes**, stop and report: ``` TIMEOUT: Not all result files appeared within 45 minutes. Missing: {list of task IDs still without result files} ``` Then proceed to 7d, which handles missing result files via the TaskOutput fallback. -4. Between rounds, no additional sleep — the script itself includes sleep intervals. -**Note**: The 45-minute cumulative timeout is **per polling loop instance** (i.e., per wave). Each time the orchestrator starts polling for a new wave (Step 7c) or for retry agents (Step 7e), the cumulative timer resets to zero. This gives each wave a full 45-minute window for its agents to complete. +#### Wave Timeout Handling + +When either the watch or poll script signals timeout (exit code 1): +1. Parse the final output line for the count of found vs expected results +2. Log: `"Wave {N} timeout: {found}/{expected} results detected"` +3. 
Proceed to 7d — missing result files are handled via the TaskOutput fallback in 7d step 3 -After polling completes (all done or timeout), proceed to 7d for batch processing. +**Note**: The 45-minute cumulative timeout is **per completion detection instance** (i.e., per wave). Each time the orchestrator starts detection for a new wave (Step 7c) or for retry agents (Step 7e), the timeout budget resets. This gives each wave a full 45-minute window for its agents to complete. + +After detection completes (all done or timeout), proceed to 7d for batch processing. ### 7d: Process Results (Batch) -After polling completes, process all wave results in a single batch: +After detection completes (watch or poll), process all wave results in a single batch: 1. **Reap background agents and extract usage**: For each task in the wave, call `TaskOutput(task_id=, block=true, timeout=60000)` using the mapping recorded in 7c. This serves two purposes: - **Process reaping**: Terminates the background agent process (prevents lingering subagents) @@ -454,7 +747,7 @@ After polling completes, process all wave results in a single batch: - `## Files Modified` section → changed file list - `## Issues` section → failure details -3. **Handle missing result files** (agent crash recovery): If a result file is missing after polling: +3. **Handle missing result files** (agent crash recovery): If a result file is missing after detection: - Check if `context-task-{id}.md` exists (agent may have crashed between context and result write) - The `TaskOutput` call in step 1 already captured diagnostic output for the crashed agent - Treat as FAIL with the TaskOutput content as failure details @@ -482,37 +775,368 @@ After polling completes, process all wave results in a single batch: **Context append fallback**: If a result file is missing but `TaskOutput` contains a `LEARNINGS:` section, manually write those learnings to `.claude/sessions/__live_session__/context-task-{id}.md`. 
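Result-file parsing in 7d step 2 reduces to pulling named `## ` sections out of markdown. A minimal sketch, assuming the section names referenced above (`## Files Modified`, `## Issues`); any other headers in the sample are illustrative:

```python
# Sketch: extract the body of one "## Section" from a result file.
# Collection stops at the next "## " header or at end of file.
def extract_section(text, header):
    body, inside = [], False
    for line in text.splitlines():
        if line.strip() == header:
            inside = True
        elif inside and line.startswith("## "):
            break
        elif inside:
            body.append(line)
    return "\n".join(body).strip()

result_file = """## Files Modified
- src/api/handler.ts

## Issues
- Type error in handler not resolved
"""
```

An absent section yields an empty string, which maps naturally onto "no failure details" for PASS results.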
-### 7e: Within-Wave Retry - -After batch processing identifies failed tasks: - -1. Collect all failed tasks with retries remaining -2. For each retriable task: - - Read the failure details from `result-task-{id}.md` (Issues section and Verification section) - - Delete the old `result-task-{id}.md` file before re-launching - - Launch a new background agent (`run_in_background: true`) with failure context from the result file included in the prompt - - **Record the new `background_task_id`** from each Task tool response (same mapping as 7c) - - Update `progress.md` active task entry: `- [{id}] {subject} — Retrying ({n}/{max})` -3. If any retry agents were launched: - - Enter a new multi-round polling loop for the retry agents' result files (same `poll-for-results.sh` pattern as 7c step 5, with only the retry task IDs as arguments and `timeout: 480000` on each Bash invocation) - - After polling completes (all retry result files found or cumulative timeout reached), **reap retry agents**: call `TaskOutput` on each retry `background_task_id` to extract `duration_ms` and `total_tokens` (same pattern as 7d step 1). If `TaskOutput` times out, call `TaskStop` to force-terminate. - - Process retry results using the same batch approach as 7d (using the freshly extracted per-task duration and token values for task_log rows) - - Repeat 7e if any retries still have attempts remaining -4. If retries exhausted for a task: - - Leave task as `in_progress` - - Log final failure +### 7d-post: Emit Wave Completion Summary + +After processing all results in 7d (and before retries in 7e), emit a structured wave completion summary as text output visible to the human operator. This is the primary progress mechanism — wave-level granularity only, no per-task streaming during a wave. 
+ +**Summary format:** + +``` +Wave {current_wave}/{total_waves} complete: {pass_count}/{total_count} tasks passed ({wave_duration}) + [{id1}] {subject1} — {STATUS} ({task_duration}, {task_tokens} tokens) + [{id2}] {subject2} — {STATUS} ({task_duration}, {task_tokens} tokens) + [{id3}] {subject3} — {STATUS} ({task_duration}, {task_tokens} tokens) +``` + +Where: +- `{current_wave}/{total_waves}` = wave number and total wave count from the execution plan +- `{pass_count}/{total_count}` = number of PASS tasks vs total tasks in this wave +- `{wave_duration}` = elapsed time since `wave_start_time` (recorded in 7c step 2), formatted as `{m}m {s}s` +- Per-task lines are indented with 2 spaces and include: + - `{id}` = task ID + - `{subject}` = task subject (truncated to 50 chars if needed) + - `{STATUS}` = PASS, PARTIAL, or FAIL + - `{task_duration}` = from TaskOutput metadata (step 7d.1), e.g., `1m 52s` + - `{task_tokens}` = from TaskOutput metadata (step 7d.1), formatted as compact (e.g., `48K`). If token count unavailable (TaskOutput timeout), omit the tokens portion: `— PASS (1m 52s)` + +**Token count formatting**: Format token counts compactly: +- Under 1,000 → exact number (e.g., `823`) +- 1,000-999,999 → `{N}K` (e.g., `48K` for 48,230) +- 1,000,000+ → `{N.N}M` (e.g., `1.2M`) + +**Wave with failures example:** + +``` +Wave 3/6 complete: 2/4 tasks passed (4m 12s) + [8] Implement API handler — PASS (2m 10s, 52K tokens) + [9] Create database schema — PASS (3m 01s, 67K tokens) + [10] Update routing config — FAIL (4m 12s, 71K tokens) + [11] Add validation middleware — PARTIAL (3m 45s, 59K tokens) +``` + +**Single-wave session**: Even if only one wave exists, the summary is still emitted (shows `Wave 1/1 complete: ...`). + +**Data source**: All data comes from the result file parsing (step 7d.2) and TaskOutput reaping (step 7d.1) already performed. No additional file I/O is required. 
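The compact token formatting rule above can be sketched as a small helper (a sketch of the stated rule, not part of any shipped script):

```python
# Sketch of the compact token-count formatting rule:
# under 1K exact, under 1M as integer thousands with K, otherwise N.NM.
def format_tokens(n):
    if n < 1_000:
        return str(n)
    if n < 1_000_000:
        return f"{n // 1_000}K"
    return f"{n / 1_000_000:.1f}M"
```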
+ +### 7e: Within-Wave Retry (3-Tier Escalation) + +After batch processing identifies failed tasks, apply a progressive retry escalation strategy. Each task tracks its own `escalation_level` (1, 2, or 3), which resets to 0 for every new task. The escalation level determines how much additional help the retry agent receives. + +#### Escalation Tiers + +| Tier | Escalation Level | Strategy | User Interaction | +|------|-----------------|----------|------------------| +| **Retry #1** | 1 — Standard | Failure context from previous result file | None (autonomous) | +| **Retry #2** | 2 — Context Enrichment | Full `execution_context.md` + related task result files | None (autonomous) | +| **Retry #3** | 3 — User Escalation | Pause execution, present failure to user | AskUserQuestion with 4 options | + +#### 7e.1: Collect Failed Tasks + +1. Collect all failed tasks (FAIL or PARTIAL) from the current wave's batch processing (7d) +2. For each failed task, determine its current `escalation_level`: + - First failure → `escalation_level = 1` + - Second failure → `escalation_level = 2` + - Third failure → `escalation_level = 3` +3. Group tasks by escalation level for batch processing + +#### 7e.2: Tier 1 — Standard Retry (escalation_level = 1) + +For tasks at escalation level 1 (first retry): + +1. Read the failure details from `result-task-{id}.md` (Issues section and Verification section) +2. Delete the old `result-task-{id}.md` file before re-launching +3. Launch a new background agent (`run_in_background: true`) with failure context from the result file included in the prompt (existing retry prompt format from 7c) +4. **Record the new `background_task_id`** from each Task tool response (same mapping as 7c) +5. Update `progress.md` active task entry: `- [{id}] {subject} — Retrying (1/{max}) [Standard]` + +#### 7e.3: Tier 2 — Context Enrichment Retry (escalation_level = 2) + +For tasks at escalation level 2 (second retry): + +1. 
Read the failure details from `result-task-{id}.md` +2. **Gather enrichment context**: + - Read the full `.claude/sessions/__live_session__/execution_context.md` (not just the snapshot — the latest merged version) + - Collect `result-task-{id}.md` files from **related tasks**: tasks that share dependencies with the failing task (same `blockedBy` entries) or tasks from the same wave. Read up to 5 related result files to avoid prompt bloat. +3. Delete the old `result-task-{id}.md` file before re-launching +4. Launch a new background agent with **enriched prompt** that includes: + - All standard retry context (failure details, retry instructions) + - Additional section: `CONTEXT ENRICHMENT (Retry #2):` + ``` + CONTEXT ENRICHMENT (Retry #2): + The following additional context is provided because the standard retry failed. + + Full execution context: + --- + {full execution_context.md content} + --- + + Related task results: + --- + {content of related result-task-{id}.md files} + --- + ``` +5. **Record the new `background_task_id`** from each Task tool response +6. Update `progress.md` active task entry: `- [{id}] {subject} — Retrying (2/{max}) [Context Enrichment]` + +#### 7e.4: Tier 3 — User Escalation (escalation_level = 3) + +For tasks at escalation level 3 (third retry), **pause autonomous execution** and involve the user: + +1. Read the failure details from the most recent `result-task-{id}.md` +2. Present failure details to the user via `AskUserQuestion`: + +```yaml +questions: + - header: "Task Failed: [{id}] {subject}" + question: | + This task has failed 2 automated retries. Here are the failure details: + + Attempt 1 (Standard): {brief failure summary from attempt 1} + Attempt 2 (Context Enrichment): {brief failure summary from attempt 2} + + Issues: {issues section from latest result file} + + How would you like to proceed? + options: + - label: "Fix manually and continue" + description: "You will fix the issue externally. 
Execution resumes when you confirm." + - label: "Skip this task" + description: "Mark as FAIL in task_log.md and continue with remaining tasks." + - label: "Provide guidance" + description: "Give the agent specific guidance for one more retry attempt." + - label: "Abort session" + description: "Stop execution, clean up, and show partial summary." + multiSelect: false +``` + +3. **Handle user response**: + + **"Fix manually and continue"**: + - Present a follow-up `AskUserQuestion`: + ```yaml + questions: + - header: "Manual Fix: [{id}] {subject}" + question: "Make your changes externally, then confirm to continue execution." + options: + - label: "Done — continue execution" + description: "I've fixed the issue. Resume task execution." + - label: "Cancel — abort session" + description: "Abort the execution session." + multiSelect: false + ``` + - If user selects "Done": Mark the task as `completed` via `TaskUpdate`. Log in `task_log.md` with status `PASS (manual)`. Continue with remaining waves. + - If user selects "Cancel": Proceed to abort (same as "Abort session" below). + + **"Skip this task"**: + - Leave task as `in_progress` (not completed) + - Log in `task_log.md` with status `FAIL (skipped)` and escalation level 3 + - Retain the result file for post-analysis + - Continue with remaining waves (other tasks may still be unblocked) + + **"Provide guidance"**: + - Present a follow-up `AskUserQuestion` to capture guidance text: + ```yaml + questions: + - header: "Guidance for [{id}] {subject}" + question: "Provide specific guidance for the retry agent. What should it try differently?"
+ allowFreeText: true + ``` + - Delete the old `result-task-{id}.md` file + - Launch a new background agent with **guidance-enriched prompt** that includes: + - All standard retry context (failure details) + - Full `execution_context.md` content (same as Tier 2) + - Additional section: `USER GUIDANCE (Retry #3):` + ``` + USER GUIDANCE (Retry #3): + The user has reviewed the failure and provided the following guidance: + --- + {user's guidance text} + --- + Apply this guidance when implementing the fix. This is your final automated attempt. + ``` + - **Record the new `background_task_id`** + - Update `progress.md`: `- [{id}] {subject} — Retrying (3/{max}) [User Guidance]` + - Detect completion using the same watch → poll fallback pattern + - Reap the agent via `TaskOutput` (same as 7d step 1) + - Process the result: + - If PASS: Task passes normally. Log success. + - If FAIL: **Re-present `AskUserQuestion`** with updated failure details (same 4 options as above, but with the new failure information). The user can choose to provide more guidance (which triggers another retry with the new guidance), fix manually, skip, or abort. This loop continues until the user selects an option other than "Provide guidance" or a guided retry succeeds. + + **"Abort session"**: + - Log all remaining in-progress tasks with status `FAIL (aborted)` in `task_log.md` + - Leave all in-progress tasks as `in_progress` (not completed, not reset to pending) + - Skip directly to Step 8 (Session Summary) to generate and display the partial summary + - The session summary should note: `Session aborted by user at Wave {N} after task [{id}] failed escalation` + +#### 7e.5: Retry Execution and Detection + +For all tiers that launch background agents (Tier 1, Tier 2, and Tier 3 "Provide guidance"): + +1. After launching retry agents, detect completion using the same **watch → poll fallback** pattern as 7c step 7, with only the retry task IDs and their count as arguments. 
Launch `watch-for-results.sh` first; if exit code 2 or watcher fails, fall back to `poll-for-results.sh`. Always use `timeout: 480000` on each Bash invocation. +2. After detection completes (all retry result files found or cumulative timeout reached), **reap retry agents**: call `TaskOutput` on each retry `background_task_id` to extract `duration_ms` and `total_tokens` (same pattern as 7d step 1). If `TaskOutput` times out, call `TaskStop` to force-terminate. +3. Process retry results using the same batch approach as 7d (using the freshly extracted per-task duration and token values for task_log rows) + +#### 7e.6: Post-Retry Processing + +After all retry tiers for the current wave are processed: + +1. Tasks that passed at any tier: Mark as `completed`, log success +2. Tasks still failing after Tier 1 or 2: Increment `escalation_level` and repeat 7e with the next tier +3. Tasks resolved by user action (manual fix, skip, abort): Already handled in 7e.4 +4. Update `task_log.md` with the final escalation level for each task: + ```markdown + | {id} | {subject} | {status} | {attempt}/{max} (T{escalation_level}) | {duration} | {tokens} | + ``` + The `(T{escalation_level})` suffix in the Attempts column indicates which tier was reached: `T1` = standard, `T2` = context enrichment, `T3` = user escalation. + +#### Escalation Flow Summary + +``` +Task fails (attempt 1) + -> Retry #1: Standard (failure context only) + -> PASS? Done. + -> FAIL? Continue to Retry #2 + +Task fails (attempt 2) + -> Retry #2: Context Enrichment (full context + related results) + -> PASS? Done. + -> FAIL? Continue to Retry #3 + +Task fails (attempt 3) + -> Retry #3: User Escalation (AskUserQuestion) + -> "Fix manually" -> user fixes -> mark complete -> continue + -> "Skip" -> mark FAIL (skipped) -> continue + -> "Provide guidance" -> retry with guidance + -> PASS? Done. + -> FAIL? 
Re-present AskUserQuestion (loop) + -> "Abort" -> partial summary -> end session +``` + +**Important**: Each task has an independent escalation path. If multiple tasks fail in the same wave, each gets its own escalation sequence. Tier 1 and Tier 2 retries for all tasks in a wave are batched together (launched and detected in parallel). Tier 3 (user escalation) is handled sequentially per task since it requires user interaction. ### 7f: Merge Context and Clean Up After Wave After ALL agents in the current wave have completed (including retries): -1. Read `.claude/sessions/__live_session__/execution_context.md` -2. Read all `context-task-{id}.md` files from `.claude/sessions/__live_session__/` in task ID order -3. Append each file's full content to the end of the `## Task History` section -4. Write the complete updated `execution_context.md` using Write -5. Delete the `context-task-{id}.md` files -6. **Clean up result files**: Delete `result-task-{id}.md` for PASS tasks. Retain `result-task-{id}.md` for FAIL tasks (available for post-session analysis in the archived session folder) +#### Section-Based Merge Procedure + +1. **Read current context**: Read `.claude/sessions/__live_session__/execution_context.md` +2. **Parse into sections**: Split the file on `## ` markers. Each `## {Header}` through the next `## ` (or EOF) is one section. Store as a map: `{header_name → list_of_entries}`. +3. **Read per-task files**: Read all `context-task-{id}.md` files from `.claude/sessions/__live_session__/` in task ID order. Parse each file into sections using the same `## ` splitting. +4. **Merge by section**: For each per-task file, for each section present in that file: + - Find the matching section header in `execution_context.md` + - Append the per-task entries under the matching section header + - If a per-task section header does not match any of the 6 defined headers, place its content under `## Key Decisions` with a note: `(from unrecognized section)` +5. 
**Deduplicate within sections**: After appending all per-task entries, deduplicate within each section: + - Compare entries by their full text (trimmed of leading/trailing whitespace) + - Keep the first occurrence, remove exact duplicates + - Near-duplicates (same content, different wording) are NOT deduplicated — only exact matches +6. **Write merged context**: Reassemble the full `execution_context.md` with all 6 section headers (in order) and Write the complete file + +#### Within-Session Compaction + +After the merge, check each section's entry count. If any section has 10 or more entries: + +1. **For sections 1-5** (Project Setup through Known Issues): Keep the 5 most recent entries in full. Summarize all older entries into a single paragraph at the top of the section. +2. **For Task History**: Keep the 10 most recent entries in full. Summarize older entries into a "Wave Summary" paragraph at the top of the section. + +This prevents individual sections from growing unbounded during a single execution session. + +#### Post-Merge Validation + +After compaction completes (or is skipped), validate the merged `execution_context.md` before proceeding to cleanup. This catches corruption or unbounded growth introduced during the merge. The validation leverages the 6-section structured schema for reliable header detection. + +**Validation checks** (run in order): + +1. **Header validation**: Verify all 6 required section headers are present in the file: + - `## Project Setup` + - `## File Patterns` + - `## Conventions` + - `## Key Decisions` + - `## Known Issues` + - `## Task History` + + Scan the file for lines matching `^## ` and compare against the canonical set. If any headers are missing, record them in `missing_headers`. + +2. **Malformed content detection**: Check for non-empty, non-comment content lines that appear before the first `## ` header (after the `# Execution Context` title line). These indicate malformed structure from a bad merge. 
Record the count as `orphaned_lines`. + +3. **Size check**: Count the total lines in the file. + - If >500 lines: record `size_level = warn` + - If >1000 lines: record `size_level = error` + - Otherwise: record `size_level = normal` + +**Validation result**: Store the outcome as: +``` +validation_status: OK | WARN | ERROR +missing_headers: [] (list of missing header names, empty if all present) +orphaned_lines: 0 (count of content lines outside any section) +total_lines: N +size_level: normal | warn | error +``` + +Determine `validation_status`: +- `OK` — all 6 headers present, 0 orphaned lines, size normal +- `WARN` — all headers present but size >500, or orphaned lines detected +- `ERROR` — any headers missing, or size >1000 + +**Auto-repair** (if `missing_headers` is non-empty): + +1. Re-read the current `execution_context.md` +2. For each missing header, insert it (with an empty line below) in the correct position according to the canonical section order: + - Place it immediately before the next section header that IS present + - If no subsequent headers exist, append it at the end of the file +3. Write the repaired file using Write +4. If the repair write succeeds, update `validation_status` to `REPAIRED` and log: + ``` + Context validation: auto-repaired missing headers: {list of re-inserted headers} + ``` +5. If the repair write fails, log the error in `task_log.md` and continue with best-effort context: + ``` + Append to task_log.md: | — | Context validation | ERROR | Wave {N} | — | Auto-repair failed: {error details} | + ``` + +**Force compaction** (if `size_level` is `error`, i.e., >1000 lines): + +After auto-repair (if needed), apply aggressive compaction to ALL sections: + +1. **Sections 1-5** (Project Setup through Known Issues): Keep the 3 most recent entries. Summarize all older entries into a single paragraph at the top of the section. +2. **Task History**: Keep the 5 most recent entries. 
Summarize all older entries into a single "Session Summary" paragraph at the top of the section. +3. Write the compacted file using Write +4. Re-check line count after compaction. If still >1000 lines, log a warning but proceed — the content is legitimately large: + ``` + Context validation: file still >1000 lines ({N}) after force compaction — content is legitimately large + ``` + +**Log validation results**: + +- If `validation_status` is `OK`: no logging needed, no `task_log.md` entry +- If any issues detected (`WARN`, `ERROR`, or `REPAIRED`): append a diagnostic row to `task_log.md` using the read-modify-write pattern: + ```markdown + | — | Context validation | {WARN/ERROR/REPAIRED} | Wave {N} | — | {summary} | + ``` + Where `{summary}` combines all findings, e.g.: `"Missing: ## File Patterns (repaired); 523 lines (warn); 3 orphaned lines"` + +**Include in wave completion summary**: After validation completes, append a `## Context Health` section to `progress.md` as part of the wave status update (same read-modify-write cycle as the 7d.6 batch update). Add it after the `## Completed This Session` entries: + +```markdown +## Context Health (Wave {N}) +- Headers: {6/6 | 5/6 — repaired | N/6 — repair failed} +- Size: {total_lines} lines {(OK) | (WARN: >500) | (ERROR: >1000, compacted)} +- Orphaned content: {0 lines | N lines flagged} +``` + +If validation status is `OK` (all headers present, no orphaned content, size normal), emit a compact form: +```markdown +## Context Health (Wave {N}) +- Status: OK +``` + +Each wave's Context Health section replaces the previous wave's section (only the latest wave's health is shown in `progress.md`). + +#### Cleanup + +7. Delete the `context-task-{id}.md` files +8. **Clean up result files**: Delete `result-task-{id}.md` for PASS tasks, **unless** the task has a `produces_for` field with entries pointing to not-yet-completed tasks (retain for upstream injection in later waves). 
Retain `result-task-{id}.md` for FAIL tasks (available for post-session analysis in the archived session folder). Delete retained `produces_for` result files only after all tasks listed in the producer's `produces_for` have completed.

### 7g: Rebuild Next Wave and Archive

diff --git a/claude/sdd-tools/skills/execute-tasks/scripts/poll-for-results.sh b/claude/sdd-tools/skills/execute-tasks/scripts/poll-for-results.sh
index b5e677c..f461a0f 100755
--- a/claude/sdd-tools/skills/execute-tasks/scripts/poll-for-results.sh
+++ b/claude/sdd-tools/skills/execute-tasks/scripts/poll-for-results.sh
@@ -1,60 +1,133 @@
 #!/usr/bin/env bash
-# poll-for-results.sh — Polls for task result files with progress output
+# poll-for-results.sh — Adaptive polling for task result files
 #
-# Usage: poll-for-results.sh <session_dir> <id1> [id2] [id3] ...
-# Example: poll-for-results.sh .claude/sessions/__live_session__ 1 2 3 4 5
+# Usage: poll-for-results.sh <session_dir> <expected_count> [task_ids...]
+# Example: poll-for-results.sh .claude/sessions/__live_session__ 5 101 102 103 104 105
 #
-# Checks for result-task-{id}.md files every INTERVAL seconds for up to
-# ROUND_DURATION seconds. Exits with structured output:
+# Polls for result-task-{id}.md files with adaptive intervals. Used as fallback
+# when watch-for-results.sh reports tools unavailable (exit code 2).
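+#
+# With the defaults and no new results, the check interval grows
+# 5s -> 10s -> 15s -> 20s -> 25s -> 30s (capped at POLL_MAX_INTERVAL),
+# and resets to POLL_START_INTERVAL whenever a new result file appears.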
 #
-# POLL_RESULT: ALL_DONE — all result files found (exit 0)
-# POLL_RESULT: PENDING — round timeout, lists pending IDs (exit 1)
+# Exit codes:
+#   0 - All expected results found
+#   1 - Timeout reached
 #
-# Environment variable overrides (for testing):
-# POLL_ROUND_DURATION — seconds per round (default: 420 = 7 minutes)
-# POLL_INTERVAL — seconds between checks (default: 15)
+# Output (stdout):
+#   RESULT_FOUND: result-task-{id}.md (N/M)
+#   ALL_DONE
+#
+# Environment:
+#   POLL_START_INTERVAL - Starting interval in seconds (default: 5)
+#   POLL_MAX_INTERVAL - Maximum interval in seconds (default: 30)
+#   POLL_TIMEOUT - Cumulative timeout in seconds (default: 2700 = 45 min)

 set -euo pipefail

 if [ $# -lt 2 ]; then
-  echo "Usage: poll-for-results.sh <session_dir> <id1> [id2] ..."
+  echo "Usage: poll-for-results.sh <session_dir> <expected_count> [task_ids...]"
   exit 2
 fi

 SESSION_DIR="$1"
-shift
-EXPECTED_IDS="$*"
+EXPECTED_COUNT="$2"
+shift 2
+TASK_IDS="$*"
+
+# Parse environment variables with defaults, falling back on invalid values
+parse_positive_int() {
+  local val="$1"
+  local default="$2"
+  if [[ "$val" =~ ^[0-9]+$ ]] && [ "$val" -gt 0 ]; then
+    echo "$val"
+  else
+    echo "$default"
+  fi
+}

-ROUND_DURATION="${POLL_ROUND_DURATION:-420}"
-INTERVAL="${POLL_INTERVAL:-15}"
+START_INTERVAL=$(parse_positive_int "${POLL_START_INTERVAL:-5}" 5)
+MAX_INTERVAL=$(parse_positive_int "${POLL_MAX_INTERVAL:-30}" 30)
+TIMEOUT=$(parse_positive_int "${POLL_TIMEOUT:-2700}" 2700)
+
+INTERVAL="$START_INTERVAL"
 ELAPSED=0
+FOUND_COUNT=0
+PREV_FOUND=0

-while [ "$ELAPSED" -lt "$ROUND_DURATION" ]; do
-  DONE_COUNT=0
-  PENDING=""
-  TOTAL=0
-
-  for ID in $EXPECTED_IDS; do
-    TOTAL=$((TOTAL + 1))
-    if [ -f "$SESSION_DIR/result-task-$ID.md" ]; then
-      DONE_COUNT=$((DONE_COUNT + 1))
-    else
-      PENDING="$PENDING $ID"
-    fi
-  done
+# Build list of IDs to check; if task_ids provided use those, otherwise scan directory
+check_results() {
+  FOUND_COUNT=0
+  if [ -n "$TASK_IDS" ]; then
+    for ID in $TASK_IDS; do
+      if [ -f "$SESSION_DIR/result-task-$ID.md" ]; then
FOUND_COUNT=$((FOUND_COUNT + 1)) + fi + done + else + for f in "$SESSION_DIR"/result-task-*.md; do + [ -f "$f" ] || continue + FOUND_COUNT=$((FOUND_COUNT + 1)) + done + fi +} - if [ "$DONE_COUNT" -eq "$TOTAL" ]; then - echo "POLL_RESULT: ALL_DONE" - echo "Completed: $DONE_COUNT/$TOTAL" - exit 0 +# Track which results have been announced to avoid duplicates +declare -A ANNOUNCED 2>/dev/null || true + +emit_new_results_tracked() { + if [ -n "$TASK_IDS" ]; then + for ID in $TASK_IDS; do + if [ -f "$SESSION_DIR/result-task-$ID.md" ] && [ -z "${ANNOUNCED[$ID]:-}" ]; then + ANNOUNCED[$ID]=1 + echo "RESULT_FOUND: result-task-$ID.md ($FOUND_COUNT/$EXPECTED_COUNT)" + fi + done + else + for f in "$SESSION_DIR"/result-task-*.md; do + [ -f "$f" ] || continue + local BASENAME + BASENAME="$(basename "$f")" + if [ -z "${ANNOUNCED[$BASENAME]:-}" ]; then + ANNOUNCED[$BASENAME]=1 + echo "RESULT_FOUND: $BASENAME ($FOUND_COUNT/$EXPECTED_COUNT)" + fi + done fi +} + +# Initial check before entering the polling loop +check_results +PREV_FOUND=$FOUND_COUNT +emit_new_results_tracked +if [ "$FOUND_COUNT" -ge "$EXPECTED_COUNT" ]; then + echo "ALL_DONE" + exit 0 +fi + +# Adaptive polling loop +while [ "$ELAPSED" -lt "$TIMEOUT" ]; do sleep "$INTERVAL" ELAPSED=$((ELAPSED + INTERVAL)) + + PREV_FOUND=$FOUND_COUNT + check_results + emit_new_results_tracked + + if [ "$FOUND_COUNT" -ge "$EXPECTED_COUNT" ]; then + echo "ALL_DONE" + exit 0 + fi + + if [ "$FOUND_COUNT" -gt "$PREV_FOUND" ]; then + # New result found — reset interval to start + INTERVAL="$START_INTERVAL" + else + # No new results — increase interval by 5s up to max + INTERVAL=$((INTERVAL + 5)) + if [ "$INTERVAL" -gt "$MAX_INTERVAL" ]; then + INTERVAL="$MAX_INTERVAL" + fi + fi done -# Round ended without all files found -echo "POLL_RESULT: PENDING" -echo "Completed: $DONE_COUNT/$TOTAL" -echo "Waiting on:$PENDING" +# Timeout reached exit 1 diff --git a/claude/sdd-tools/skills/execute-tasks/scripts/tests/poll-for-results.bats 
b/claude/sdd-tools/skills/execute-tasks/scripts/tests/poll-for-results.bats
new file mode 100644
index 0000000..3eda30a
--- /dev/null
+++ b/claude/sdd-tools/skills/execute-tasks/scripts/tests/poll-for-results.bats
@@ -0,0 +1,238 @@
+#!/usr/bin/env bats
+# Tests for poll-for-results.sh adaptive polling
+
+SCRIPT_DIR="$(cd "$(dirname "${BATS_TEST_FILENAME}")/.." && pwd)"
+POLL_SCRIPT="$SCRIPT_DIR/poll-for-results.sh"
+
+setup() {
+  TEST_DIR="$(mktemp -d)"
+  export POLL_START_INTERVAL=1
+  export POLL_MAX_INTERVAL=6
+  export POLL_TIMEOUT=30
+}
+
+teardown() {
+  rm -rf "$TEST_DIR"
+  unset POLL_START_INTERVAL POLL_MAX_INTERVAL POLL_TIMEOUT 2>/dev/null || true
+}
+
+# --- Adaptive interval increase ---
+
+@test "adaptive interval increase: intervals grow by 5s each poll with no results" {
+  # With POLL_START_INTERVAL=1 and MAX_INTERVAL=4, TIMEOUT=12:
+  # Sleeps: 1s, then 4s (1+5 capped to 4), then 4s, then 4s (the loop
+  # re-checks elapsed < 12 before each sleep), so 4 sleeps = 13s total
+  # Verify the script takes MORE than just 4x1s (would be 4s without increase)
+  # proving intervals are growing
+  export POLL_START_INTERVAL=1
+  export POLL_MAX_INTERVAL=4
+  export POLL_TIMEOUT=12
+
+  local start_time=$SECONDS
+  run bash "$POLL_SCRIPT" "$TEST_DIR" 1 999
+  local elapsed=$((SECONDS - start_time))
+
+  [ "$status" -eq 1 ]
+  # Total sleep should be around 13s (1+4+4+4), proving intervals grew beyond 1s
+  [ "$elapsed" -ge 7 ]
+  [ "$elapsed" -le 16 ]
+}
+
+@test "adaptive interval increase: default progression 5s, 10s, 15s, 20s, 25s, 30s" {
+  # Verify the script code handles the default progression correctly
+  # We test with small values: START=1, and verify cap works
+  export POLL_START_INTERVAL=1
+  export POLL_MAX_INTERVAL=4
+  export POLL_TIMEOUT=8
+
+  local start_time=$SECONDS
+  run bash "$POLL_SCRIPT" "$TEST_DIR" 1 999
+  local elapsed=$((SECONDS - start_time))
+
+  [ "$status" -eq 1 ]
+  # Intervals: sleep 1, then 6 (1+5, but capped to 4), then 4, ...
+  # Actually: 1 + 4 = 5, then +4 = 9 (the loop re-checks elapsed < 8
+  # before each sleep), so 3 sleeps total
+  # elapsed should be around 9s (1+4+4)
+  [ "$elapsed" -ge 4 ]
+  [ "$elapsed" -le 10 ]
+}
+
+# --- Interval reset on result ---
+
+@test "interval reset: resets to start interval when new result found" {
+  export POLL_START_INTERVAL=1
+  export POLL_MAX_INTERVAL=100
+  export POLL_TIMEOUT=15
+
+  # Create first result after 2 seconds, second after 5 seconds
+  (sleep 2 && touch "$TEST_DIR/result-task-101.md") &
+  (sleep 5 && touch "$TEST_DIR/result-task-102.md") &
+
+  run bash "$POLL_SCRIPT" "$TEST_DIR" 2 101 102
+
+  [ "$status" -eq 0 ]
+  [[ "$output" == *"RESULT_FOUND: result-task-101.md"* ]]
+  [[ "$output" == *"RESULT_FOUND: result-task-102.md"* ]]
+  [[ "$output" == *"ALL_DONE"* ]]
+}
+
+# --- Max interval cap ---
+
+@test "max interval cap: never exceeds POLL_MAX_INTERVAL" {
+  export POLL_START_INTERVAL=1
+  export POLL_MAX_INTERVAL=2
+  export POLL_TIMEOUT=8
+
+  local start_time=$SECONDS
+  run bash "$POLL_SCRIPT" "$TEST_DIR" 1 999
+  local elapsed=$((SECONDS - start_time))
+
+  [ "$status" -eq 1 ]
+  # Intervals: 1, then 6 (1+5) capped to 2, then 7 (2+5) capped to 2, ...
+  # Sleeps: 1 + 2 + 2 + 2 + 2 = 9 (after elapsed 7 the loop still sees 7 < 8)
+  # So 5 sleeps, about 9 seconds
+  [ "$elapsed" -ge 6 ]
+  [ "$elapsed" -le 12 ]
+}
+
+# --- Environment variable override ---
+
+@test "environment variable override: POLL_START_INTERVAL=2 starts at 2s" {
+  # Create a result after 3 seconds. With start=2:
+  # Sleep 2s (check -> not found), sleep 7s (2+5) -> but result appears at 3s
+  # After the 7s sleep (total elapsed ~9s), it will find the result.
+  # With start=1 instead of 2: Sleep 1s (not found), sleep 6s (1+5) -> finds at ~7s
+  # The test verifies: start=2 means first interval is 2, script works, and finds result.
+  export POLL_START_INTERVAL=2
+  export POLL_MAX_INTERVAL=100
+  export POLL_TIMEOUT=20
+
+  (sleep 3 && touch "$TEST_DIR/result-task-50.md") &
+
+  run bash "$POLL_SCRIPT" "$TEST_DIR" 1 50
+
+  [ "$status" -eq 0 ]
+  [[ "$output" == *"RESULT_FOUND: result-task-50.md"* ]]
+  [[ "$output" == *"ALL_DONE"* ]]
+}
+
+@test "environment variable override: POLL_START_INTERVAL=10 starts at 10s" {
+  # Create a result immediately available. With start=10, the initial check
+  # should find it before entering the polling loop.
+  touch "$TEST_DIR/result-task-200.md"
+
+  export POLL_START_INTERVAL=10
+  export POLL_MAX_INTERVAL=100
+  export POLL_TIMEOUT=30
+
+  local start_time=$SECONDS
+  run bash "$POLL_SCRIPT" "$TEST_DIR" 1 200
+  local elapsed=$((SECONDS - start_time))
+
+  [ "$status" -eq 0 ]
+  [[ "$output" == *"RESULT_FOUND: result-task-200.md"* ]]
+  [[ "$output" == *"ALL_DONE"* ]]
+  # Should be nearly instant since result exists before polling starts
+  [ "$elapsed" -le 2 ]
+}
+
+@test "environment variable override: POLL_MAX_INTERVAL=3 caps correctly" {
+  export POLL_START_INTERVAL=1
+  export POLL_MAX_INTERVAL=3
+  export POLL_TIMEOUT=10
+
+  local start_time=$SECONDS
+  run bash "$POLL_SCRIPT" "$TEST_DIR" 1 999
+  local elapsed=$((SECONDS - start_time))
+
+  [ "$status" -eq 1 ]
+  # Intervals: 1, then 6 capped to 3, then 8 capped to 3, ...
+ # Sleeps: 1 + 3 + 3 + 3 = 10, done + [ "$elapsed" -ge 8 ] + [ "$elapsed" -le 15 ] +} + +@test "environment variable override: invalid values use defaults" { + export POLL_START_INTERVAL="abc" + export POLL_MAX_INTERVAL="-5" + export POLL_TIMEOUT="0" + + # With invalid values, defaults should be used: start=5, max=30, timeout=2700 + # This would take way too long for a test, so just verify the script starts + # We'll use a quick check: create all results immediately + touch "$TEST_DIR/result-task-101.md" + run bash "$POLL_SCRIPT" "$TEST_DIR" 1 101 + + [ "$status" -eq 0 ] + [[ "$output" == *"ALL_DONE"* ]] +} + +# --- Timeout --- + +@test "timeout: exits with code 1 when timeout reached" { + export POLL_START_INTERVAL=1 + export POLL_MAX_INTERVAL=2 + export POLL_TIMEOUT=4 + + run bash "$POLL_SCRIPT" "$TEST_DIR" 1 999 + + [ "$status" -eq 1 ] +} + +# --- All results on first poll --- + +@test "immediate ALL_DONE: all results found on first poll" { + touch "$TEST_DIR/result-task-101.md" + touch "$TEST_DIR/result-task-102.md" + touch "$TEST_DIR/result-task-103.md" + + local start_time=$SECONDS + run bash "$POLL_SCRIPT" "$TEST_DIR" 3 101 102 103 + local elapsed=$((SECONDS - start_time)) + + [ "$status" -eq 0 ] + [[ "$output" == *"ALL_DONE"* ]] + [[ "$output" == *"RESULT_FOUND: result-task-101.md"* ]] + [[ "$output" == *"RESULT_FOUND: result-task-102.md"* ]] + [[ "$output" == *"RESULT_FOUND: result-task-103.md"* ]] + # Should be nearly instant + [ "$elapsed" -le 2 ] +} + +# --- Output format --- + +@test "output format: matches RESULT_FOUND pattern" { + touch "$TEST_DIR/result-task-42.md" + + run bash "$POLL_SCRIPT" "$TEST_DIR" 1 42 + + [ "$status" -eq 0 ] + [[ "${lines[0]}" == "RESULT_FOUND: result-task-42.md (1/1)" ]] + [[ "${lines[1]}" == "ALL_DONE" ]] +} + +@test "output format: shows correct N/M counts" { + touch "$TEST_DIR/result-task-1.md" + touch "$TEST_DIR/result-task-2.md" + + run bash "$POLL_SCRIPT" "$TEST_DIR" 3 1 2 3 + + [ "$status" -ne 0 ] || [[ "$output" == 
*"ALL_DONE"* ]] + # Should find 2 of 3 on first scan + [[ "$output" == *"RESULT_FOUND: result-task-1.md"* ]] + [[ "$output" == *"RESULT_FOUND: result-task-2.md"* ]] +} + +# --- Usage --- + +@test "usage: exits with code 2 when no arguments" { + run bash "$POLL_SCRIPT" + + [ "$status" -eq 2 ] + [[ "$output" == *"Usage:"* ]] +} + +@test "usage: exits with code 2 when only session_dir" { + run bash "$POLL_SCRIPT" "$TEST_DIR" + + [ "$status" -eq 2 ] +} diff --git a/claude/sdd-tools/skills/execute-tasks/scripts/tests/watch-for-results.bats b/claude/sdd-tools/skills/execute-tasks/scripts/tests/watch-for-results.bats new file mode 100644 index 0000000..2deb57c --- /dev/null +++ b/claude/sdd-tools/skills/execute-tasks/scripts/tests/watch-for-results.bats @@ -0,0 +1,155 @@ +#!/usr/bin/env bats +# Tests for watch-for-results.sh event-driven result file detection + +SCRIPT_DIR="$(cd "$(dirname "${BATS_TEST_FILENAME}")/.." && pwd)" +WATCH_SCRIPT="$SCRIPT_DIR/watch-for-results.sh" + +setup() { + TEST_DIR="$(mktemp -d)" + export WATCH_TIMEOUT=10 +} + +teardown() { + rm -rf "$TEST_DIR" + unset WATCH_TIMEOUT 2>/dev/null || true +} + +# --- All results found --- + +@test "all results found: pre-existing files produce ALL_DONE and exit 0" { + touch "$TEST_DIR/result-task-1.md" + touch "$TEST_DIR/result-task-2.md" + touch "$TEST_DIR/result-task-3.md" + + run bash "$WATCH_SCRIPT" "$TEST_DIR" 3 1 2 3 + + [ "$status" -eq 0 ] + [[ "$output" == *"RESULT_FOUND: result-task-1.md"* ]] + [[ "$output" == *"RESULT_FOUND: result-task-2.md"* ]] + [[ "$output" == *"RESULT_FOUND: result-task-3.md"* ]] + [[ "$output" == *"ALL_DONE"* ]] +} + +@test "all results found: dynamically created files produce ALL_DONE and exit 0" { + # Create result files after a delay + (sleep 1 && touch "$TEST_DIR/result-task-1.md") & + (sleep 2 && touch "$TEST_DIR/result-task-2.md") & + + run bash "$WATCH_SCRIPT" "$TEST_DIR" 2 1 2 + + [ "$status" -eq 0 ] + [[ "$output" == *"RESULT_FOUND: result-task-1.md"* ]] + [[ 
"$output" == *"RESULT_FOUND: result-task-2.md"* ]] + [[ "$output" == *"ALL_DONE"* ]] +} + +@test "all results found: output format matches RESULT_FOUND pattern with counts" { + touch "$TEST_DIR/result-task-42.md" + + run bash "$WATCH_SCRIPT" "$TEST_DIR" 1 42 + + [ "$status" -eq 0 ] + [[ "${lines[0]}" == "RESULT_FOUND: result-task-42.md (1/1)" ]] + [[ "${lines[1]}" == "ALL_DONE" ]] +} + +# --- Timeout with no files --- + +@test "timeout: exits with code 1 when no files created" { + export WATCH_TIMEOUT=3 + + run bash "$WATCH_SCRIPT" "$TEST_DIR" 2 1 2 + + [ "$status" -eq 1 ] + [[ "$output" == *"Found 0/2 results"* ]] +} + +# --- No fswatch available --- + +@test "no fswatch available: exits with code 2 when tools unavailable" { + # Override PATH to exclude fswatch and inotifywait + run env PATH=/usr/bin:/bin bash "$WATCH_SCRIPT" "$TEST_DIR" 3 1 2 3 + + [ "$status" -eq 2 ] + [[ "$output" == *"Neither fswatch nor inotifywait available"* ]] +} + +# --- Pre-existing results detected and counted --- + +@test "pre-existing results: counted before watch starts" { + touch "$TEST_DIR/result-task-1.md" + touch "$TEST_DIR/result-task-2.md" + + # Expect 3 but only 2 exist; create the third after a delay + (sleep 1 && touch "$TEST_DIR/result-task-3.md") & + + run bash "$WATCH_SCRIPT" "$TEST_DIR" 3 1 2 3 + + [ "$status" -eq 0 ] + [[ "$output" == *"RESULT_FOUND: result-task-1.md (1/3)"* ]] + [[ "$output" == *"RESULT_FOUND: result-task-2.md (2/3)"* ]] + [[ "$output" == *"RESULT_FOUND: result-task-3.md (3/3)"* ]] + [[ "$output" == *"ALL_DONE"* ]] +} + +# --- Partial completion --- + +@test "partial completion: reports found results and exits code 1 on timeout" { + export WATCH_TIMEOUT=3 + touch "$TEST_DIR/result-task-1.md" + + run bash "$WATCH_SCRIPT" "$TEST_DIR" 3 1 2 3 + + [ "$status" -eq 1 ] + [[ "$output" == *"RESULT_FOUND: result-task-1.md (1/3)"* ]] + [[ "$output" == *"Found 1/3 results"* ]] +} + +# --- Non-result files ignored --- + +@test "filtering: ignores non-result files 
in session directory" {
+  # Create non-result files, then the actual result
+  (sleep 1 && touch "$TEST_DIR/context-task-1.md" && touch "$TEST_DIR/other.txt" && touch "$TEST_DIR/result-task-1.md.tmp" && sleep 0.5 && touch "$TEST_DIR/result-task-1.md") &
+
+  run bash "$WATCH_SCRIPT" "$TEST_DIR" 1 1
+
+  [ "$status" -eq 0 ]
+  # Only result-task-1.md should appear in output (not context, txt, or tmp)
+  local result_count=0
+  for line in "${lines[@]}"; do
+    if [[ "$line" == RESULT_FOUND:* ]]; then
+      result_count=$((result_count + 1))
+    fi
+  done
+  [ "$result_count" -eq 1 ]
+  [[ "$output" == *"RESULT_FOUND: result-task-1.md (1/1)"* ]]
+  [[ "$output" == *"ALL_DONE"* ]]
+}
+
+# --- Usage ---
+
+@test "usage: exits with code 2 when no arguments" {
+  run bash "$WATCH_SCRIPT"
+
+  [ "$status" -eq 2 ]
+  [[ "$output" == *"Usage:"* ]]
+}
+
+@test "usage: exits with code 2 when only session_dir" {
+  run bash "$WATCH_SCRIPT" "$TEST_DIR"
+
+  [ "$status" -eq 2 ]
+  [[ "$output" == *"Usage:"* ]]
+}
+
+# --- Watcher exit ---
+
+@test "watcher exit: exits with code 1 when fswatch exits unexpectedly" {
+  # Watch a non-existent directory to make fswatch fail immediately
+  export WATCH_TIMEOUT=5
+  NONEXISTENT_DIR="$(mktemp -u)"
+
+  run bash "$WATCH_SCRIPT" "$NONEXISTENT_DIR" 1 1
+
+  [ "$status" -eq 1 ]
+}
diff --git a/claude/sdd-tools/skills/execute-tasks/scripts/watch-for-results.sh b/claude/sdd-tools/skills/execute-tasks/scripts/watch-for-results.sh
new file mode 100755
index 0000000..65373ed
--- /dev/null
+++ b/claude/sdd-tools/skills/execute-tasks/scripts/watch-for-results.sh
@@ -0,0 +1,115 @@
+#!/usr/bin/env bash
+# watch-for-results.sh — Event-driven result file detection using filesystem events
+#
+# Usage: watch-for-results.sh <session_dir> <expected_count> [task_ids...]
+# Example: watch-for-results.sh .claude/sessions/__live_session__ 5 101 102 103 104 105
+#
+# Watches for result-task-{id}.md files using fswatch (macOS) or inotifywait (Linux).
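+#
+# Example (hypothetical task IDs, two results expected):
+#   $ watch-for-results.sh .claude/sessions/__live_session__ 2 7 8
+#   RESULT_FOUND: result-task-7.md (1/2)
+#   RESULT_FOUND: result-task-8.md (2/2)
+#   ALL_DONE
+#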
+# Replaces fixed-interval polling as the primary completion mechanism.
+#
+# Exit codes:
+#   0 - All expected results found
+#   1 - Timeout reached or watcher exited unexpectedly
+#   2 - Neither fswatch nor inotifywait available
+#
+# Environment:
+#   WATCH_TIMEOUT - Timeout in seconds (default: 2700 = 45 min)
+
+set -uo pipefail
+
+if [ $# -lt 2 ]; then
+  echo "Usage: watch-for-results.sh <session_dir> <expected_count> [task_ids...]"
+  exit 2
+fi
+
+SESSION_DIR="$1"
+EXPECTED_COUNT="$2"
+shift 2
+TASK_IDS="$*"
+
+TIMEOUT="${WATCH_TIMEOUT:-2700}"
+
+# Detect available filesystem watch tool
+WATCH_TOOL=""
+if command -v fswatch >/dev/null 2>&1; then
+  WATCH_TOOL="fswatch"
+elif command -v inotifywait >/dev/null 2>&1; then
+  WATCH_TOOL="inotifywait"
+else
+  echo "ERROR: Neither fswatch nor inotifywait available"
+  exit 2
+fi
+
+FOUND_COUNT=0
+
+# Count pre-existing result files before starting watch
+for f in "$SESSION_DIR"/result-task-*.md; do
+  [ -f "$f" ] || continue
+  BASENAME="$(basename "$f")"
+  if [[ "$BASENAME" =~ ^result-task-.*\.md$ ]]; then
+    FOUND_COUNT=$((FOUND_COUNT + 1))
+    echo "RESULT_FOUND: $BASENAME ($FOUND_COUNT/$EXPECTED_COUNT)"
+  fi
+done
+
+if [ "$FOUND_COUNT" -ge "$EXPECTED_COUNT" ]; then
+  echo "ALL_DONE"
+  exit 0
+fi
+
+# Set up FIFO for watcher output
+FIFO=$(mktemp -u "${TMPDIR:-/tmp}/watch-results-XXXXXX")
+mkfifo "$FIFO"
+
+# Track background PIDs for cleanup
+WATCHER_PID=""
+TIMER_PID=""
+
+# Marker file signals timeout (avoids signal delivery issues during blocking read)
+TIMEOUT_MARKER=$(mktemp -u "${TMPDIR:-/tmp}/watch-timeout-XXXXXX")
+
+cleanup() {
+  [ -n "$WATCHER_PID" ] && kill "$WATCHER_PID" 2>/dev/null || true
+  [ -n "$TIMER_PID" ] && kill "$TIMER_PID" 2>/dev/null || true
+  rm -f "$FIFO" "$TIMEOUT_MARKER"
+  wait 2>/dev/null || true
+}
+trap cleanup EXIT
+
+# Launch filesystem watcher writing to FIFO
+if [ "$WATCH_TOOL" = "fswatch" ]; then
+  fswatch --event Created "$SESSION_DIR" > "$FIFO" 2>/dev/null &
+  WATCHER_PID=$!
+elif [ "$WATCH_TOOL" = "inotifywait" ]; then + inotifywait -m -e create --format '%f' "$SESSION_DIR" > "$FIFO" 2>/dev/null & + WATCHER_PID=$! +fi + +# Start timeout timer — creates marker file then kills watcher to unblock the read loop +(sleep "$TIMEOUT" && touch "$TIMEOUT_MARKER" && kill "$WATCHER_PID" 2>/dev/null) & +TIMER_PID=$! + +# Read watcher output from FIFO in main shell (preserves variable state) +while IFS= read -r LINE; do + # Extract just the filename (fswatch gives full path, inotifywait gives filename) + BASENAME="$(basename "$LINE")" + + # Only process result-task-*.md files (ignore temp files, non-result files) + if [[ "$BASENAME" =~ ^result-task-.*\.md$ ]]; then + FOUND_COUNT=$((FOUND_COUNT + 1)) + echo "RESULT_FOUND: $BASENAME ($FOUND_COUNT/$EXPECTED_COUNT)" + + if [ "$FOUND_COUNT" -ge "$EXPECTED_COUNT" ]; then + echo "ALL_DONE" + exit 0 + fi + fi +done < "$FIFO" + +# FIFO closed — check if it was a timeout or unexpected watcher exit +if [ -f "$TIMEOUT_MARKER" ]; then + echo "TIMEOUT: Found $FOUND_COUNT/$EXPECTED_COUNT results" +else + echo "WATCHER_EXIT: Found $FOUND_COUNT/$EXPECTED_COUNT results" +fi +exit 1 diff --git a/claude/sdd-tools/tests/fixtures/invalid-result-no-status.md b/claude/sdd-tools/tests/fixtures/invalid-result-no-status.md new file mode 100644 index 0000000..cd4743a --- /dev/null +++ b/claude/sdd-tools/tests/fixtures/invalid-result-no-status.md @@ -0,0 +1,10 @@ +# Task Result: [5] Some task + +## Summary +Did stuff. + +## Files Modified +- none + +## Context Contribution +None. diff --git a/claude/sdd-tools/tests/fixtures/invalid-result-no-summary.md b/claude/sdd-tools/tests/fixtures/invalid-result-no-summary.md new file mode 100644 index 0000000..5b0d54b --- /dev/null +++ b/claude/sdd-tools/tests/fixtures/invalid-result-no-summary.md @@ -0,0 +1,8 @@ +status: PASS +task_id: 11 + +## Files Modified +- none + +## Context Contribution +None. 
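The fixtures above and below pin down the result-file contract: a `status: PASS|FAIL` line plus a `## Summary` section. As a minimal sketch of the checks `validate-result.sh` is expected to enforce — `check_result`, its messages, and the exact checks here are hypothetical, not the real script:

```shell
#!/usr/bin/env bash
# Hypothetical sketch; validate-result.sh may differ in names and output.
check_result() {
  local file="$1" status
  # First `status:` line wins; the fixtures put it on line 1
  status="$(sed -n 's/^status: //p' "$file" | head -n1)"
  if [ -z "$status" ]; then
    echo "INVALID: missing status"
    return 1
  fi
  case "$status" in
    PASS|FAIL) ;;
    *) echo "INVALID: unknown status '$status'"; return 1 ;;
  esac
  grep -q '^## Summary' "$file" || { echo "INVALID: missing ## Summary"; return 1; }
  echo "VALID: $status"
}

# Exercise the same shapes the fixtures cover
tmp="$(mktemp -d)"
printf 'status: PASS\ntask_id: 42\n\n## Summary\nDone.\n' > "$tmp/pass.md"
printf 'status: UNKNOWN\ntask_id: 20\n\n## Summary\nDone.\n' > "$tmp/unknown.md"
printf 'status: PASS\ntask_id: 11\n\n## Files Modified\n- none\n' > "$tmp/nosummary.md"
check_result "$tmp/pass.md"              # VALID: PASS
check_result "$tmp/unknown.md" || true   # INVALID: unknown status 'UNKNOWN'
check_result "$tmp/nosummary.md" || true # INVALID: missing ## Summary
rm -rf "$tmp"
```

A classifier like this is what lets the orchestrator treat a malformed result file as a failure without blocking on format drift.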
diff --git a/claude/sdd-tools/tests/fixtures/invalid-result-unknown-status.md b/claude/sdd-tools/tests/fixtures/invalid-result-unknown-status.md new file mode 100644 index 0000000..c3e4115 --- /dev/null +++ b/claude/sdd-tools/tests/fixtures/invalid-result-unknown-status.md @@ -0,0 +1,11 @@ +status: UNKNOWN +task_id: 20 + +## Summary +Done. + +## Files Modified +- none + +## Context Contribution +None. diff --git a/claude/sdd-tools/tests/fixtures/valid-result-fail.md b/claude/sdd-tools/tests/fixtures/valid-result-fail.md new file mode 100644 index 0000000..573fbe1 --- /dev/null +++ b/claude/sdd-tools/tests/fixtures/valid-result-fail.md @@ -0,0 +1,15 @@ +status: FAIL +task_id: 10 +duration: 0m 45s + +## Summary +Failed to implement due to missing dependency. + +## Files Modified +- none + +## Context Contribution +None. + +## Verification +Functional: 1/3 diff --git a/claude/sdd-tools/tests/fixtures/valid-result-pass.md b/claude/sdd-tools/tests/fixtures/valid-result-pass.md new file mode 100644 index 0000000..db475d4 --- /dev/null +++ b/claude/sdd-tools/tests/fixtures/valid-result-pass.md @@ -0,0 +1,15 @@ +status: PASS +task_id: 42 +duration: 1m 30s + +## Summary +Implemented the feature successfully. + +## Files Modified +- src/foo.ts -- added new function + +## Context Contribution +Discovered that the project uses ESM imports. + +## Verification +Functional: 3/3, Edge Cases: 2/2, Tests: 5/5 (0 failures) diff --git a/docs/index.md b/docs/index.md index d72bf8e..9bb2427 100644 --- a/docs/index.md +++ b/docs/index.md @@ -73,7 +73,7 @@ Agent Alchemy is in active development. 
Current plugin versions: |--------|---------|--------| | Core Tools | 0.2.1 | Stable | | Dev Tools | 0.3.1 | Stable | -| SDD Tools | 0.2.1 | Stable | +| SDD Tools | 0.3.1 | Stable | | TDD Tools | 0.2.0 | Stable | | Git Tools | 0.1.0 | Stable | | Plugin Tools | 0.1.1 | Stable | diff --git a/docs/plugins/index.md b/docs/plugins/index.md index 48581a2..3a65059 100644 --- a/docs/plugins/index.md +++ b/docs/plugins/index.md @@ -8,7 +8,7 @@ Agent Alchemy extends Claude Code through six plugin groups, each targeting a di |--------|-------|--------|--------|---------| | [Core Tools](core-tools.md) | Codebase analysis and exploration | 5 | 3 | 0.2.1 | | [Dev Tools](dev-tools.md) | Feature development, review, docs | 9 | 4 | 0.3.1 | -| [SDD Tools](sdd-tools.md) | Spec-Driven Development pipeline | 4 | 4 | 0.2.1 | +| [SDD Tools](sdd-tools.md) | Spec-Driven Development pipeline | 4 | 4 | 0.3.1 | | [TDD Tools](tdd-tools.md) | Test-Driven Development workflows | 5 | 3 | 0.2.0 | | [Git Tools](git-tools.md) | Git commit automation | 1 | 0 | 0.1.0 | | [Plugin Tools](plugin-tools.md) | Plugin porting and ecosystem health | 5 | 2 | 0.1.1 | diff --git a/docs/plugins/sdd-tools.md b/docs/plugins/sdd-tools.md index 018eaf9..47887ba 100644 --- a/docs/plugins/sdd-tools.md +++ b/docs/plugins/sdd-tools.md @@ -2,7 +2,7 @@ Spec-Driven Development (SDD) Tools is the core workflow engine of Agent Alchemy. It provides a structured pipeline that transforms ideas into specifications, decomposes specifications into executable tasks, and runs autonomous implementation with wave-based parallelism. -**Plugin:** `agent-alchemy-sdd-tools` | **Version:** 0.2.1 | **Skills:** 4 | **Agents:** 4 +**Plugin:** `agent-alchemy-sdd-tools` | **Version:** 0.3.1 | **Skills:** 4 | **Agents:** 4 !!! 
abstract "Deep Dive Available" For a comprehensive walkthrough of the SDD pipeline — including end-to-end workflow examples, data flow diagrams, execution context sharing, and architectural deep-dives into each skill — see the [SDD Tools Deep Dive](sdd-tools-deep-dive.md). diff --git a/internal/docs/sdd-orchestration-deep-dive-2026-02-22.md b/internal/docs/sdd-orchestration-deep-dive-2026-02-22.md new file mode 100644 index 0000000..6f456d2 --- /dev/null +++ b/internal/docs/sdd-orchestration-deep-dive-2026-02-22.md @@ -0,0 +1,947 @@ +# SDD Orchestration Engine Deep-Dive + +> Technical reference for understanding and modifying the `execute-tasks` orchestration engine. +> Generated: 2026-02-22 | Source: `claude/sdd-tools/skills/execute-tasks/` + +## Table of Contents + +- [1. Overview](#1-overview) +- [2. 10-Step Orchestration Loop](#2-10-step-orchestration-loop) +- [3. Wave-Based Parallelism](#3-wave-based-parallelism) +- [4. Pre-Wave File Conflict Detection](#4-pre-wave-file-conflict-detection) +- [5. Upstream Injection (produces_for)](#5-upstream-injection-produces_for) +- [6. Completion Detection](#6-completion-detection) +- [7. Batch Result Processing](#7-batch-result-processing) +- [8. 3-Tier Retry Escalation](#8-3-tier-retry-escalation) +- [9. Context Merge Protocol](#9-context-merge-protocol) +- [10. Session Management](#10-session-management) +- [11. Hook Integration](#11-hook-integration) +- [12. Key Diagrams](#12-key-diagrams) + +--- + +## 1. Overview + +The SDD orchestration engine is the runtime heart of the Spec-Driven Development pipeline. It sits at the end of the artifact chain: + +``` +/create-spec → spec markdown → /create-tasks → task JSON → /execute-tasks → code + session logs +``` + +The engine's job: take a set of tasks with dependency relationships, execute them autonomously via parallel agent teams, and produce working code with full session traceability. 
+ +### Key Design Goals + +| Goal | How It's Achieved | +|------|-------------------| +| **Autonomous execution** | After user confirms the plan, no prompts except Tier 3 retry escalation | +| **Wave-based parallelism** | Topological sort groups tasks by dependency level; up to N agents per wave | +| **Context isolation** | Per-task context/result files prevent write contention between concurrent agents | +| **Progressive learning** | Shared `execution_context.md` merges learnings between waves | +| **Resilient failure handling** | 3-tier retry escalation with user escalation as final safety net | +| **Minimal context consumption** | Result file protocol (~18 lines per task) instead of full agent output (~100+ lines) | + +### File Inventory + +| File | Lines | Role | +|------|-------|------| +| `skills/execute-tasks/SKILL.md` | 293 | Skill entry point — workflow overview, key behaviors, examples | +| `skills/execute-tasks/references/orchestration.md` | ~1,235 | 10-step orchestration loop with full procedures | +| `skills/execute-tasks/references/execution-workflow.md` | 381 | 4-phase task executor workflow (documentation-only) | +| `skills/execute-tasks/references/verification-patterns.md` | 256 | Task classification, criterion verification, pass/fail rules | +| `skills/execute-tasks/scripts/watch-for-results.sh` | 115 | Event-driven completion detection (fswatch/inotifywait) | +| `skills/execute-tasks/scripts/poll-for-results.sh` | 133 | Adaptive polling fallback for completion detection | +| `agents/task-executor.md` | 414 | Opus-tier agent with embedded 4-phase workflow | +| `hooks/auto-approve-session.sh` | 75 | PreToolUse hook for session file auto-approval | +| `hooks/validate-result.sh` | 100 | PostToolUse hook for result file format validation | +| `hooks/hooks.json` | 30 | Hook registration (PreToolUse + PostToolUse) | + +--- + +## 2. 10-Step Orchestration Loop + +The orchestrator executes a deterministic 10-step loop. 
Each step has clear inputs, outputs, and exit conditions. + +### Step-by-Step Summary + +| Step | Name | Inputs | Outputs | Can Exit? | +|------|------|--------|---------|-----------| +| **1** | Load Task List | `TaskList`, `--task-group`, `--phase` | Filtered task set | Yes (no tasks match) | +| **2** | Validate State | Task set | Validation result | Yes (empty, all done, circular deps) | +| **3** | Build Execution Plan | Task dependencies, `max_parallel` | Wave assignment, priority ordering | No | +| **4** | Check Settings | `.claude/agent-alchemy.local.md` | Execution preferences | No | +| **5** | Present Plan & Confirm | Execution plan | User confirmation | Yes (user cancels) | +| **5.5** | Initialize Execution Dir | `task_execution_id` | Session directory + files | No | +| **6** | Initialize Execution Context | Prior session context | Seeded `execution_context.md` | No | +| **7** | Execute Loop | Waves of tasks | Completed tasks + session artifacts | No (loops until done) | +| **8** | Session Summary | Task log, progress | Summary display + archive | No | +| **9** | Update CLAUDE.md | Execution context | CLAUDE.md edits (if warranted) | No | + +### Step 1: Load Task List + +Retrieves all tasks via `TaskList` and applies up to three filters in sequence: + +1. **`--task-group`** — matches `metadata.task_group` +2. **`--phase`** — matches `metadata.spec_phase` (comma-separated integers, e.g., `--phase 1,2`) +3. **`task-id`** — single task mode + +Tasks without `spec_phase` metadata are excluded when `--phase` is active. If no tasks match after filtering, the orchestrator reports available phases and stops. 
+
+### Step 2: Validate State
+
+Catches edge cases before any work happens:
+
+- **Empty task list** — suggests using `/create-tasks`
+- **All completed** — reports summary
+- **Specific task blocked** — reports blockers
+- **No unblocked tasks** — reports blocking chains, detects circular dependencies
+
+### Step 3: Build Execution Plan
+
+Five sub-steps:
+
+**3a: Resolve Max Parallel** — Precedence: CLI `--max-parallel` > `.claude/agent-alchemy.local.md` setting > default 5.
+
+**3b-3c: Topological Wave Assignment** — Tasks assigned to waves by dependency depth:
+- Wave 1: no dependencies
+- Wave 2: depends only on Wave 1
+- Wave N: depends only on Wave 1..N-1
+
+**3d: Within-Wave Sort** — Priority ordering with tie-breaking:
+1. `critical` > `high` > `medium` > `low` > unprioritized
+2. Ties broken by "unblocks most others" (task appearing in most `blockedBy` lists)
+3. Waves exceeding `max_parallel` are split into sub-waves
+
+**3e: Circular Dependency Detection** — Any tasks unassigned after topological sort form a cycle. The orchestrator breaks at the "weakest link" (fewest blockers).
+
+### Step 4: Check Settings
+
+Reads optional `.claude/agent-alchemy.local.md` for user preferences. Non-blocking — proceeds without settings if the file is missing.
+
+### Step 5: Present Plan & Confirm
+
+Displays a formatted execution plan banner with task counts, wave breakdown, blocked tasks, and completed count. Uses `AskUserQuestion` for confirmation — "Cancel" stops without modifying any tasks.
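The topological wave assignment from Step 3 (3b-3c), including the cycle detection from 3e, can be sketched in bash. Task IDs and the `blocked_by` table are hypothetical; this is a minimal illustration of the level-assignment rule, not the orchestrator's actual implementation:

```shell
# A task joins wave W only when every blockedBy entry sits in a wave < W.
declare -A blocked_by=( [A]="" [B]="" [C]="A" [D]="A,B" [E]="D" )
declare -A wave=()
assigned=0
w=0
while [ "$assigned" -lt "${#blocked_by[@]}" ]; do
  w=$((w + 1))
  progressed=0
  for t in "${!blocked_by[@]}"; do
    [ -n "${wave[$t]:-}" ] && continue          # already assigned
    ready=1
    IFS=',' read -ra deps <<< "${blocked_by[$t]}"
    for d in "${deps[@]}"; do
      [ -z "$d" ] && continue
      # dependency must already sit in an EARLIER wave
      if [ -z "${wave[$d]:-}" ] || [ "${wave[$d]}" -ge "$w" ]; then
        ready=0; break
      fi
    done
    if [ "$ready" -eq 1 ]; then
      wave[$t]=$w; assigned=$((assigned + 1)); progressed=1
    fi
  done
  # no task could be placed in this pass: remaining tasks form a cycle (3e)
  [ "$progressed" -eq 0 ] && { echo "CYCLE detected"; break; }
done
for t in A B C D E; do echo "Task $t -> Wave ${wave[$t]:-none}"; done
```

With this input, A and B land in wave 1, C and D in wave 2, and E in wave 3, mirroring the dependency-depth rule above.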
+ +### Step 5.5: Initialize Execution Directory + +The most complex initialization step, handling several concerns: + +**Session ID Generation** — Multi-tier resolution: + +| Priority | Condition | Format | +|----------|-----------|--------| +| 1 | `--task-group` + `--phase` | `{group}-phase{N}-{YYYYMMDD}-{HHMMSS}` | +| 2 | `--task-group` only | `{group}-{YYYYMMDD}-{HHMMSS}` | +| 3 | `--phase` only + shared group | `{group}-phase{N}-{YYYYMMDD}-{HHMMSS}` | +| 4 | `--phase` only | `phase{N}-{YYYYMMDD}-{HHMMSS}` | +| 5 | All tasks share group | `{group}-{YYYYMMDD}-{HHMMSS}` | +| 6 | Default | `exec-session-{YYYYMMDD}-{HHMMSS}` | + +**Stale Session Cleanup** — Archives leftover `__live_session__/` files to `interrupted-{timestamp}/` and resets `in_progress` tasks to `pending`. + +**Concurrency Guard** — `.lock` file prevents concurrent execution. Lock age > 4 hours = stale (auto-removed). Lock age < 4 hours = prompts user to force-start or cancel. + +**Session Files Created:** + +| File | Purpose | +|------|---------| +| `execution_plan.md` | Saved plan from Step 5 | +| `execution_context.md` | 6-section structured template | +| `task_log.md` | Table with Task ID, Subject, Status, Attempts, Duration, Token Usage | +| `progress.md` | Real-time status (Active Tasks, Completed This Session) | +| `tasks/` | Subdirectory for archived completed task JSONs | +| `execution_pointer.md` | Created at `~/.claude/tasks/{id}/` with absolute path to session | + +### Step 6: Initialize Execution Context + +Seeds `execution_context.md` with the 6-section template. If a prior session exists (most recent timestamped subfolder in `.claude/sessions/`), merges relevant learnings from sections 1-5. Applies cross-session compaction: sections with 10+ entries get summarized; prior Task History entries are condensed to a single summary paragraph. + +### Step 7: Execute Loop + +The core engine — detailed in sections 3-9 below. 
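The 4-hour staleness rule from the concurrency guard reduces to an mtime comparison. A sketch, assuming an mtime-based age check (the `is_stale_lock` helper is illustrative; the lock path and threshold come from this document):

```shell
# Returns success (0) when the lock file is older than 4 hours.
# GNU stat uses -c %Y; BSD/macOS stat uses -f %m, hence the fallback.
is_stale_lock() {
  local lock="$1"
  local max_age=$((4 * 3600))
  local mtime now
  mtime=$(stat -c %Y "$lock" 2>/dev/null || stat -f %m "$lock") || return 1
  now=$(date +%s)
  [ $((now - mtime)) -gt "$max_age" ]
}

lock=".claude/sessions/__live_session__/.lock"
if [ -f "$lock" ]; then
  if is_stale_lock "$lock"; then
    echo "Stale lock (>4h): removing and proceeding"
    rm -f "$lock"
  else
    echo "Active lock (<4h): prompt user to force-start or cancel"
  fi
fi
```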
+ +### Step 8: Session Summary + +Displays formatted execution summary with pass/fail counts, total execution time, token usage, failed task list, and newly unblocked tasks. Archives the session from `__live_session__/` to `.claude/sessions/{task_execution_id}/`. + +### Step 9: Update CLAUDE.md + +Reviews execution context for project-wide changes (new patterns, dependencies, commands, structure changes). Makes targeted edits only if meaningful changes occurred. + +--- + +## 3. Wave-Based Parallelism + +### Conceptual Model + +The orchestrator treats task execution like a build system: tasks form a directed acyclic graph (DAG), and waves represent topological sort levels. All tasks at the same level can run in parallel because they have no mutual dependencies. + +``` +Wave 1: [Task A] [Task B] [Task C] ← no dependencies, run in parallel + ↓ ↓ +Wave 2: [Task D] [Task E] ← depend on Wave 1 tasks + ↓ +Wave 3: [Task F] ← depends on Wave 2 +``` + +### Wave Assignment Algorithm + +1. Build dependency graph from `blockedBy` relationships +2. Assign tasks to waves using topological levels: + - Wave 1 = tasks with no `blockedBy` entries + - Wave N = tasks whose ALL `blockedBy` entries are in waves 1..N-1 +3. Sort within each wave by priority (critical > high > medium > low > unprioritized) +4. Break priority ties by "unblocks most others" — tasks appearing in the most `blockedBy` lists of other tasks run first +5. If wave size exceeds `max_parallel`, split into sub-waves preserving priority order + +### Dynamic Unblocking + +After each wave completes, the orchestrator refreshes the full task state via `TaskList`. Newly unblocked tasks (all `blockedBy` dependencies now completed) form the next wave. 
This dynamic approach handles cases where: + +- Tasks were deferred by file conflict detection +- Retry success unblocks downstream tasks +- User manual fixes via Tier 3 escalation unblock dependents + +### Max Parallel Capping + +The `max_parallel` setting (default 5) caps concurrent agents per wave. When a wave has more tasks than the cap, it's split into sequential sub-waves. Setting `max_parallel=1` forces fully sequential execution. + +--- + +## 4. Pre-Wave File Conflict Detection + +### Problem + +When two tasks in the same wave modify the same file, concurrent agents may overwrite each other's changes. This is especially common with configuration files, shared modules, or test fixtures. + +### Detection Procedure (Step 7a.5) + +**1. Extract file references** from task descriptions and acceptance criteria using three pattern types: + +| Pattern Type | Example | Detection Rule | +|-------------|---------|----------------| +| Slash paths | `src/api/handler.ts` | Token containing `/` | +| Known extensions | `SKILL.md`, `config.json` | Token ending in `.md`, `.ts`, `.js`, `.json`, `.sh`, `.py` | +| Glob patterns | `src/api/*.ts` | Token with `*` or `?` plus `/` or known extension | + +Surrounding markdown formatting (backticks, bold, list prefixes) is stripped to get clean paths. + +**2. Normalize paths** — Remove leading `./`, collapse `//`, trim trailing whitespace. + +**3. Detect conflicts** — Build a map of `{file_path → [task_ids]}`. Conflict = any path mapping to 2+ task IDs. For globs, conservative overlap detection: shared directory prefix + overlapping extensions = conflict. + +**4. Resolve conflicts:** +- Lowest-ID task stays in current wave +- Higher-ID tasks are deferred to next wave via artificial dependency +- If task conflicts on multiple files, deferred if it loses on any + +**5. All-conflict case** — If all tasks conflict, sequentialize: keep only lowest-ID, defer rest. + +**6. Logging** — Conflicts logged to `execution_plan.md`. 
Clean waves (no conflicts) skip logging entirely. + +### Error Handling + +If file path parsing fails, a warning is logged and the wave proceeds without deferral. Detection failures never block execution. + +--- + +## 5. Upstream Injection (produces_for) + +### Problem + +Wave-granular context merging (via `execution_context.md`) provides general learnings but lacks specific producer-consumer context. When Task B directly consumes Task A's output (e.g., an API endpoint consuming a data model), the agent needs the producer's specific result data, not just summarized learnings. + +### How It Works + +**Task JSON Extension:** + +```json +{ + "id": "5", + "subject": "Implement API handler", + "produces_for": ["8", "12"], + "blockedBy": ["3"] +} +``` + +The `produces_for` field is set during `/create-tasks` Phase 7 (Detect Producer-Consumer Relationships) and declares which downstream tasks consume this task's output. + +### Injection Procedure (Step 7c) + +Before launching wave agents, the orchestrator: + +1. **Scans completed tasks** for `produces_for` entries referencing current wave tasks +2. **Reads producer result files** (`result-task-{producer_id}.md`) +3. **Builds injection blocks:** + + For successful producers: + ```markdown + ## UPSTREAM TASK OUTPUT (Task #5: Implement API handler) + {result file content} + --- + ``` + + For failed producers: + ```markdown + ## UPSTREAM TASK #5 FAILED + Task: Implement API handler + Status: FAIL + {failure summary from task_log.md} + --- + ``` + +4. **Injects into agent prompt** after task description, before `CONCURRENT EXECUTION MODE` section +5. **Logs each injection**: `Injecting upstream output from task #5 into task #8` + +### Retention Rules + +Producer result files with `produces_for` entries pointing to not-yet-completed tasks are **retained** during post-wave cleanup (same rule as FAIL result files). Deleted only after all listed consumers complete. 
+ +### No-op Optimization + +If no tasks in the set have `produces_for` fields, the injection procedure is skipped entirely — zero overhead. + +--- + +## 6. Completion Detection + +### Two-Tier Strategy + +The orchestrator uses event-driven filesystem watching as primary, with adaptive polling as fallback: + +``` +┌──────────────────┐ exit 2 ┌──────────────────┐ +│ watch-for-results │ ──────────────> │ poll-for-results │ +│ (fswatch/inotify)│ │ (adaptive 5s-30s) │ +└──────────────────┘ └──────────────────┘ + exit 0: ALL_DONE exit 0: ALL_DONE + exit 1: TIMEOUT/WATCHER_EXIT exit 1: TIMEOUT + exit 2: tools unavailable +``` + +### Primary: watch-for-results.sh + +**File:** `skills/execute-tasks/scripts/watch-for-results.sh` (115 lines) + +Uses `fswatch` (macOS) or `inotifywait` (Linux) to watch the session directory for file creation events. Key behaviors: + +| Behavior | Implementation | +|----------|----------------| +| Tool detection | Checks `fswatch` first, then `inotifywait`; exits 2 if neither found | +| Pre-existing files | Scans for result files before starting watch (handles fast agents) | +| Signaling | FIFO pipe from watcher process to main loop | +| File filtering | Only processes files matching `result-task-*.md` pattern | +| Timeout | Configurable via `WATCH_TIMEOUT` env var (default 45 min). Uses marker file + kill to unblock the read loop | +| Cleanup | Trap handler kills watcher + timer PIDs, removes FIFO and marker | + +**Output protocol:** + +| Line | Meaning | +|------|---------| +| `RESULT_FOUND: result-task-{id}.md (N/M)` | Incremental detection | +| `ALL_DONE` | All expected results found | +| `TIMEOUT: Found N/M results` | Watch timed out | +| `WATCHER_EXIT: Found N/M results` | Watcher process exited unexpectedly | + +### Fallback: poll-for-results.sh + +**File:** `skills/execute-tasks/scripts/poll-for-results.sh` (133 lines) + +Adaptive interval polling when filesystem watch tools are unavailable. 
Key behaviors: + +| Behavior | Implementation | +|----------|----------------| +| Starting interval | 5 seconds (configurable via `POLL_START_INTERVAL`) | +| Max interval | 30 seconds (configurable via `POLL_MAX_INTERVAL`) | +| Adaptation | New result found → reset to start interval. No new results → increase by 5s | +| Dedup | Associative array tracks announced results to prevent duplicates | +| Timeout | 45 minutes cumulative (configurable via `POLL_TIMEOUT`) | +| Task ID tracking | When specific IDs provided, checks only those files (more efficient) | + +### Orchestrator Integration + +The orchestrator always specifies `timeout: 480000` (8 minutes) on Bash invocations. Both scripts handle their own internal timeout (45 min default). The 8-minute Bash timeout prevents the orchestrator from blocking indefinitely on a single detection round. + +**Multi-round fallback:** If a poll invocation hits the Bash timeout, the orchestrator re-invokes with only remaining (undetected) task IDs. Cumulative 45-minute ceiling per wave. + +--- + +## 7. Batch Result Processing + +### Processing Pipeline (Step 7d) + +After completion detection signals all results found (or timeout), the orchestrator processes results in a single batch per wave: + +**1. Reap background agents** — For each task, calls `TaskOutput(task_id=, block=true, timeout=60000)`: +- Terminates the background agent process (prevents lingering subagents) +- Extracts `duration_ms` and `total_tokens` from metadata +- If `TaskOutput` times out: calls `TaskStop` to force-kill, sets duration/tokens to "N/A" + +**2. Read result files** — Parses each `result-task-{id}.md`: +- `status` line → PASS, PARTIAL, or FAIL +- `attempt` line → attempt number +- `## Verification` → criterion pass counts +- `## Files Modified` → changed file list +- `## Issues` → failure details + +**3. 
Handle missing result files** (agent crash recovery): +- Checks if `context-task-{id}.md` exists (agent may have crashed between writes) +- Uses `TaskOutput` content as diagnostic +- Treats as FAIL + +**4. Batch update session files** — Single read-modify-write cycle per file: +- `task_log.md`: Append all wave rows at once +- `progress.md`: Move completed tasks from Active to Completed + +### Duration/Token Formatting + +| Duration | Format | +|----------|--------| +| < 60 seconds | `{s}s` | +| < 60 minutes | `{m}m {s}s` | +| ≥ 60 minutes | `{h}h {m}m {s}s` | + +| Token Count | Format | +|-------------|--------| +| < 1,000 | Exact (e.g., `823`) | +| 1K–999K | `{N}K` (e.g., `48K`) | +| ≥ 1M | `{N.N}M` (e.g., `1.2M`) | + +### Wave Completion Summary (Step 7d-post) + +After processing, the orchestrator emits a human-readable wave summary: + +``` +Wave 3/6 complete: 2/4 tasks passed (4m 12s) + [8] Implement API handler — PASS (2m 10s, 52K tokens) + [9] Create database schema — PASS (3m 01s, 67K tokens) + [10] Update routing config — FAIL (4m 12s, 71K tokens) + [11] Add validation middleware — PARTIAL (3m 45s, 59K tokens) +``` + +This is the **primary progress mechanism** — wave-level granularity only, no per-task streaming during a wave. + +--- + +## 8. 3-Tier Retry Escalation + +### Overview + +Failed tasks progress through three escalation tiers. Each task tracks its own `escalation_level` independently. 
+ +| Tier | Level | Strategy | User Interaction | +|------|-------|----------|------------------| +| 1 | Standard | Failure context from previous result | None (autonomous) | +| 2 | Context Enrichment | Full execution context + related task results | None (autonomous) | +| 3 | User Escalation | Pause execution, present failure to user | `AskUserQuestion` with 4 options | + +### Tier 1: Standard Retry + +- Reads failure details from `result-task-{id}.md` (Issues + Verification sections) +- Deletes old result file before re-launching +- Launches new background agent with failure context in the prompt +- Updates `progress.md`: `Retrying (1/{max}) [Standard]` + +### Tier 2: Context Enrichment + +Everything from Tier 1, plus: +- Reads full `execution_context.md` (latest merged version, not just snapshot) +- Collects up to 5 related task result files (tasks sharing dependencies or from same wave) +- Injects enrichment block: + +``` +CONTEXT ENRICHMENT (Retry #2): +The following additional context is provided because the standard retry failed. + +Full execution context: +--- +{full execution_context.md content} +--- + +Related task results: +--- +{related result-task-{id}.md files} +--- +``` + +### Tier 3: User Escalation + +Pauses autonomous execution and presents failure details via `AskUserQuestion`: + +| Option | Behavior | +|--------|----------| +| **Fix manually and continue** | User fixes externally, confirms done → marked PASS (manual) | +| **Skip this task** | Logged as FAIL (skipped), execution continues | +| **Provide guidance** | User enters text → guided retry with `USER GUIDANCE (Retry #3)` block | +| **Abort session** | Remaining tasks logged as FAIL (aborted), jumps to session summary | + +**Guided retry loop:** If a guided retry also fails, the user is re-prompted with updated failure details. The loop continues until the user selects an option other than "Provide guidance" or a guided retry succeeds. 
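The per-task independence of escalation levels can be sketched with a small lookup. The `next_retry_tier` helper and its associative-array storage are illustrative assumptions, not the orchestrator's real bookkeeping:

```shell
# Each task carries its own escalation counter; advancing one task
# never changes another task's tier.
declare -A escalation_level=()

next_retry_tier() {
  local task_id="$1"
  escalation_level[$task_id]=$(( ${escalation_level[$task_id]:-0} + 1 ))
  case "${escalation_level[$task_id]}" in
    1) TIER="Standard" ;;
    2) TIER="Context Enrichment" ;;
    *) TIER="User Escalation" ;;  # Tier 3 repeats until the user resolves it
  esac
}

next_retry_tier 10; echo "Task 10 retry -> $TIER"   # Standard
next_retry_tier 10; echo "Task 10 retry -> $TIER"   # Context Enrichment
next_retry_tier 11; echo "Task 11 retry -> $TIER"   # Standard (independent path)
next_retry_tier 10; echo "Task 10 retry -> $TIER"   # User Escalation
```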
+ +### Batching Rules + +- Tier 1 and Tier 2 retries for all failed tasks in a wave are **batched** (launched in parallel, detected together) +- Tier 3 is **sequential per task** since it requires user interaction +- Each task has an **independent escalation path** — one task at Tier 2 doesn't affect another at Tier 1 +- Escalation level tracked in `task_log.md` as `(T{level})` suffix in Attempts column + +### Retry Execution and Detection (Step 7e.5) + +Retry agents use the same watch → poll fallback pattern as primary wave execution. After detection, retry agents are reaped via `TaskOutput` for usage extraction. Timeout budget resets per wave/retry batch. + +--- + +## 9. Context Merge Protocol + +### Purpose + +Enable cross-task learning within an execution session. Earlier tasks' discoveries (project conventions, file patterns, key decisions) inform later tasks. + +### 6-Section Structured Schema + +Both `execution_context.md` and per-task `context-task-{id}.md` files share the same schema: + +| # | Section | Purpose | Example Entry | +|---|---------|---------|---------------| +| 1 | `## Project Setup` | Package manager, runtime, frameworks, build tools | `- Runtime: Node.js 22 with pnpm` | +| 2 | `## File Patterns` | Test patterns, component patterns, API routes | `- Tests: __tests__/{name}.test.ts alongside source` | +| 3 | `## Conventions` | Import style, error handling, naming | `- Imports: Named exports, barrel files for public API` | +| 4 | `## Key Decisions` | Architecture choices with task attribution | `- [Task #5] Used Zod for runtime validation` | +| 5 | `## Known Issues` | Problems, workarounds, gotchas | `- Vitest mock.calls resets between tests` | +| 6 | `## Task History` | Compact outcome log per task | `- [12] Create API handler — PASS: added /api/users` | + +### Write Isolation + +Agents write to `context-task-{id}.md` (per-task file), never to `execution_context.md` directly. This eliminates write contention between concurrent agents. 
The orchestrator merges per-task files after each wave. + +### Merge Procedure (Step 7f) + +1. **Read current** `execution_context.md` +2. **Parse into sections** — Split on `## ` markers into `{header → entries}` map +3. **Read per-task files** — All `context-task-{id}.md` files in task ID order, parsed into sections +4. **Merge by section** — For each per-task section, append entries under matching header in `execution_context.md`. Unrecognized section headers → placed under `## Key Decisions` with note +5. **Deduplicate** — Exact text match deduplication within each section (no fuzzy matching) +6. **Write merged file** — Complete `execution_context.md` with all 6 headers in order + +### Malformed Context Handling + +| Condition | Recovery | +|-----------|----------| +| No `## ` headers at all | Place entire content under `## Key Decisions` with warning | +| Some recognized, some not | Recognized merged normally; orphan content → `## Key Decisions` | +| Agent crashes before writing | Orchestrator writes stub from `TaskOutput` if `LEARNINGS:` section found | + +### Within-Session Compaction + +After merge, if any section reaches 10+ entries: + +| Section | Rule | +|---------|------| +| Sections 1-5 | Keep 5 most recent, summarize older entries into paragraph | +| Task History | Keep 10 most recent, summarize older into "Wave Summary" paragraph | + +### Cross-Session Compaction (Step 6) + +When seeding from a prior session's context: + +- Sections 1-5: Merge learnings, compact if 10+ entries +- Task History: Summarize ALL prior entries into a single "Prior Sessions Summary" paragraph + +### Post-Merge Validation + +After compaction, the orchestrator validates the merged file: + +**Validation checks:** + +1. **Header validation** — All 6 required section headers present +2. **Malformed content** — No content lines before first `## ` header (after title) +3. 
**Size check** — Normal (<500), WARN (500-1000), ERROR (>1000) + +**Validation outcomes:** + +| Status | Condition | Action | +|--------|-----------|--------| +| OK | All headers, no orphans, size normal | No action | +| WARN | Headers OK but size >500 or orphaned lines | Log to `task_log.md` | +| ERROR | Missing headers or size >1000 | Auto-repair missing headers, force compaction | +| REPAIRED | Headers were missing and re-inserted | Log repair to `task_log.md` | + +**Force compaction** (>1000 lines): Aggressive — keep 3 recent entries per section 1-5, keep 5 recent Task History entries. + +**Context Health** written to `progress.md` after each wave (latest wave replaces previous). + +--- + +## 10. Session Management + +### Directory Layout + +``` +.claude/sessions/__live_session__/ # Active execution session +├── execution_plan.md # Wave plan from orchestrator +├── execution_context.md # Shared learnings (6-section schema) +├── task_log.md # Per-task status, duration, tokens +├── progress.md # Real-time progress tracking +├── tasks/ # Archived completed task JSONs +├── context-task-{id}.md # Per-task context (structured, ephemeral) +├── result-task-{id}.md # Per-task result (validated by hook, ephemeral) +├── result-task-{id}.md.invalid # Renamed by validate-result hook if malformed +├── session_summary.md # Final summary (written in Step 8) +└── .lock # Concurrency guard +``` + +### Lock File Protocol + +The `.lock` file enforces the single-session invariant: + +```markdown +task_execution_id: user-auth-20260131-143022 +timestamp: 2026-01-31T14:30:22Z +pid: orchestrator +``` + +| Scenario | Behavior | +|----------|----------| +| No lock exists | Proceed normally | +| Lock < 4 hours old | Prompt user: "Force start" or "Cancel" | +| Lock > 4 hours old | Treat as stale, delete and proceed | +| Session completes | Lock moved to archive with all session files | + +### Interrupted Session Recovery + +When `__live_session__/` contains leftover files from a previous 
run: + +1. Archive contents to `.claude/sessions/interrupted-{YYYYMMDD}-{HHMMSS}/` +2. Check for `task_log.md` in archive +3. If found: reset only `in_progress` tasks that appear in the log +4. If not found: reset ALL `in_progress` tasks (conservative) +5. Log each reset and recovery count + +### Session Archival (Step 8) + +After execution completes: + +1. Save `session_summary.md` to `__live_session__/` +2. Create `.claude/sessions/{task_execution_id}/` +3. Move ALL contents from `__live_session__/` to archive (including `.lock`) +4. Leave `__live_session__/` as empty directory +5. `execution_pointer.md` stays pointing to `__live_session__/` (empty until next run) + +### Execution Pointer + +Created at `~/.claude/tasks/{CLAUDE_CODE_TASK_LIST_ID}/execution_pointer.md` with the absolute path to the live session directory. Enables the task-manager dashboard and other tools to locate the active session. + +--- + +## 11. Hook Integration + +### Hook Registration + +Defined in `hooks/hooks.json`: + +```json +{ + "hooks": { + "PreToolUse": [{ + "matcher": "Write|Edit|Bash", + "hooks": [{ + "type": "command", + "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/auto-approve-session.sh", + "timeout": 5 + }] + }], + "PostToolUse": [{ + "matcher": "Write", + "hooks": [{ + "type": "command", + "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/validate-result.sh", + "timeout": 5 + }] + }] + } +} +``` + +### auto-approve-session.sh (PreToolUse) + +**Purpose:** Enable fully autonomous execution by auto-approving file operations targeting session directories. Without this hook, every Write/Edit/Bash to session files would trigger a user permission prompt, breaking the autonomous loop. + +**Approval rules:** + +| Tool | Path Pattern | Approved? 
| +|------|-------------|-----------| +| Write/Edit | `$HOME/.claude/tasks/*/execution_pointer.md` | Yes | +| Write/Edit | `*/.claude/sessions/*` or `.claude/sessions/*` | Yes | +| Bash | Command containing `.claude/sessions/` | Yes | +| Any | Everything else | No opinion (normal flow) | + +**Safety guarantees:** +- Never exits non-zero (would break permission flow) +- `trap 'exit 0' ERR` catches all unexpected errors +- Optional debug logging via `AGENT_ALCHEMY_HOOK_DEBUG=1` +- Outputs JSON permission decision: `{"hookSpecificOutput":{"permissionDecision":"allow",...}}` + +### validate-result.sh (PostToolUse) + +**Purpose:** Validate result files written by task-executor agents. Catches malformed results before the orchestrator reads them, preventing corrupt data from propagating through the merge pipeline. + +**Validation checks:** + +| Check | Rule | On Failure | +|-------|------|------------| +| Status line | First line matches `status: (PASS\|PARTIAL\|FAIL)` | Rename to `.invalid` | +| Required sections | `## Summary`, `## Files Modified`, `## Context Contribution` | Rename to `.invalid` | +| File size | >25 lines triggers warning | Warning only (not rejected) | +| Write ordering | `context-task-{id}.md` should exist before result file | Creates stub context file | + +**Invalidation procedure:** +1. Copy original content + validation errors to `result-task-{id}.md.invalid` +2. Delete the original `result-task-{id}.md` +3. Orchestrator's completion detection never sees the invalid file → falls back to `TaskOutput` + +**Safety guarantees:** +- Same `trap 'exit 0' ERR` pattern as auto-approve hook +- Only triggers on Write operations to `*/.claude/sessions/*/result-task-*.md` +- 5-second timeout prevents hook from blocking execution + +--- + +## 12. 
Key Diagrams + +### Orchestration Flow + +```mermaid +flowchart TD + classDef step fill:#E3F2FD,stroke:#1565C0,color:#000 + classDef decision fill:#FFF3E0,stroke:#E65100,color:#000 + classDef action fill:#E8F5E9,stroke:#2E7D32,color:#000 + classDef error fill:#FFEBEE,stroke:#C62828,color:#000 + + S1[Step 1: Load & Filter Tasks]:::step + S2{Step 2: Valid State?}:::decision + S3[Step 3: Build Execution Plan]:::step + S4[Step 4: Check Settings]:::step + S5{Step 5: User Confirms?}:::decision + S55[Step 5.5: Init Session Dir]:::step + S6[Step 6: Init Context]:::step + S7[Step 7: Execute Loop]:::action + S8[Step 8: Session Summary]:::step + S9[Step 9: Update CLAUDE.md]:::step + EXIT1[Exit: No tasks / blocked]:::error + EXIT2[Exit: Cancelled]:::error + + S1 --> S2 + S2 -->|Valid| S3 + S2 -->|Invalid| EXIT1 + S3 --> S4 + S4 --> S5 + S5 -->|Confirm| S55 + S5 -->|Cancel| EXIT2 + S55 --> S6 + S6 --> S7 + S7 --> S8 + S8 --> S9 +``` + +### Wave Execution Detail + +```mermaid +flowchart TD + classDef wave fill:#E3F2FD,stroke:#1565C0,color:#000 + classDef detect fill:#F3E5F5,stroke:#6A1B9A,color:#000 + classDef process fill:#E8F5E9,stroke:#2E7D32,color:#000 + classDef retry fill:#FFF3E0,stroke:#E65100,color:#000 + classDef merge fill:#E0F7FA,stroke:#00695C,color:#000 + + INIT[7a: Identify Unblocked Tasks]:::wave + CONFLICT[7a.5: File Conflict Detection]:::wave + SNAPSHOT[7b: Snapshot Context]:::wave + LAUNCH[7c: Launch Wave Agents]:::wave + DETECT[7c.7: Completion Detection]:::detect + WATCH{fswatch/inotify?}:::detect + WATCHRUN[watch-for-results.sh]:::detect + POLLRUN[poll-for-results.sh]:::detect + BATCH[7d: Batch Result Processing]:::process + SUMMARY[7d-post: Wave Summary]:::process + RETRY{7e: Failed Tasks?}:::retry + RETRYLOOP[7e: Retry Escalation]:::retry + MERGE[7f: Context Merge + Cleanup]:::merge + NEXT{7g: More Unblocked Tasks?}:::wave + DONE[Exit Loop → Step 8]:::wave + + INIT --> CONFLICT + CONFLICT --> SNAPSHOT + SNAPSHOT --> LAUNCH + LAUNCH --> DETECT + DETECT --> 
WATCH + WATCH -->|Available| WATCHRUN + WATCH -->|Exit 2| POLLRUN + WATCHRUN --> BATCH + POLLRUN --> BATCH + BATCH --> SUMMARY + SUMMARY --> RETRY + RETRY -->|Yes| RETRYLOOP + RETRY -->|No| MERGE + RETRYLOOP --> MERGE + MERGE --> NEXT + NEXT -->|Yes| INIT + NEXT -->|No| DONE +``` + +### Retry Escalation State Machine + +```mermaid +stateDiagram-v2 + classDef tier1 fill:#E3F2FD,color:#000 + classDef tier2 fill:#FFF3E0,color:#000 + classDef tier3 fill:#FFEBEE,color:#000 + classDef success fill:#E8F5E9,color:#000 + classDef terminal fill:#F5F5F5,color:#000 + + [*] --> TaskFails + + TaskFails --> Tier1_Standard : escalation = 1 + Tier1_Standard --> PASS : Success + Tier1_Standard --> Tier2_Enrichment : Fails again + + Tier2_Enrichment --> PASS : Success + Tier2_Enrichment --> Tier3_UserEscalation : Fails again + + Tier3_UserEscalation --> FixManually : User selects + Tier3_UserEscalation --> SkipTask : User selects + Tier3_UserEscalation --> ProvideGuidance : User selects + Tier3_UserEscalation --> AbortSession : User selects + + FixManually --> PASS_Manual : User confirms fix + FixManually --> AbortSession : User cancels + + ProvideGuidance --> GuidedRetry + GuidedRetry --> PASS : Success + GuidedRetry --> Tier3_UserEscalation : Fails (re-prompt) + + PASS --> [*] + PASS_Manual --> [*] + SkipTask --> [*] + AbortSession --> [*] + + class Tier1_Standard tier1 + class Tier2_Enrichment tier2 + class Tier3_UserEscalation tier3 + class PASS,PASS_Manual success + class SkipTask,AbortSession terminal +``` + +### Result File Protocol Flow + +```mermaid +sequenceDiagram + participant O as Orchestrator + participant A as Task Agent + participant H as validate-result.sh + participant FS as File System + + O->>A: Launch (background, bypassPermissions) + Note over A: Phase 1-3: Understand, Implement, Verify + + A->>FS: Write context-task-{id}.md + A->>FS: Write result-task-{id}.md + FS->>H: PostToolUse trigger + + alt Valid result + H->>H: Check status line + sections + Note over H: 
✓ Passes validation + else Invalid result + H->>FS: Rename to .invalid + Note over H: ✗ Fails validation + end + + alt Missing context file + H->>FS: Create stub context-task-{id}.md + end + + Note over FS: fswatch/inotify detects file creation + FS-->>O: RESULT_FOUND: result-task-{id}.md + + O->>FS: Read result-task-{id}.md + O->>O: Parse status, verification, files modified + O->>A: TaskOutput (reap process, extract usage) +``` + +### Context Merge Flow + +```mermaid +flowchart LR + classDef task fill:#E3F2FD,stroke:#1565C0,color:#000 + classDef merge fill:#FFF3E0,stroke:#E65100,color:#000 + classDef output fill:#E8F5E9,stroke:#2E7D32,color:#000 + classDef validate fill:#F3E5F5,stroke:#6A1B9A,color:#000 + + CT1[context-task-1.md]:::task + CT2[context-task-2.md]:::task + CT3[context-task-3.md]:::task + EC[execution_context.md\nsnapshot]:::merge + MERGE[Section-Based\nMerge]:::merge + DEDUP[Deduplicate\nEntries]:::merge + COMPACT{Lines > 10\nper section?}:::merge + COMPACTION[Within-Session\nCompaction]:::merge + VALIDATE[Post-Merge\nValidation]:::validate + REPAIR{Missing\nHeaders?}:::validate + AUTOREPAIR[Auto-Repair\nHeaders]:::validate + FORCE{Size >\n1000 lines?}:::validate + FORCECOMPACT[Force\nCompaction]:::validate + RESULT[Updated\nexecution_context.md]:::output + + CT1 --> MERGE + CT2 --> MERGE + CT3 --> MERGE + EC --> MERGE + MERGE --> DEDUP + DEDUP --> COMPACT + COMPACT -->|Yes| COMPACTION + COMPACT -->|No| VALIDATE + COMPACTION --> VALIDATE + VALIDATE --> REPAIR + REPAIR -->|Yes| AUTOREPAIR + REPAIR -->|No| FORCE + AUTOREPAIR --> FORCE + FORCE -->|Yes| FORCECOMPACT + FORCE -->|No| RESULT + FORCECOMPACT --> RESULT +``` + +--- + +## Appendix: Quick Reference + +### Key Environment Variables + +| Variable | Default | Script | Purpose | +|----------|---------|--------|---------| +| `WATCH_TIMEOUT` | 2700 (45 min) | watch-for-results.sh | Max wait time | +| `POLL_START_INTERVAL` | 5 | poll-for-results.sh | Starting poll interval (seconds) | +| 
`POLL_MAX_INTERVAL` | 30 | poll-for-results.sh | Maximum poll interval (seconds) | +| `POLL_TIMEOUT` | 2700 (45 min) | poll-for-results.sh | Cumulative timeout | +| `AGENT_ALCHEMY_HOOK_DEBUG` | 0 | Both hooks | Enable debug logging | +| `AGENT_ALCHEMY_HOOK_LOG` | `/tmp/agent-alchemy-hook.log` | Both hooks | Debug log path | + +### Critical Invariants + +1. **Result file is always last** — Agents MUST write `context-task-{id}.md` before `result-task-{id}.md`. The result file's existence is the completion signal. +2. **Write, never Edit** — All orchestrator writes to session artifacts use Write (full replacement) via read-modify-write pattern. Edit's string matching is unreliable on growing files. +3. **Single session** — `.lock` file prevents concurrent execution sessions per project. +4. **Context snapshot** — All agents in a wave read the same `execution_context.md` snapshot. No partial merges visible to sibling tasks. +5. **Hooks never fail** — Both hooks use `trap 'exit 0' ERR`. A non-zero exit would break the autonomous execution flow. + +### Common Modification Points + +| Want to change... | Modify... 
| +|--------------------|-----------| +| Max parallel default | `orchestration.md` Step 3a (default: 5) | +| Retry count default | `SKILL.md` arguments (default: 3) | +| Context section schema | `orchestration.md` "Structured Context Schema" + `task-executor.md` "Write Context File" | +| Result file format | `orchestration.md` "Result File Protocol" + `validate-result.sh` validation rules | +| Completion detection behavior | `watch-for-results.sh` / `poll-for-results.sh` | +| Auto-approval scope | `auto-approve-session.sh` case patterns | +| Wave sorting rules | `orchestration.md` Step 3d | +| File conflict patterns | `orchestration.md` Step 7a.5 "Extract file references" | +| Compaction thresholds | `orchestration.md` Step 7f "Within-Session Compaction" | +| Session ID format | `orchestration.md` Step 5.5 multi-tier resolution | diff --git a/internal/reports/execute-tasks-hardening-2026-02-22.md b/internal/reports/execute-tasks-hardening-2026-02-22.md new file mode 100644 index 0000000..b0a70a2 --- /dev/null +++ b/internal/reports/execute-tasks-hardening-2026-02-22.md @@ -0,0 +1,146 @@ +# Codebase Changes Report + +## Metadata + +| Field | Value | +|-------|-------| +| **Date** | 2026-02-22 | +| **Time** | 19:26 EST | +| **Branch** | execute-tasks-hardening | +| **Author** | Stephen Sequenzia | +| **Base Commit** | `5fcb4b0` (remove \_\_live\_session\_\_) | +| **Latest Commit** | `3862fca` (feat(execute-tasks): implement 10-feature hardening specification) | +| **Repository** | git@github.com:sequenzia/agent-alchemy.git | + +**Scope**: Execute-tasks orchestration hardening — 10-feature specification implementation + +**Summary**: Implemented a comprehensive hardening specification for the execute-tasks orchestration system, adding structured context management, event-driven completion detection, file conflict prevention, producer-consumer task injection, 3-tier retry escalation, progress streaming, merge validation, and a 44-test shell script test suite. 
All 16 tasks executed autonomously with a 100% pass rate. + +## Overview + +This change implements the full execute-tasks hardening specification across the sdd-tools plugin group. The work was executed autonomously by the `/execute-tasks` skill, completing 16 tasks in 6 waves over 75 minutes with zero retries needed. + +- **Files affected**: 39 +- **Lines added**: +2,879 +- **Lines removed**: -291 +- **Commits**: 1 (single consolidated commit) + +## Files Changed + +| File | Status | Lines | Description | +|------|--------|-------|-------------| +| `CLAUDE.md` | Modified | +26 / -14 | Updated SDD Pipeline Patterns, session layout, critical files table with hardening features | +| `claude/sdd-tools/agents/task-executor.md` | Modified | +175 / -85 | Embedded full 4-phase execution workflow in agent system prompt | +| `claude/sdd-tools/hooks/hooks.json` | Modified | +12 / -2 | Added validate-result.sh PostToolUse hook entry | +| `claude/sdd-tools/hooks/tests/validate-result.bats` | Added | +399 | Bats test suite for result validation hook (19 tests) | +| `claude/sdd-tools/hooks/validate-result.sh` | Added | +100 | PostToolUse hook for result-task-\*.md file validation | +| `claude/sdd-tools/skills/create-tasks/SKILL.md` | Modified | +85 / -17 | Added Phase 6 for produces\_for relationship detection (now 9 phases) | +| `claude/sdd-tools/skills/execute-tasks/SKILL.md` | Modified | +14 / -5 | Updated key behaviors with hardening feature references | +| `claude/sdd-tools/skills/execute-tasks/references/execution-workflow.md` | Modified | +127 / -96 | Transitioned to documentation-only; updated for structured context schema | +| `claude/sdd-tools/skills/execute-tasks/references/orchestration.md` | Modified | +660 / -77 | Major expansion: 10 hardening features across all orchestration steps | +| `claude/sdd-tools/skills/execute-tasks/scripts/poll-for-results.sh` | Modified | +119 / -26 | Rewritten with adaptive intervals and unified output format | +| 
`claude/sdd-tools/skills/execute-tasks/scripts/tests/poll-for-results.bats` | Added | +238 | Bats test suite for adaptive polling (14 tests) | +| `claude/sdd-tools/skills/execute-tasks/scripts/tests/watch-for-results.bats` | Added | +155 | Bats test suite for event-driven watcher (11 tests) | +| `claude/sdd-tools/skills/execute-tasks/scripts/watch-for-results.sh` | Added | +115 | Event-driven result file detection using fswatch/inotifywait | +| `claude/sdd-tools/tests/fixtures/valid-result-pass.md` | Added | +15 | Shared test fixture: valid PASS result file | +| `claude/sdd-tools/tests/fixtures/valid-result-fail.md` | Added | +15 | Shared test fixture: valid FAIL result file | +| `claude/sdd-tools/tests/fixtures/invalid-result-no-status.md` | Added | +10 | Shared test fixture: result missing status line | +| `claude/sdd-tools/tests/fixtures/invalid-result-no-summary.md` | Added | +8 | Shared test fixture: result missing Summary section | +| `claude/sdd-tools/tests/fixtures/invalid-result-unknown-status.md` | Added | +11 | Shared test fixture: result with invalid status value | +| `.claude/sessions/exec-session-20260222-180300/` | Added | +547 | Archived execution session (plan, context, log, progress, summary, 16 task JSONs) | + +## Change Details + +### Added + +- **`claude/sdd-tools/hooks/validate-result.sh`** — PostToolUse hook that validates result-task-\*.md files on Write operations. Checks for required sections (status line, Summary, Files Modified, Context Contribution), renames invalid files to `.invalid` with error details appended, and creates stub context files if missing. Uses `trap ERR` for guaranteed non-zero-exit prevention. + +- **`claude/sdd-tools/skills/execute-tasks/scripts/watch-for-results.sh`** — Event-driven completion detection using fswatch (macOS) or inotifywait (Linux). FIFO-based architecture for stable state tracking with configurable timeout via `WATCH_TIMEOUT` env var. 
Exit codes: 0 (all found), 1 (timeout), 2 (tools unavailable — signals fallback to polling). + +- **`claude/sdd-tools/hooks/tests/validate-result.bats`** — 19 bats tests covering: valid PASS/FAIL results, missing status line, unknown status values, missing Summary/Files Modified/Context Contribution sections, non-result file passthrough, missing context file creation, and `.invalid` file generation. + +- **`claude/sdd-tools/skills/execute-tasks/scripts/tests/poll-for-results.bats`** — 14 bats tests covering: all results pre-existing, incremental discovery, ALL\_DONE signal, timeout behavior, adaptive interval progression, single task polling, empty task list, and configurable start interval. + +- **`claude/sdd-tools/skills/execute-tasks/scripts/tests/watch-for-results.bats`** — 11 bats tests covering: pre-existing files, single task watching, timeout behavior, fswatch unavailability fallback (exit code 2), empty task list, and FIFO cleanup. + +- **`claude/sdd-tools/tests/fixtures/`** — 5 shared markdown fixture files used by bats tests for validating result file parsing (valid-result-pass.md, valid-result-fail.md, invalid-result-no-status.md, invalid-result-no-summary.md, invalid-result-unknown-status.md). + +- **`.claude/sessions/exec-session-20260222-180300/`** — Complete archived execution session containing execution plan, shared context, task log, progress tracker, session summary, and 16 archived task JSON files. + +### Modified + +- **`claude/sdd-tools/skills/execute-tasks/references/orchestration.md`** — Major expansion from ~611 to ~1223 lines. 
Added the orchestration-side changes for the 10-feature hardening specification: structured context schema (6-section format with compaction rules), event-driven completion detection (Step 7c rewrite), pre-wave file conflict detection (Step 7a.5), produces\_for prompt injection (upstream task output propagation), 3-tier retry escalation (Standard → Context Enrichment → User Escalation), progress streaming (session/wave/completion summaries), post-wave merge validation (OK/WARN/ERROR with auto-repair), result file protocol updates, and agent prompt template updates. + +- **`claude/sdd-tools/agents/task-executor.md`** — Expanded from 324 to 414 lines. Embedded the full 4-phase execution workflow (Understand, Implement, Verify, Complete) directly in the agent's system prompt, including result file format specification, structured context reading/writing instructions, and verification rules. The agent no longer relies on reading external reference files at runtime. + +- **`claude/sdd-tools/skills/execute-tasks/references/execution-workflow.md`** — Transitioned from runtime reference to documentation-only (added blockquote header). Updated Phase 1 context reading for 6-section schema, updated Phase 4 context writing for structured sections. Now serves as the canonical documentation for the 4-phase workflow while the agent carries its own embedded copy. + +- **`claude/sdd-tools/skills/execute-tasks/scripts/poll-for-results.sh`** — Rewritten from 61 to 133 lines. Replaced fixed-interval polling with adaptive intervals (start at 5s, increment by 5s, cap at 30s). Unified output format with `RESULT_FOUND` progress lines and `ALL_DONE` completion signal, matching the watch-for-results.sh interface. Added configurable `POLL_START_INTERVAL` env var. + +- **`claude/sdd-tools/skills/create-tasks/SKILL.md`** — Expanded from ~653 to ~738 lines. Added Phase 6 (Detect Producer-Consumer Relationships) that scans spec dependency sections to generate `produces_for` metadata on tasks. Renumbered subsequent phases (6→7, 7→8, 8→9).
Updated Task Structure documentation with optional `produces_for` field schema. + +- **`claude/sdd-tools/skills/execute-tasks/SKILL.md`** — Updated key behaviors section with references to new hardening features: event-driven completion, result validation hook, file conflict detection, produces\_for injection, retry escalation, and progress streaming. + +- **`claude/sdd-tools/hooks/hooks.json`** — Added entry for validate-result.sh as a PostToolUse hook on Write operations with 30-second timeout. + +- **`CLAUDE.md`** — Updated SDD Pipeline Patterns with 8 new feature descriptions. Updated session directory layout with `.invalid` file documentation. Updated Critical Plugin Files table with new line counts. Added structured context schema documentation and hardening feature cross-references. + +## Git Status + +### Staged Changes + +No staged changes. + +### Unstaged Changes + +No unstaged changes. + +## Session Commits + +| Hash | Message | Author | Date | +|------|---------|--------|------| +| `3862fca` | feat(execute-tasks): implement 10-feature hardening specification | Stephen Sequenzia | 2026-02-22 | + +### Execution Session Details + +The hardening specification was implemented via autonomous task execution: + +| Metric | Value | +|--------|-------| +| **Tasks executed** | 16 | +| **Pass rate** | 100% (16/16) | +| **Retries** | 0 | +| **Waves** | 6 | +| **Max parallel** | 5 | +| **Total execution time** | 75m 50s | +| **Total token usage** | 1,276,713 | + +#### Wave Breakdown + +| Wave | Tasks | IDs | Duration Range | +|------|-------|-----|----------------| +| Wave 1 | 4 | #155, #156, #159, #160 | 2m 16s – 10m 31s | +| Wave 2 | 3 | #157, #158, #161 | 2m 14s – 4m 47s | +| Wave 3a | 5 | #163, #164, #166, #167, #168 | 2m 28s – 6m 42s | +| Wave 3b+4 | 2 | #162, #165 | 2m 29s – 9m 50s | +| Wave 5 | 1 | #169 | 5m 29s | +| Wave 6 | 1 | #170 | 1m 51s | + +#### Features Implemented + +1. 
**Structured context schema** — 6-section schema for execution\_context.md with compaction rules +2. **Embedded agent rules** — task-executor.md carries full workflow (414 lines) +3. **Event-driven completion** — watch-for-results.sh (fswatch) with polling fallback +4. **Result validation hook** — validate-result.sh PostToolUse hook with .invalid rename +5. **File conflict detection** — Pre-wave scan to prevent concurrent file edits +6. **produces\_for prompt injection** — Upstream task output injected into dependent prompts +7. **Retry escalation** — 3-tier: Standard → Context Enrichment → User Escalation +8. **Progress streaming** — Session/wave/completion output summaries +9. **Post-wave merge validation** — OK/WARN/ERROR with auto-repair and force compaction +10. **Bats test suite** — 44 tests across 3 scripts (19 + 14 + 11) + +#### Known Issues from Execution + +- Result file format in orchestration.md (Result File Protocol + 7c prompt template) doesn't match task-executor.md embedded format. Non-blocking: validate-result.sh enforces correct format. +- SKILL.md and orchestration.md step numbering diverge at Step 5/5.5. +- Concurrent edits to orchestration.md caused Edit conflicts in Wave 3a (5 agents editing same file simultaneously). 
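The result-file validation rules this report describes for validate-result.sh (status line, three required sections, rename to `.invalid` on failure) can be reduced to a short sketch. This is not the shipped hook; the rules and file names are taken from this report:

```shell
# Sketch only; not the actual validate-result.sh. Checks mirror this
# report's rules: status line, three required sections, .invalid rename.
check_result_file() {
  local f="$1" errors=""
  head -n 1 "$f" | grep -qE '^status: (PASS|PARTIAL|FAIL)$' ||
    errors="${errors}- missing or malformed status line\n"
  local section
  for section in '## Summary' '## Files Modified' '## Context Contribution'; do
    grep -qxF "$section" "$f" || errors="${errors}- missing ${section}\n"
  done
  [ -z "$errors" ] && return 0
  # Invalidation: preserve content plus errors, then remove the original
  # so completion detection never treats a malformed result as done.
  { cat "$f"; printf '\n## Validation Errors\n%b' "$errors"; } > "${f}.invalid"
  rm -f "$f"
  return 1
}
```

The `.invalid` rename is the key design choice: the orchestrator's completion detection only ever sees well-formed result files, and malformed ones fall back to `TaskOutput`.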
diff --git a/internal/reports/phase-aware-tasks-2026-02-22.md b/internal/reports/phase-aware-tasks-2026-02-22.md new file mode 100644 index 0000000..82ae2c9 --- /dev/null +++ b/internal/reports/phase-aware-tasks-2026-02-22.md @@ -0,0 +1,122 @@ +# Codebase Changes Report + +## Metadata + +| Field | Value | +|-------|-------| +| **Date** | 2026-02-22 | +| **Time** | 20:52 EST | +| **Branch** | execute-tasks-hardening | +| **Author** | Stephen Sequenzia | +| **Base Commit** | `997a8b3` chore(marketplace): bump sdd-tools to 0.3.0 | +| **Latest Commit** | `cdff7ce` feat(sdd-tools): implement phase-aware task generation and execution | +| **Repository** | git@github.com:sequenzia/agent-alchemy.git | + +**Scope**: Phase-aware task generation and execution for sdd-tools + +**Summary**: Added `--phase` argument support to both `create-tasks` and `execute-tasks` skills, enabling incremental phase-by-phase task generation from spec Section 9 (Implementation Plan) and phase-filtered execution. Bumped sdd-tools from 0.3.0 to 0.3.1. + +## Overview + +This session implemented the phase-aware task generation and execution plan, adding the ability to generate tasks for specific implementation phases from a spec's Section 9, and to execute only tasks belonging to specific phases. All changes are backward compatible — specs without implementation phases continue to work unchanged. 
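The header format this work parses from Section 9 (`### 9.N Phase N: {Name}`, per this report) can be sketched in one line. Illustrative only; the real skill also extracts deliverables, completion criteria, and checkpoint gates for each phase:

```shell
# Emits "number:name" for each "### 9.N Phase N: {Name}" header.
# Sketch of the extraction step described in this report, nothing more.
extract_phases() {
  sed -nE 's/^### 9\.[0-9]+ Phase ([0-9]+): (.+)$/\1:\2/p' "$1"
}
```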
+ +- **Files affected**: 10 +- **Lines added**: +256 +- **Lines removed**: -43 +- **Commits**: 2 + +## Files Changed + +| File | Status | Lines | Description | +|------|--------|-------|-------------| +| `claude/sdd-tools/skills/create-tasks/SKILL.md` | Modified | +202 / -29 | Major: phase extraction, selection, hybrid decomposition, metadata | +| `claude/sdd-tools/skills/create-tasks/references/dependency-inference.md` | Modified | +11 / -2 | Updated Section 9 mapping with 3 cross-phase scenarios | +| `claude/sdd-tools/skills/execute-tasks/SKILL.md` | Modified | +22 / -3 | Added `--phase` argument, filtering, examples | +| `claude/sdd-tools/skills/execute-tasks/references/orchestration.md` | Modified | +15 / -4 | Phase filtering in Step 1, multi-tier session ID | +| `claude/.claude-plugin/marketplace.json` | Modified | +1 / -1 | Version bump sdd-tools 0.3.0 → 0.3.1 | +| `CLAUDE.md` | Modified | +1 / -1 | Plugin Inventory table version update | +| `CHANGELOG.md` | Modified | +1 / -0 | Added bump entry under [Unreleased] | +| `docs/index.md` | Modified | +1 / -1 | Project Status table version update | +| `docs/plugins/index.md` | Modified | +1 / -1 | At a Glance table version update | +| `docs/plugins/sdd-tools.md` | Modified | +1 / -1 | Bold metadata line version update | + +## Change Details + +### Modified + +- **`claude/sdd-tools/skills/create-tasks/SKILL.md`** — Expanded from 9 phases to 10 phases. Added `--phase` CLI argument to frontmatter. Phase 1: argument parsing for `--phase`. Phase 2: extract `spec_phase` metadata from existing tasks for phase-aware merge detection. Phase 3: new "Phase Extraction" sub-step that parses Section 9 Implementation Plan headers (`### 9.N Phase N: {Name}`), extracting deliverables, completion criteria, and checkpoint gates, then cross-references deliverables to Section 5 features. 
NEW Phase 4 "Select Phases": interactive phase selection with 5 paths — CLI argument (Path A), 2-3 phases with multiSelect (Path B), 4+ phases with two-step flow (Path C), no Section 9 (Path D), and merge mode with existing phases (Path E). Phase 5: hybrid decomposition mapping features to phases and filling gaps from deliverable tables, assigning `spec_phase`/`spec_phase_name` metadata to every task. Phase 6: 3-scenario cross-phase dependency handling (current generation, merge mode, missing predecessors). Phase 8: phase-annotated preview with PHASES and PREREQUISITES sections. Phase 9: `spec_phase`/`spec_phase_name` added to TaskCreate example. Phase 10: phase-related error handling for invalid phases, missing Section 9, and unparseable formats. Added phase-specific examples. + +- **`claude/sdd-tools/skills/create-tasks/references/dependency-inference.md`** — Expanded the "Section 9 (Implementation Plan) Mapping" sub-section from a single simple rule to 3 explicit scenarios: (1) Phase N-1 tasks exist in current generation → standard blockedBy, (2) Phase N-1 tasks exist from prior generation (merge mode) → blockedBy to existing IDs, (3) Phase N-1 not generated and no existing tasks → omit blockedBy, add prerequisites note. + +- **`claude/sdd-tools/skills/execute-tasks/SKILL.md`** — Added `--phase` argument to frontmatter (`argument-hint` and `arguments` list). Updated Step 1 (Load Task List) with phase filtering: AND logic with `--task-group`, exclusion of tasks without `spec_phase` metadata, error messaging with available phases. Updated Step 5 session ID generation from three-tier to multi-tier resolution incorporating phase (e.g., `{task_group}-phase{N}-{YYYYMMDD}-{HHMMSS}`). Added "Phase-based filtering" bullet to Key Behaviors section. Added 3 phase execution examples. 
+ +- **`claude/sdd-tools/skills/execute-tasks/references/orchestration.md`** — Added phase filtering paragraphs to Step 1 after `--task-group` filtering, including AND logic, exclusion of tasks without `metadata.spec_phase`, and error messaging. Updated Step 5.5 session ID generation from 3-tier to 6-tier resolution incorporating phase across all filter combinations. + +- **`claude/.claude-plugin/marketplace.json`** — Bumped sdd-tools version from `0.3.0` to `0.3.1`. + +- **`CLAUDE.md`** — Updated sdd-tools version in Plugin Inventory table from `0.3.0` to `0.3.1`. + +- **`CHANGELOG.md`** — Added `- Bump sdd-tools from 0.3.0 to 0.3.1` entry under `## [Unreleased]` → `### Changed`. + +- **`docs/index.md`** — Updated SDD Tools version in Project Status table from `0.3.0` to `0.3.1`. + +- **`docs/plugins/index.md`** — Updated SDD Tools version in At a Glance table from `0.3.0` to `0.3.1`. + +- **`docs/plugins/sdd-tools.md`** — Updated version in bold metadata line from `0.3.0` to `0.3.1`. + +## Git Status + +### Unstaged Changes + +No unstaged changes. + +### Untracked Files + +No untracked files. + +## Session Commits + +| Hash | Message | Author | Date | +|------|---------|--------|------| +| `cdff7ce` | feat(sdd-tools): implement phase-aware task generation and execution | Stephen Sequenzia | 2026-02-22 | +| `2ce973c` | chore(marketplace): bump sdd-tools to 0.3.1 | Stephen Sequenzia | 2026-02-22 | + +## Architectural Notes + +### New Task Metadata Schema + +Two new fields added to every task when spec has implementation phases: + +| Field | Type | Example | Description | +|-------|------|---------|-------------| +| `spec_phase` | integer | `1` | Phase number from Section 9 | +| `spec_phase_name` | string | `"Foundation"` | Phase name from Section 9 | + +Both fields are omitted entirely when the spec has no phases (backward compatible). Not added to `task_uid` — phase changes between spec revisions don't break merge tracking. 
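These fields drive the `--phase` filtering described earlier in this report: AND logic with `--task-group`, and tasks carrying no `spec_phase` metadata are excluded. A minimal sketch of those semantics, modeling task records as `id:group:phase` lines with `-` for a missing phase (the encoding is invented for illustration; real tasks are JSON):

```shell
# Sketch of the filter semantics from this report; the line-based task
# encoding here is illustrative only.
filter_tasks() {
  local want_group="$1" want_phase="$2" id group phase
  while IFS=: read -r id group phase; do
    [ -n "$want_group" ] && [ "$group" != "$want_group" ] && continue
    if [ -n "$want_phase" ]; then
      [ "$phase" = "-" ] && continue            # no spec_phase metadata: excluded
      [ "$phase" != "$want_phase" ] && continue # AND with --task-group
    fi
    echo "$id"
  done
}
```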
+ +### Pipeline Flow + +The new phase-aware pipeline enables incremental generation: + +``` +create-tasks --phase 1 → tasks with spec_phase: 1 → execute-tasks --phase 1 +create-tasks --phase 2 → tasks with spec_phase: 2 → execute-tasks --phase 2 +``` + +### Cross-Phase Dependency Scenarios + +Three scenarios handle the case where phases are generated incrementally: + +1. **Both phases in same generation** — Normal `blockedBy` between Phase N and N-1 tasks +2. **Phase N-1 from prior run (merge mode)** — `blockedBy` to existing task IDs via `spec_phase` metadata lookup +3. **Phase N-1 never generated** — No `blockedBy` added; "Prerequisites" note with assumed-complete deliverables + +### Verification Checks Performed + +- All 10 phases numbered sequentially (1-10) with correct cross-references +- Metadata fields consistent (`spec_phase`, `spec_phase_name`) across all 4 files +- All AskUserQuestion blocks have 2-4 options (within limit) +- Three dependency scenarios aligned between SKILL.md and dependency-inference.md +- Phase filtering logic matches between execute-tasks SKILL.md and orchestration.md +- Session ID generation terminology aligned ("multi-tier") +- Error message wording standardized between SKILL.md and orchestration.md diff --git a/internal/specs/sdd-execute-tasks-rewrite-SPEC.md b/internal/specs/sdd-execute-tasks-rewrite-SPEC.md new file mode 100644 index 0000000..b83f23c --- /dev/null +++ b/internal/specs/sdd-execute-tasks-rewrite-SPEC.md @@ -0,0 +1,1378 @@ +# SDD Execute Tasks Rewrite PRD + +**Version**: 1.0 +**Author**: Stephen Sequenzia +**Date**: 2026-02-23 +**Status**: Draft +**Spec Type**: New Product +**Spec Depth**: Full Technical Documentation +**Description**: Full rewrite of the SDD orchestration execution engine. The current engine (~2,600 lines across 10+ files) is too complex and unstable. 
This rewrite replaces the file-based signaling architecture with Claude Code's native Agent Team system, using message-passing coordination instead of filesystem watching. + +--- + +## 1. Executive Summary + +The SDD execution engine (`/execute-tasks`) is the runtime core of the Spec-Driven Development pipeline, responsible for taking a set of tasks with dependency relationships and executing them autonomously via parallel agent teams. The current engine relies on shell scripts (`fswatch`/`inotifywait`) for completion detection and file-based protocols for inter-agent communication, an approach that has proven unreliable and overly complex. This rewrite replaces the entire coordination model with Claude Code's native Agent Team system (`TeamCreate`/`SendMessage`), introducing a 3-tier agent hierarchy (Orchestrator → Wave Lead → Context Manager + Task Executors) that is simpler, more stable, and more resilient. + +## 2. Problem Statement + +### 2.1 The Problem + +The current SDD orchestration engine is too complex and unstable for production use. Its core architecture treats agents as fire-and-forget background processes, then uses filesystem events to detect when they complete. This file-based signaling model requires two shell scripts (115 and 133 lines), custom hooks for file validation, a complex merge pipeline for context sharing, and extensive error handling for partial file writes, missing files, and race conditions.
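The coordination shape the rewrite targets can be illustrated in a few lines. This is a conceptual sketch only: `TeamCreate`/`SendMessage` are Claude Code tools, not a shell API, and the `task_id:status` wire format is invented here.

```shell
# Completion arrives as a message the coordinator blocks on, instead of
# a result file it must watch for. Conceptual illustration only.
collect_wave() {
  local expected="$1" count=0 task_id status
  while [ "$count" -lt "$expected" ] && IFS=: read -r task_id status; do
    echo "task ${task_id} reported ${status}"
    count=$((count + 1))
  done
}
```

There is no watcher, no polling interval, and no partially written file to validate; the read blocks until a complete message arrives.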
+ +### 2.2 Current State + +The current engine (`execute-tasks` skill, version 0.3.1) operates as follows: + +- **Orchestration**: A single 10-step loop coordinates all execution from the orchestrator skill's prompt +- **Agent launching**: Task executors are spawned as background `Task` agents with `run_in_background: true` +- **Completion detection**: `watch-for-results.sh` (fswatch/inotifywait) with automatic fallback to `poll-for-results.sh` (adaptive 5s-30s polling) +- **Result protocol**: Each agent writes a `result-task-{id}.md` file (~18 lines); a PostToolUse hook (`validate-result.sh`) validates format on write +- **Context sharing**: Per-task `context-task-{id}.md` files are merged into shared `execution_context.md` by the orchestrator after each wave, using a 6-section structured schema with compaction and deduplication +- **Retry**: 3-tier escalation (Standard → Context Enrichment → User Escalation) with batched retry processing +- **Concurrency**: `.lock` file prevents concurrent sessions; file conflict detection defers tasks modifying the same files + +**Key files**: + +| File | Lines | Role | +|------|-------|------| +| `SKILL.md` | 293 | Skill entry point | +| `references/orchestration.md` | ~1,235 | 10-step orchestration loop | +| `agents/task-executor.md` | 414 | Opus-tier task agent | +| `scripts/watch-for-results.sh` | 115 | Event-driven completion detection | +| `scripts/poll-for-results.sh` | 133 | Polling fallback | +| `hooks/auto-approve-session.sh` | 75 | PreToolUse auto-approval | +| `hooks/validate-result.sh` | 100 | PostToolUse result validation | + +### 2.3 Impact Analysis + +The instability of the execution engine directly blocks the SDD pipeline. 
When completion detection fails: + +- **Silent hangs**: The orchestrator waits indefinitely for result files that were written but not detected by fswatch +- **Partial wave completion**: Some agents' results are detected, others are missed, causing inconsistent state +- **Cascading timeouts**: The 8-minute Bash timeout for detection scripts triggers recovery paths that add further complexity +- **Context corruption**: Failed merges or partial writes to `execution_context.md` degrade context quality for subsequent waves + +The complexity also creates a maintenance burden: any change to the orchestration logic requires understanding the interaction between the 10-step loop, shell scripts, hook validation, and file-based protocols — a cognitive load that inhibits iteration. + +### 2.4 Business Value + +The execution engine is the final stage in the SDD pipeline (`/create-spec` → spec → `/create-tasks` → tasks → `/execute-tasks` → code). If execution is unreliable, the entire pipeline's value proposition — autonomous code generation from specifications — is undermined. A stable, simpler engine enables confident multi-wave execution of complex specs, which is the primary use case for the SDD tools plugin. + +## 3. Goals & Success Metrics + +### 3.1 Primary Goals + +1. **Replace file-based signaling with message-based coordination** using Claude Code's native Agent Team system (`TeamCreate`, `SendMessage`, `TaskOutput`) +2. **Reduce architectural complexity** by eliminating shell scripts, file-based protocols, and the 6-section merge pipeline +3. **Improve resilience** with automatic wave-lead crash recovery, per-task timeouts, and graceful degradation under API rate limits +4.
**Maintain functional parity** for the end user — the `/execute-tasks` command interface, task filtering, and session artifacts remain familiar + +### 3.2 Success Metrics + +| Metric | Current Baseline | Target | Measurement Method | +|--------|------------------|--------|-------------------| +| Orchestration code size | ~2,600 lines (10+ files) | < 1,500 lines | Line count of skill + references + agents | +| Shell script dependencies | 2 scripts (248 lines) | 0 scripts | File inventory | +| Completion detection reliability | Intermittent failures (fswatch misses) | 100% (message-based) | Execute 10 multi-wave sessions without detection failure | +| Wave execution success rate | ~80% (estimated from retry patterns) | > 95% first-attempt pass rate | Session logs across 20 executions | +| New failure recovery modes | 0 (no wave-lead crash handling) | 2 (wave-lead crash retry + per-task timeout) | Feature verification | + +### 3.3 Non-Goals + +- **Changing the task format**: Tasks produced by `/create-tasks` remain compatible — same JSON structure, same `blockedBy` relationships, same metadata fields +- **Changing the spec format**: The input spec format is untouched — this rewrite only affects execution +- **Real-time streaming of per-task progress**: Wave-level progress events are sufficient; per-line code generation streaming is out of scope +- **Multi-session concurrency**: Only one execution session at a time per project (same as current) + +## 4. 
User Research + +### 4.1 Target Users + +#### Primary Persona: SDD Pipeline User + +- **Role/Description**: Developer using the full SDD pipeline (`/create-spec` → `/create-tasks` → `/execute-tasks`) to generate code from specifications +- **Goals**: Execute a set of tasks autonomously with minimal intervention, verify results, and iterate +- **Pain Points**: Execution hangs on completion detection, unclear error messages when waves fail, excessive session artifacts to debug +- **Context**: Invokes `/execute-tasks` after task generation, monitors progress, intervenes only on escalation +- **Technical Proficiency**: High — understands task dependencies, wave parallelism, and agent coordination + +#### Secondary Persona: Plugin Developer + +- **Role/Description**: Developer maintaining or extending the SDD tools plugin +- **Goals**: Modify orchestration behavior, add new features, debug execution issues +- **Pain Points**: Current architecture requires understanding 10+ files and the interaction between shell scripts, hooks, and file protocols +- **Context**: Reads and modifies skill files, agent definitions, and hook scripts + +### 4.2 User Journey Map + +``` +[Tasks created] --> [/execute-tasks] --> [Review plan] --> [Confirm] --> [Monitor waves] --> [Handle escalations] --> [Review results] + | | | | | | | + v v v v v v v + Task JSON Load & plan Wave breakdown "Proceed?" 
Progress events Fix/skip/guide/abort Session summary +``` + +### 4.3 User Workflows + +#### Workflow 1: Standard Execution + +```mermaid +flowchart TD + classDef user fill:#E3F2FD,stroke:#1565C0,color:#000 + classDef system fill:#E8F5E9,stroke:#2E7D32,color:#000 + classDef decision fill:#FFF3E0,stroke:#E65100,color:#000 + + START[User: /execute-tasks]:::user + PLAN[Orchestrator: Build plan]:::system + CONFIRM{User: Approve plan?}:::decision + WAVE[Spawn wave team]:::system + LEAD[Wave-lead: Manage executors]:::system + RESULTS[Wave-lead: Report summary]:::system + MORE{More waves?}:::decision + SUMMARY[Session summary]:::system + + START --> PLAN --> CONFIRM + CONFIRM -->|Yes| WAVE + CONFIRM -->|No| END[Cancel]:::user + WAVE --> LEAD --> RESULTS --> MORE + MORE -->|Yes| WAVE + MORE -->|No| SUMMARY +``` + +#### Workflow 2: Failure Escalation + +```mermaid +flowchart TD + classDef agent fill:#E3F2FD,stroke:#1565C0,color:#000 + classDef user fill:#FFF3E0,stroke:#E65100,color:#000 + classDef recover fill:#E8F5E9,stroke:#2E7D32,color:#000 + + FAIL[Executor reports failure]:::agent + RETRY[Wave-lead: Immediate retry]:::agent + CHECK{Retry succeeded?}:::agent + ESCALATE[Wave-lead → Orchestrator: Report failure]:::agent + USER{User decision}:::user + FIX[Fix manually + continue]:::recover + SKIP[Skip task]:::recover + GUIDE[Provide guidance → guided retry]:::recover + ABORT[Abort session]:::user + + FAIL --> RETRY --> CHECK + CHECK -->|Yes| DONE[Continue wave]:::recover + CHECK -->|No| ESCALATE --> USER + USER --> FIX + USER --> SKIP + USER --> GUIDE + USER --> ABORT +``` + +## 5. Functional Requirements + +### 5.1 Feature: 3-Tier Agent Hierarchy + +**Priority**: P0 (Critical) +**Complexity**: High + +#### User Stories + +**US-001**: As an SDD pipeline user, I want the execution engine to use Claude Code's native team coordination so that execution doesn't depend on unreliable filesystem watching. 
+ +**Acceptance Criteria**: +- [ ] Each wave spawns a dedicated Agent Team via `TeamCreate` +- [ ] Wave-lead agent coordinates task executors via `SendMessage` (no file-based signaling) +- [ ] Context Manager agent per wave handles execution context distribution and collection +- [ ] Task executor agents use the 4-phase workflow (Understand, Implement, Verify, Report) +- [ ] All inter-agent communication uses `SendMessage` with structured protocols +- [ ] No shell scripts are required for execution coordination + +**Technical Notes**: +- Agent hierarchy: Orchestrator (skill) → Wave Lead (team lead) → Context Manager + Task Executor × N (team members) +- The orchestrator runs in the user's conversation context; wave teams run as spawned agents +- Each wave team is independent — no cross-wave team membership + +**Edge Cases**: + +| Scenario | Expected Behavior | +|----------|-------------------| +| Wave with single task | Wave-lead still spawns context manager + one executor (consistent pattern) | +| Wave with 0 unblocked tasks after filtering | Skip wave, proceed to next (or finish) | +| All tasks in a wave fail | Wave-lead reports all failures; orchestrator presents batch escalation to user | + +**Error Handling**: + +| Error Condition | System Action | +|-----------------|---------------| +| TeamCreate fails | Orchestrator retries once; on second failure, marks wave tasks as failed and offers user the choice to retry or skip | +| SendMessage fails between agents | Agent retries delivery; on persistent failure, wave-lead logs the issue and marks affected task as failed | +| Task tool spawn fails | Wave-lead logs error, marks task as failed, continues with remaining executors | + +--- + +### 5.2 Feature: Wave Lead Agent + +**Priority**: P0 (Critical) +**Complexity**: High + +#### User Stories + +**US-002**: As an SDD pipeline user, I want each wave to be managed by an autonomous wave-lead agent so that wave execution is self-contained and recoverable. 
+ +**Acceptance Criteria**: +- [ ] Wave-lead launches context manager as first team member +- [ ] Wave-lead launches task executor agents for each task in the wave +- [ ] Wave-lead manages pacing autonomously using `max_parallel` as a guideline (not a rigid cap) +- [ ] Wave-lead collects structured results from all executors via `SendMessage` +- [ ] Wave-lead handles immediate retry (1 attempt) for failed executors before escalating +- [ ] Wave-lead reports wave summary to orchestrator via `SendMessage` including: tasks passed, tasks failed, duration, key decisions +- [ ] Wave-lead manages TaskUpdate calls (marks tasks `in_progress`, `completed`, `failed`) +- [ ] Wave-lead model is configurable (default: Opus, override: Sonnet) + +**Technical Notes**: +- Wave-lead receives: task list for this wave, execution context snapshot, wave number, max_parallel hint +- Wave-lead produces: wave summary message to orchestrator, TaskUpdate state changes +- Wave-lead lifecycle: created per wave, destroyed after wave completes (no persistent wave-leads) + +**Edge Cases**: + +| Scenario | Expected Behavior | +|----------|-------------------| +| Executor finishes before others | Wave-lead acknowledges result immediately; does not wait for batch | +| All executors fail | Wave-lead reports all failures to orchestrator for user escalation | +| Rate limit hit during agent spawning | Staggered spawning with backoff (see graceful degradation requirement) | +| Wave-lead itself crashes | Orchestrator detects via TaskOutput, resets wave tasks to pending, spawns new wave team | + +--- + +### 5.3 Feature: Context Manager Agent + +**Priority**: P0 (Critical) +**Complexity**: High + +#### User Stories + +**US-003**: As an SDD pipeline user, I want a dedicated context manager per wave so that execution context is intelligently summarized, distributed, and collected without complex file-based merge pipelines. 
+ +**Acceptance Criteria**: +- [ ] Context manager reads main `execution_context.md` at wave start +- [ ] Context manager derives a relevant summary of session context up to the current wave +- [ ] Context manager distributes summary to all task executors via `SendMessage` +- [ ] Task executors send key decisions, insights, and patterns back to context manager during execution +- [ ] Context manager summarizes collected information at wave end +- [ ] Context manager updates main `execution_context.md` with new wave section +- [ ] Context manager model is configurable (default: Sonnet, override: Opus) + +**Technical Notes**: +- `execution_context.md` is organized by waves (not the current 6-section schema) +- Context manager has Read/Write access to the session directory +- Context manager is a team member (not the team lead) — wave-lead coordinates its lifecycle +- Context distribution happens before task executors begin work + +**Edge Cases**: + +| Scenario | Expected Behavior | +|----------|-------------------| +| Empty execution_context.md (first wave) | Context manager distributes minimal context: "This is the first wave. No prior context available." 
| +| Very large execution_context.md (many prior waves) | Context manager summarizes aggressively; includes only relevant patterns, decisions, and conventions | +| Context manager crashes | Wave-lead detects; executors proceed without distributed context; wave-lead writes a minimal context entry for the wave | +| Executor sends context update after context manager has already written | Context manager handles late arrivals if still alive; otherwise updates are lost (acceptable — not critical data) | + +--- + +### 5.4 Feature: Task Executor Agent (Revised) + +**Priority**: P0 (Critical) +**Complexity**: Medium + +#### User Stories + +**US-004**: As an SDD pipeline user, I want task executors to implement code changes using a 4-phase workflow and communicate results via structured messages so that execution quality is maintained without file-based protocols. + +**Acceptance Criteria**: +- [ ] Executors follow 4-phase workflow: Understand, Implement, Verify, Report +- [ ] Executors send structured result message to wave-lead via `SendMessage` +- [ ] Result message includes: status (PASS/PARTIAL/FAIL), summary, files_modified, verification_results, issues, context_contribution +- [ ] Executors send context contribution (decisions, patterns, insights) to context manager via separate `SendMessage` +- [ ] Executors run at Opus model tier +- [ ] Executors operate with `bypassPermissions` mode for implementation autonomy + +**Technical Notes**: +- Executors are team members spawned by the wave-lead +- Each executor receives: task description, acceptance criteria, context summary (from context manager), and any relevant metadata +- The structured result protocol replaces the current `result-task-{id}.md` file format + +**Structured Result Protocol**: +``` +STATUS: PASS | PARTIAL | FAIL +SUMMARY: Brief description of what was accomplished +FILES_MODIFIED: +- path/to/file1.ts (created) +- path/to/file2.ts (modified) +VERIFICATION: +- [PASS] Criterion 1 description +- [PASS] 
Criterion 2 description +- [FAIL] Criterion 3 description +ISSUES: +- Issue description (if any) +CONTEXT_CONTRIBUTION: +- Key decision or insight worth sharing with other tasks +``` + +**Edge Cases**: + +| Scenario | Expected Behavior | +|----------|-------------------| +| Executor exceeds per-task timeout | Wave-lead terminates executor, marks task as failed, triggers retry | +| Executor produces PARTIAL result | Wave-lead treats as failure for retry purposes but preserves partial work | +| Executor modifies unexpected files | Accepted — verification phase should catch unintended changes | + +--- + +### 5.5 Feature: Simplified Orchestration Loop + +**Priority**: P0 (Critical) +**Complexity**: Medium + +#### User Stories + +**US-005**: As an SDD pipeline user, I want a streamlined orchestration loop so that execution is predictable and the codebase is maintainable. + +**Acceptance Criteria**: +- [ ] Orchestration loop has 9 steps (reduced from 10 with simplified internals) +- [ ] Step 1 (Load & Filter): Support `--task-group` and `--phase` filtering +- [ ] Step 2 (Validate): Detect empty tasks, all completed, blocked tasks, circular dependencies +- [ ] Step 3 (Plan): Topological sort, wave assignment, priority ordering within waves +- [ ] Step 4 (Settings): Read configuration from `.claude/agent-alchemy.local.md` +- [ ] Step 5 (Confirm): Present execution plan to user, get approval via `AskUserQuestion` +- [ ] Step 6 (Init Session): Create session directory with `execution_context.md` and `task_log.md` +- [ ] Step 7 (Execute Waves): For each wave, create team → wave-lead manages → collect summary → update context +- [ ] Step 8 (Summarize): Generate session summary, archive session +- [ ] Step 9 (Update CLAUDE.md): Review execution context for project-wide changes + +**Technical Notes**: +- Steps 1-6 run in the orchestrator skill's prompt (user's context) +- Step 7 delegates to wave teams — orchestrator waits for each wave-lead's summary +- Steps 8-9 run in the 
orchestrator after all waves complete +- The orchestrator passes accumulated `execution_context.md` content to each wave-lead's prompt as cross-wave context bridge + +**Edge Cases**: + +| Scenario | Expected Behavior | +|----------|-------------------| +| User cancels at Step 5 | Clean exit, no tasks modified | +| All tasks already completed | Report summary at Step 2, no execution | +| Circular dependencies detected | Break at weakest link (fewest blockers), warn user in plan | +| `--phase 1,2` filtering | Execute tasks in spec phases 1 and 2 only | + +--- + +### 5.6 Feature: 2-Tier Retry Model + +**Priority**: P1 (High) +**Complexity**: Medium + +#### User Stories + +**US-006**: As an SDD pipeline user, I want a simple retry model so that transient failures are recovered automatically and persistent failures are escalated to me promptly. + +**Acceptance Criteria**: +- [ ] Tier 1 (Autonomous Retry): Wave-lead immediately retries a failed executor (1 attempt by default, configurable via `max_retries`) +- [ ] Retry includes failure context from the original attempt +- [ ] Wave-lead can request additional context from Context Manager to inform the retry +- [ ] Tier 2 (User Escalation): After retry exhaustion, wave-lead reports failure to orchestrator +- [ ] Orchestrator presents failure to user via `AskUserQuestion` with options: Fix manually, Skip, Provide guidance, Abort session +- [ ] "Provide guidance" option triggers a guided retry with user-supplied instructions +- [ ] Guided retry failures re-prompt the user (loop until resolution) + +**Technical Notes**: +- Retry is immediate per executor (not batched) — as soon as an executor reports failure, the wave-lead can retry while other executors are still running +- Escalation flows: Executor → Wave-lead (retry) → Wave-lead (escalate via SendMessage) → Orchestrator (present to user) → Orchestrator (relay decision to wave-lead) → Wave-lead (act on decision) +- The wave-lead continues managing other running executors 
during the escalation round-trip + +**Edge Cases**: + +| Scenario | Expected Behavior | +|----------|-------------------| +| Multiple executors fail simultaneously | Each is retried independently and immediately | +| Retry succeeds | Wave-lead updates task to completed, continues normally | +| User selects "Abort session" | Orchestrator signals wave-lead to terminate remaining executors; all remaining tasks logged as failed | +| User selects "Fix manually" | Orchestrator waits for user confirmation that the fix is done; marks task as completed (manual) | + +--- + +### 5.7 Feature: Wave-Lead Crash Recovery + +**Priority**: P1 (High) +**Complexity**: Medium + +#### User Stories + +**US-007**: As an SDD pipeline user, I want the orchestrator to automatically recover when a wave-lead agent crashes so that a single agent failure doesn't require restarting the entire session. + +**Acceptance Criteria**: +- [ ] Orchestrator monitors wave-lead via `TaskOutput` with appropriate timeout +- [ ] On wave-lead crash or timeout, orchestrator resets wave tasks to pending (using TaskUpdate) +- [ ] Orchestrator spawns a new wave team for the reset tasks +- [ ] Recovery is automatic — no user intervention required unless the retry also fails +- [ ] If second wave-lead also crashes, orchestrator escalates to user + +**Technical Notes**: +- "Crash" includes: agent timeout, unexpected termination, malformed summary response +- Wave tasks that were already completed by executors before the crash retain their completed status +- Only `in_progress` or `pending` tasks within the wave are reset + +--- + +### 5.8 Feature: Per-Task Timeout Management + +**Priority**: P1 (High) +**Complexity**: Medium + +*Agent Recommendation — accepted during interview.* + +#### User Stories + +**US-008**: As an SDD pipeline user, I want per-task timeouts so that stuck executors are proactively terminated rather than blocking the entire wave. 
+ +**Acceptance Criteria**: +- [ ] Wave-lead monitors each executor's duration +- [ ] Default timeout is complexity-based (simple tasks: 5 min, standard tasks: 10 min, complex tasks: 20 min) +- [ ] Timeout triggers proactive termination via `TaskStop` +- [ ] Timed-out tasks are treated as failures and enter the retry flow +- [ ] Timeout values can be overridden per task via task metadata + +**Technical Notes**: +- Complexity classification can use task description length, number of acceptance criteria, or explicit `complexity` metadata field +- The wave-lead tracks start time for each executor and checks against timeout threshold + +--- + +### 5.9 Feature: Graceful Degradation Under Rate Limits + +**Priority**: P1 (High) +**Complexity**: Low + +*Agent Recommendation — accepted during interview.* + +#### User Stories + +**US-009**: As an SDD pipeline user, I want the engine to handle API rate limits gracefully so that spawning many agents doesn't crash the wave. + +**Acceptance Criteria**: +- [ ] Wave-lead implements staggered agent spawning (brief delay between launches) +- [ ] Rate limit errors during agent creation trigger retry with exponential backoff +- [ ] Partial team formation is handled — if some executors fail to spawn, wave-lead proceeds with those that succeeded and retries spawning the rest +- [ ] Spawning failures are logged to the wave summary + +**Technical Notes**: +- Stagger delay should be configurable but default to a small value (e.g., 1-2 seconds between spawns) +- The Claude Code Task tool handles some rate limiting internally, but rapid parallel spawns can still trigger limits + +--- + +### 5.10 Feature: Progress Reporting Hooks + +**Priority**: P2 (Medium) +**Complexity**: Low + +#### User Stories + +**US-010**: As a developer using the task-manager dashboard, I want wave-level progress events so that I can monitor execution status in real-time without reading log files. 
+ +**Acceptance Criteria**: +- [ ] PreToolUse hook emits event when wave team is created (wave started) +- [ ] PostToolUse hook emits event when wave summary is received (wave completed with task statuses) +- [ ] Session start and session complete events are emitted +- [ ] Events are written to a known location that the task-manager dashboard can watch + +**Technical Notes**: +- Event format should be lightweight (JSON lines or similar) +- The task-manager dashboard already watches `~/.claude/tasks/` via Chokidar — progress events could be written to the session directory +- Progress hooks are optional and should not affect execution if they fail + +--- + +### 5.11 Feature: Auto-Approve Hook (Revised) + +**Priority**: P2 (Medium) +**Complexity**: Low + +#### User Stories + +**US-011**: As an SDD pipeline user, I want session directory writes to be auto-approved so that context manager updates to `execution_context.md` don't trigger permission prompts. + +**Acceptance Criteria**: +- [ ] PreToolUse hook auto-approves Write/Edit operations to the session directory (`*/.claude/sessions/*`) +- [ ] Auto-approval covers `execution_context.md`, `task_log.md`, and `session_summary.md` +- [ ] Hook never exits non-zero (safe error handling) +- [ ] Hook has debug logging capability via environment variable + +**Technical Notes**: +- This is a simplified version of the current `auto-approve-session.sh` — same concept, reduced scope +- Consider whether agents running with `bypassPermissions` mode eliminate the need for this hook entirely + +--- + +### 5.12 Feature: Dry-Run Mode + +**Priority**: P2 (Medium) +**Complexity**: Low + +*Agent Recommendation — accepted during interview.* + +#### User Stories + +**US-012**: As an SDD pipeline user, I want a dry-run mode so that I can validate the execution plan and team structure without spawning real agents or modifying any files. 
+ +**Acceptance Criteria**: +- [ ] `--dry-run` flag skips Step 7 (Execute Waves) entirely +- [ ] Dry-run output shows: wave breakdown, task assignments per wave, agent model tiers, estimated team composition per wave +- [ ] No tasks are modified (no TaskUpdate calls) +- [ ] No session directory is created +- [ ] Dry-run completes in seconds (no agent spawning) + +--- + +### 5.13 Feature: Session Management (Simplified) + +**Priority**: P1 (High) +**Complexity**: Low + +#### User Stories + +**US-013**: As an SDD pipeline user, I want simple session management with basic recovery so that interrupted sessions can be resumed without complex cleanup logic. + +**Acceptance Criteria**: +- [ ] Session ID generated from task-group + timestamp (e.g., `auth-feature-20260223-143022`) +- [ ] Session directory: `.claude/sessions/__live_session__/` +- [ ] Session artifacts: `execution_context.md`, `task_log.md`, `session_summary.md` (3 files only) +- [ ] On interrupted session detection: offer user choice to resume or start fresh via `AskUserQuestion` +- [ ] Resume: reset `in_progress` tasks to pending, continue from next unblocked wave +- [ ] Fresh start: archive interrupted session to `.claude/sessions/{session_id}/`, create new session +- [ ] No `.lock` file — detection is based on presence of `__live_session__/` with content + +--- + +### 5.14 Feature: Configuration System + +**Priority**: P2 (Medium) +**Complexity**: Low + +#### User Stories + +**US-014**: As an SDD pipeline user, I want execution behavior to be configurable so that I can tune agent tiers and retry behavior for my project. 
+ +**Acceptance Criteria**: +- [ ] Configuration read from `.claude/agent-alchemy.local.md` YAML frontmatter +- [ ] Configurable settings: + - `execute-tasks.max_parallel` (default: 5) — hint to wave-lead for pacing + - `execute-tasks.max_retries` (default: 1) — autonomous retries before user escalation + - `execute-tasks.wave_lead_model` (default: `opus`) — model for wave-lead agents + - `execute-tasks.context_manager_model` (default: `sonnet`) — model for context manager agents +- [ ] CLI arguments override settings file values +- [ ] Missing settings file is not an error — defaults are used + +--- + +## 6. Non-Functional Requirements + +### 6.1 Performance Requirements + +| Metric | Requirement | Measurement Method | +|--------|-------------|-------------------| +| Wave setup time | < 30 seconds from wave start to all executors launched | Timestamp comparison in wave summary | +| Context distribution time | < 15 seconds from context manager start to all executors receiving context | Wave-lead tracking | +| Orchestrator overhead per wave | < 60 seconds (plan review, team creation, summary processing) | Session log timestamps | +| Total execution overhead | < 10% of total wall time spent on coordination vs. 
actual implementation | Session summary analysis | + +### 6.2 Reliability Requirements + +| Metric | Requirement | +|--------|-------------| +| Completion detection | 100% — message-based delivery eliminates detection failures | +| Wave-lead crash recovery | Automatic retry within 60 seconds of crash detection | +| Per-task timeout enforcement | Stuck executors terminated within 30 seconds of timeout | +| Session recovery | Resume from any interruption point without data loss | + +### 6.3 Scalability Requirements + +| Metric | Requirement | +|--------|-------------| +| Max tasks per session | 100+ (limited by API rate limits, not architecture) | +| Max tasks per wave | Limited only by `max_parallel` hint and API rate limits | +| Max waves per session | Unlimited (determined by dependency graph depth) | +| Context file growth | Linear with wave count; context manager summarizes to prevent unbounded growth | + +### 6.4 Maintainability Requirements + +| Metric | Requirement | +|--------|-------------| +| Total orchestration code | < 1,500 lines across all files | +| Shell script count | 0 (all coordination via Claude Code primitives) | +| Agent definition count | 3 new agents (wave-lead, context-manager, task-executor) | +| Hook count | ≤ 2 (auto-approve + progress, both optional) | + +## 7. 
Technical Architecture + +### 7.1 System Overview + +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ User's Conversation Context │ +│ ┌─────────────────────────────────────────────────────────────┐ │ +│ │ Orchestrator Skill (/execute-tasks) │ │ +│ │ Steps 1-6: Load, Validate, Plan, Settings, Confirm, Init │ │ +│ │ Step 7: Spawn wave teams (sequential) │ │ +│ │ Steps 8-9: Summarize, Update CLAUDE.md │ │ +│ └─────────────────────────┬───────────────────────────────────┘ │ +└────────────────────────────┼────────────────────────────────────────┘ + │ TeamCreate + SendMessage + ▼ +┌─────────────────────────────────────────────────────────────────────┐ +│ Wave Team (per wave) │ +│ │ +│ ┌───────────────────────────────────────────┐ │ +│ │ Wave Lead Agent (Opus) │ │ +│ │ - Launches context manager + executors │ │ +│ │ - Collects results via SendMessage │ │ +│ │ - Handles immediate retries │ │ +│ │ - Manages TaskUpdate state changes │ │ +│ │ - Reports wave summary to orchestrator │ │ +│ └────────┬──────────────────┬────────────────┘ │ +│ │ │ │ +│ ┌──────▼──────┐ ┌──────▼──────────────────────────┐ │ +│ │ Context │ │ Task Executors (Opus) × N │ │ +│ │ Manager │◄──►│ - 4-phase workflow │ │ +│ │ (Sonnet) │ │ - Structured result protocol │ │ +│ │ │ │ - Context contribution to CM │ │ +│ └──────────────┘ └─────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────┐ +│ Session Directory │ +│ .claude/sessions/__live_session__/ │ +│ ├── execution_context.md (cross-wave learning, grouped by wave) │ +│ ├── task_log.md (per-task status, duration, tokens) │ +│ └── session_summary.md (final execution report) │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +### 7.2 Tech Stack + +| Layer | Technology | Justification | +|-------|------------|---------------| +| Orchestration | Claude Code Skill 
(markdown-as-code) | Existing plugin system; runs in user's context | +| Agent coordination | `TeamCreate` / `SendMessage` / `TaskOutput` | Native Claude Code primitives; message-passing replaces file-based signaling | +| Task state | `TaskList` / `TaskUpdate` / `TaskGet` | Native Claude Code task management; replaces custom state tracking | +| Agent spawning | `Task` tool with `team_name` parameter | Team-aware agent spawning | +| Session storage | Local filesystem (`.claude/sessions/`) | Persistent session artifacts for history and debugging | +| Configuration | YAML frontmatter in `.claude/agent-alchemy.local.md` | Existing settings convention | + +### 7.3 Agent Definitions + +#### Agent: Wave Lead (`wave-lead.md`) + +```yaml +--- +model: opus # configurable via settings +tools: + - Task + - TaskList + - TaskGet + - TaskUpdate + - TaskStop + - SendMessage + - Read + - Glob + - Grep +--- +``` + +**Responsibilities**: +1. Receive wave assignment (task list, max_parallel hint, wave number) from orchestrator +2. Launch Context Manager agent as first team member +3. Wait for Context Manager to signal readiness (context distributed to session) +4. Launch Task Executor agents (staggered spawning for rate limit protection) +5. Monitor executor progress via `SendMessage` (collect structured results) +6. Handle immediate retry for failed executors (request additional context from Context Manager if needed) +7. Manage TaskUpdate calls (in_progress, completed, failed) for wave tasks +8. After all executors complete: signal Context Manager to finalize context, collect wave metrics +9. Send structured wave summary to orchestrator via `SendMessage` +10. Handle shutdown request from orchestrator + +#### Agent: Context Manager (`context-manager.md`) + +```yaml +--- +model: sonnet # configurable via settings +tools: + - Read + - Write + - SendMessage + - Glob + - Grep +--- +``` + +**Responsibilities**: +1. Read `execution_context.md` from session directory +2. 
Derive a concise, relevant summary of all prior wave learnings +3. Distribute context summary to all task executors via `SendMessage` +4. Signal wave-lead that context distribution is complete +5. Receive context contributions from executors during execution (decisions, patterns, insights, issues) +6. On wave completion signal from wave-lead: summarize all collected contributions +7. Append new wave section to `execution_context.md` +8. Handle shutdown request + +#### Agent: Task Executor (`task-executor.md`) + +```yaml +--- +model: opus +tools: + - Read + - Write + - Edit + - Glob + - Grep + - Bash + - SendMessage +--- +``` + +**Responsibilities** (4-phase workflow): +1. **Understand**: Read task description, acceptance criteria, and distributed context. Analyze requirements. +2. **Implement**: Make code changes (Write, Edit, Bash). Follow project conventions from context. +3. **Verify**: Check acceptance criteria. Run tests if applicable. Validate changes. +4. **Report**: Send structured result to wave-lead. Send context contribution to context manager. 
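In the Report phase, each executor hands the wave-lead a structured TASK RESULT message rather than writing a result file. As a concreteness check, a wave-lead (or an optional lightweight hook) could sanity-check an incoming message in a few lines. This is an illustrative sketch under that assumption; the field names follow the protocol in this PRD, but the function name and rules are ours, not an authoritative validator:

```shell
#!/usr/bin/env bash
# Illustrative sketch: sanity-check a structured TASK RESULT message before
# acting on it. Not an authoritative validator; checks are examples only.

validate_result() {   # validate_result <message>; prints OK or an ERROR line
  local msg="$1"
  grep -qE '^Status: (PASS|PARTIAL|FAIL)$' <<<"$msg" \
    || { echo "ERROR: missing or invalid Status line"; return 1; }
  grep -q '^Summary: ' <<<"$msg" \
    || { echo "ERROR: missing Summary line"; return 1; }
  grep -q '^Files Modified:' <<<"$msg" \
    || { echo "ERROR: missing Files Modified section"; return 1; }
  echo "OK"
}

# Hypothetical message in the structured result format:
msg='TASK RESULT
Task: #4
Status: PASS
Summary: Added auth middleware
Files Modified:
- src/middleware/auth.ts (created)
Verification:
- [PASS] Middleware rejects requests without a token'

validate_result "$msg"
```

Because results arrive as messages, validation like this becomes optional defense-in-depth rather than a load-bearing hook, unlike the current `validate-result.sh`.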
+ +### 7.4 Communication Protocols + +#### Orchestrator → Wave Lead (via Task prompt) + +``` +WAVE ASSIGNMENT +Wave: {N} of {total} +Max Parallel: {max_parallel} +Max Retries: {max_retries} +Session Dir: {session_dir_path} + +TASKS: +- Task #{id}: {subject} + Description: {description} + Acceptance Criteria: {criteria} + Priority: {priority} + Metadata: {metadata} + +CROSS-WAVE CONTEXT: +{Summary of execution_context.md content for context bridge} +``` + +#### Wave Lead → Orchestrator (via SendMessage) + +``` +WAVE SUMMARY +Wave: {N} +Duration: {total_wave_duration} +Tasks Passed: {count} +Tasks Failed: {count} + +RESULTS: +- Task #{id}: {status} ({duration}, {tokens}) + Summary: {brief} + Files: {file_list} +- Task #{id}: {status} ({duration}, {tokens}) + Summary: {brief} + Files: {file_list} + +FAILED TASKS (for escalation): +- Task #{id}: {failure_reason} + Retry Attempted: {yes/no} + Retry Result: {outcome} + +CONTEXT UPDATES: +{Summary of new learnings from this wave — for orchestrator's awareness} +``` + +#### Task Executor → Wave Lead (via SendMessage) + +``` +TASK RESULT +Task: #{id} +Status: PASS | PARTIAL | FAIL +Summary: {what was accomplished} +Files Modified: +- {path} (created|modified|deleted) +Verification: +- [PASS|FAIL] {criterion} +Issues: +- {issue description, if any} +``` + +#### Task Executor → Context Manager (via SendMessage) + +``` +CONTEXT CONTRIBUTION +Task: #{id} +Decisions: +- {key decision made during implementation} +Patterns: +- {pattern discovered or followed} +Insights: +- {useful information for other tasks} +Issues: +- {problems encountered, workarounds applied} +``` + +#### Context Manager → Task Executors (via SendMessage) + +``` +SESSION CONTEXT +Wave: {N} + +PROJECT SETUP: +{summarized tech stack, build commands, environment} + +CONVENTIONS: +{coding style, naming, import patterns discovered in prior waves} + +KEY DECISIONS: +{architecture choices from prior waves} + +KNOWN ISSUES: +{problems encountered, workarounds to be 
aware of} +``` + +### 7.5 Session Directory Layout + +``` +.claude/sessions/__live_session__/ +├── execution_context.md # Cross-wave learning (grouped by wave) +├── task_log.md # Per-task status table +└── session_summary.md # Final report (written in Step 8) + +.claude/sessions/{session-id}/ # Archived sessions +├── execution_context.md +├── task_log.md +└── session_summary.md +``` + +#### execution_context.md Format + +```markdown +# Execution Context + +## Wave 1 +**Completed**: 2026-02-23T14:30:22Z +**Tasks**: #1 (PASS), #2 (PASS), #3 (FAIL) + +### Learnings +- Runtime: Node.js 22 with pnpm +- Tests: `__tests__/{name}.test.ts` alongside source +- Imports: Named exports, barrel files for public API + +### Key Decisions +- [Task #1] Used Zod for runtime validation over io-ts +- [Task #2] Placed shared types in `src/types/` directory + +### Issues +- Vitest mock.calls behavior differs from Jest — reset between tests + +--- + +## Wave 2 +**Completed**: 2026-02-23T14:45:10Z +**Tasks**: #4 (PASS), #5 (PASS) + +### Learnings +- API routes follow `src/api/{resource}/route.ts` pattern + +### Key Decisions +- [Task #4] Used middleware pattern for auth validation + +### Issues +- None +``` + +#### task_log.md Format + +```markdown +# Task Log + +| Task | Subject | Status | Attempts | Duration | Tokens | +|------|---------|--------|----------|----------|--------| +| #1 | Create data models | PASS | 1 | 2m 10s | 52K | +| #2 | Implement API handler | PASS | 1 | 3m 01s | 67K | +| #3 | Add validation | FAIL | 2 | 4m 12s | 71K | +| #4 | Create auth middleware | PASS | 1 | 2m 45s | 48K | +``` + +### 7.6 Orchestration Loop Detail + +#### Step 1: Load & Filter Tasks + +``` +Input: TaskList + CLI args (--task-group, --phase, task-id) +Output: Filtered task set +Exit: If no tasks match filters + +Filter sequence: +1. --task-group → match metadata.task_group +2. --phase → match metadata.spec_phase (comma-separated integers) +3. 
task-id → single task mode + +Tasks without spec_phase metadata excluded when --phase is active. +``` + +#### Step 2: Validate State + +``` +Input: Filtered task set +Output: Validation result +Exit: If empty, all completed, or no unblocked tasks + +Checks: +- Empty task list → suggest /create-tasks +- All completed → report summary +- No unblocked tasks → report blocking chains +- Circular dependencies → detect and report +``` + +#### Step 3: Build Execution Plan + +``` +Input: Task dependencies, max_parallel setting +Output: Wave assignments with priority ordering + +Procedure: +3a. Resolve max_parallel: CLI > settings > default (5) +3b. Topological wave assignment: + - Wave 1: tasks with no blockedBy + - Wave N: tasks whose ALL blockedBy are in waves 1..N-1 +3c. Within-wave priority sort: + 1. critical > high > medium > low > unprioritized + 2. Ties: "unblocks most others" first +3d. Circular dependency breaking: weakest link (fewest blockers) +``` + +#### Step 4: Check Settings + +``` +Input: .claude/agent-alchemy.local.md +Output: Execution preferences (max_parallel, max_retries, model overrides) +Non-blocking: proceeds with defaults if file missing +``` + +#### Step 5: Present Plan & Confirm + +``` +Input: Execution plan +Output: User confirmation + +Display: +- Total task count, wave count +- Per-wave breakdown with task subjects and priorities +- Agent model tiers +- Estimated team composition per wave + +AskUserQuestion: "Proceed with execution?" / "Cancel" +``` + +#### Step 6: Initialize Session + +``` +Input: Task group, timestamp +Output: Session directory with initial files + +Procedure: +1. Generate session ID: {task-group}-{YYYYMMDD}-{HHMMSS} +2. Check for existing __live_session__/ content: + - If found: offer resume or fresh start via AskUserQuestion + - Resume: reset in_progress tasks to pending, continue + - Fresh start: archive to .claude/sessions/interrupted-{timestamp}/ +3. 
Create __live_session__/ with: + - execution_context.md (empty template) + - task_log.md (header only) +``` + +#### Step 7: Execute Waves + +``` +For each wave: + 7a. Identify unblocked tasks (refresh via TaskList) + 7b. Create wave team via TeamCreate + 7c. Spawn wave-lead agent with wave assignment in prompt + 7d. Wait for wave-lead summary via SendMessage (foreground Task) + 7e. Process wave summary: + - Update task_log.md with results + - Handle failed tasks requiring user escalation + - Emit progress events (if hooks enabled) + 7f. Repeat until no more unblocked tasks +``` + +#### Step 8: Session Summary + +``` +Input: task_log.md, execution_context.md +Output: session_summary.md, archived session + +Summary includes: +- Total pass/fail/partial counts +- Total execution time +- Per-wave breakdown +- Failed task list with reasons +- Key decisions made during execution + +Archive: Move __live_session__/ contents to .claude/sessions/{session-id}/ +``` + +#### Step 9: Update CLAUDE.md + +``` +Input: execution_context.md +Output: CLAUDE.md edits (if warranted) + +Only update if meaningful project-wide changes occurred: +- New dependencies added +- New patterns established +- Architecture decisions made +- New commands or build steps discovered +``` + +### 7.7 Technical Constraints + +| Constraint | Impact | Mitigation | +|------------|--------|------------| +| Claude Code API rate limits | Rapid agent spawning may be throttled | Staggered spawning with backoff in wave-lead | +| TeamCreate is relatively new | Potential undocumented limitations | Graceful fallback patterns; test extensively | +| SendMessage delivery is not guaranteed instant | Small delays between agent sends | Wave-lead uses polling pattern (check for messages, process, check again) | +| Agent context window limits | Large tasks may exceed context | Context Manager provides concise summaries; task descriptions should be bounded | +| Max concurrent agents | Platform may limit total active agents | 
Wave-lead respects max_parallel hint; orchestrator runs waves sequentially | + +## 8. Scope Definition + +### 8.1 In Scope + +- Full rewrite of orchestration skill (`execute-tasks` SKILL.md + references) +- New agent definitions: wave-lead, context-manager, task-executor (revised) +- Session directory management (simplified) +- Configuration system (4 settings) +- Progress reporting hooks (wave-level events) +- Auto-approve hook (simplified) +- Dry-run mode +- Phase filtering (`--phase`) +- Task group filtering (`--task-group`) +- Session recovery (basic — resume or fresh start) +- Backwards compatibility with task JSON format from `/create-tasks` + +### 8.2 Out of Scope + +- **Changes to `/create-spec` or `/create-tasks`**: These skills are untouched +- **Changes to task JSON format**: Tasks use existing structure with `blockedBy`, metadata, etc. +- **`produces_for` upstream injection**: Dropped — context manager handles information flow +- **File conflict detection**: Dropped — wave-lead coordinates via messages +- **Concurrent session support**: Still single-session per project +- **Per-task streaming progress**: Wave-level events only +- **Task-manager dashboard changes**: Dashboard reads existing task state; progress hooks are additive + +### 8.3 Future Considerations + +- **Cross-wave-lead communication**: For very large specs, wave-leads could share learnings directly instead of going through the orchestrator +- **Adaptive model tiering**: Automatically downgrade executor model for simple tasks based on complexity classification +- **Persistent context manager**: A single context manager that survives across waves, maintaining session state without file I/O +- **Parallel wave execution**: Run independent wave branches concurrently (requires dependency graph analysis beyond linear topological sort) + +## 9. 
Implementation Plan + +### 9.1 Phase 1: Foundation (Orchestrator Loop + Session Management) + +**Completion Criteria**: Orchestrator can load tasks, build a plan, present it to the user, create a session directory, and produce a session summary — without executing any waves. + +| Deliverable | Description | Technical Tasks | Dependencies | +|-------------|-------------|-----------------|--------------| +| Orchestration skill | New `SKILL.md` with steps 1-6, 8-9 | Write skill with plan/confirm flow, session init, summary generation | None | +| Orchestration reference | New `references/orchestration.md` with step details | Document all step procedures | SKILL.md structure | +| Session management | Init, recovery detection, archival | Create/archive session dirs, interrupted session handling | None | +| Configuration reader | Settings from `.claude/agent-alchemy.local.md` | Parse YAML frontmatter for 4 settings | None | +| Dry-run mode | `--dry-run` flag implementation | Skip Step 7, display plan details only | Orchestration skill | + +**Checkpoint Gate**: +- [ ] `--dry-run` mode works end-to-end (load tasks → plan → display → exit) +- [ ] Session directory is created with correct structure +- [ ] Interrupted session is detected and user is prompted +- [ ] Configuration settings are read and applied + +--- + +### 9.2 Phase 2: Wave Execution (Wave Lead + Task Executors) + +**Completion Criteria**: Waves execute via team-based coordination. Wave-lead spawns executors, collects results, and reports to orchestrator. Basic retry works. 
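The basic retry named in the completion criteria (immediate wave-lead retry, then user escalation) reduces to a bounded loop. This is a sketch under assumed names — `run_executor` and the result dictionary shape are illustrations, not part of the spec:

```python
def run_with_retry(task, run_executor, max_retries=1):
    """Tier 1: the wave-lead retries a failed executor immediately.
    Tier 2: a persistent failure is surfaced for user escalation."""
    attempts = 0
    detail = ""
    while attempts <= max_retries:
        attempts += 1
        status, detail = run_executor(task)
        if status == "PASS":
            return {"task": task, "status": "PASS", "attempts": attempts}
    # All attempts exhausted: mark for escalation in the wave summary
    return {"task": task, "status": "ESCALATE", "attempts": attempts,
            "failure_reason": detail}

calls = []
def flaky(task):
    calls.append(task)
    return ("PASS", "") if len(calls) > 1 else ("FAIL", "tests failed")

print(run_with_retry("#3", flaky)["status"])  # PASS (succeeded on retry)
```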
+ +| Deliverable | Description | Technical Tasks | Dependencies | +|-------------|-------------|-----------------|--------------| +| Wave-lead agent | `agents/wave-lead.md` definition | Define agent prompt, model, tools | Phase 1 | +| Task executor agent | Revised `agents/task-executor.md` | 4-phase workflow with SendMessage protocol | Phase 1 | +| Wave dispatch | Orchestrator Step 7 implementation | TeamCreate per wave, wave-lead prompt construction, summary reception | Phase 1 + agents | +| Structured protocol | Result message format | Define and document executor → wave-lead message format | Agent definitions | +| Task state management | Wave-lead TaskUpdate integration | Wave-lead marks tasks in_progress/completed/failed | Agent definitions | +| 2-tier retry | Immediate retry + user escalation | Wave-lead retry logic, orchestrator escalation flow | Wave dispatch | + +**Checkpoint Gate**: +- [ ] Single-wave execution works end-to-end (spawn team → executors implement → results collected → summary reported) +- [ ] Multi-wave execution works (sequential waves with dependency ordering) +- [ ] Failed executor triggers immediate retry by wave-lead +- [ ] User escalation works for persistent failures + +--- + +### 9.3 Phase 3: Context System (Context Manager + Cross-Wave Learning) + +**Completion Criteria**: Context is distributed to executors at wave start, collected during execution, and persisted to `execution_context.md` for cross-wave learning. 
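Context persistence appends one wave-grouped section per wave, in the `execution_context.md` format shown in section 7.5. A minimal sketch of that rendering step (the function name and argument shapes are assumptions):

```python
def format_wave_section(wave, completed_at, task_statuses,
                        learnings, decisions, issues):
    """Render one wave's contributions as a markdown section for
    execution_context.md, following the wave-grouped format of section 7.5."""
    tasks = ", ".join(f"#{tid} ({status})" for tid, status in task_statuses)
    lines = ([f"## Wave {wave}",
              f"**Completed**: {completed_at}",
              f"**Tasks**: {tasks}", "",
              "### Learnings"] + [f"- {x}" for x in learnings] +
             ["", "### Key Decisions"] + [f"- {x}" for x in decisions] +
             ["", "### Issues"] + ([f"- {x}" for x in issues] or ["- None"]))
    return "\n".join(lines) + "\n"

section = format_wave_section(
    2, "2026-02-23T14:45:10Z", [(4, "PASS"), (5, "PASS")],
    ["API routes follow `src/api/{resource}/route.ts` pattern"],
    ["[Task #4] Used middleware pattern for auth validation"], [])
# Appending keeps earlier wave sections intact, e.g.:
# with open(ctx_path, "a") as f: f.write("\n---\n\n" + section)
```

Because the file only ever grows by whole sections, there is no merge or deduplication step to fail — the append itself is the persistence.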
+ +| Deliverable | Description | Technical Tasks | Dependencies | +|-------------|-------------|-----------------|--------------| +| Context manager agent | `agents/context-manager.md` definition | Define agent prompt, model, tools | Phase 2 | +| Context distribution | Context manager → executor flow | Read execution_context.md, summarize, distribute via SendMessage | Context manager agent | +| Context collection | Executor → context manager flow | Receive contributions during wave, aggregate | Context manager agent | +| Context persistence | Write to execution_context.md | Wave-grouped format, append new wave section | Context distribution | +| Cross-wave bridge | Orchestrator passes context to wave-leads | Include execution_context.md summary in wave-lead prompt | Phase 2 + context persistence | + +**Checkpoint Gate**: +- [ ] Context manager distributes session summary to executors before they begin work +- [ ] Executors send context contributions to context manager during execution +- [ ] `execution_context.md` is updated with wave-grouped learnings after each wave +- [ ] Later waves receive context from earlier waves via context manager + +--- + +### 9.4 Phase 4: Resilience (Crash Recovery + Timeouts + Rate Limits) + +**Completion Criteria**: The engine handles wave-lead crashes, executor timeouts, and API rate limits without user intervention (except escalation). 
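Staggered spawning with backoff is a standard exponential-backoff loop. The sketch below is a rough model — the `RateLimited` error, the delay values, and the `spawn` callable are illustrative assumptions, not Claude Code APIs:

```python
import time

class RateLimited(Exception):
    """Stand-in for an API rate-limit error during agent spawning."""

def spawn_staggered(tasks, spawn, stagger=2.0, base_backoff=5.0, max_attempts=4):
    """Spawn one executor per task with a fixed stagger between launches;
    on a rate-limit error, back off exponentially and retry the same task."""
    agents = []
    for task in tasks:
        for attempt in range(max_attempts):
            try:
                agents.append(spawn(task))
                break
            except RateLimited:
                time.sleep(base_backoff * (2 ** attempt))  # 5s, 10s, 20s, ...
        else:
            # max_attempts rate-limit errors in a row: surface to the wave summary
            raise RuntimeError(f"could not spawn executor for {task}")
        time.sleep(stagger)  # stagger successive spawns to stay under the limit
    return agents
```

The stagger bounds spawn *rate* while the backoff handles the case where the limit is hit anyway; both knobs would be tuned against the observed platform limits.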
+ +| Deliverable | Description | Technical Tasks | Dependencies | +|-------------|-------------|-----------------|--------------| +| Wave-lead crash recovery | Automatic detection and retry | TaskOutput monitoring, task reset, new team spawn | Phase 2 | +| Per-task timeouts | Complexity-based timeout management | Wave-lead tracks executor duration, terminates on timeout | Phase 2 | +| Rate limit handling | Staggered spawning with backoff | Wave-lead implements spawn delays, retry on rate limit errors | Phase 2 | +| Context manager crash handling | Graceful degradation | Wave-lead detects, executors proceed without distributed context | Phase 3 | + +**Checkpoint Gate**: +- [ ] Simulated wave-lead crash triggers automatic recovery (new team spawned) +- [ ] Executor exceeding timeout is terminated and retried +- [ ] Rate limit during spawning triggers backoff (not crash) +- [ ] Context manager crash doesn't block wave execution + +--- + +### 9.5 Phase 5: Integration (Hooks + Dashboard + Polish) + +**Completion Criteria**: Progress hooks emit wave-level events, auto-approve hook works, and the engine is fully documented. 
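One plausible shape for a wave-level progress event is a small JSON file dropped into the session directory for the dashboard's file watcher to pick up. The filename scheme and event fields below are assumptions for illustration:

```python
import json
import time
from pathlib import Path

def emit_progress_event(session_dir, event_type, payload):
    """Write a wave-level progress event (e.g. wave_start, wave_complete,
    session_start, session_complete) into the session's events directory."""
    events_dir = Path(session_dir) / "events"
    events_dir.mkdir(parents=True, exist_ok=True)
    event = {"type": event_type, "timestamp": time.time(), **payload}
    # Millisecond prefix keeps events in emission order for the watcher
    path = events_dir / f"{int(time.time() * 1000)}-{event_type}.json"
    path.write_text(json.dumps(event, indent=2))
    return path

# emit_progress_event(".claude/sessions/__live_session__",
#                     "wave_complete", {"wave": 1, "passed": 3, "failed": 0})
```

Writing one file per event (rather than appending to a log) keeps each hook invocation atomic, so a watcher never sees a half-written event.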
+ +| Deliverable | Description | Technical Tasks | Dependencies | +|-------------|-------------|-----------------|--------------| +| Auto-approve hook | Simplified session write approval | Rewrite `auto-approve-session.sh` for new session structure | Phase 1 | +| Progress hooks | Wave-level event emission | Create hook that writes progress events to session dir | Phase 2 | +| Task log integration | Orchestrator updates task_log.md | Wave summary → task_log.md rows after each wave | Phase 2 | +| Documentation | Updated CLAUDE.md entries | Document new architecture, agents, configuration | All phases | +| Migration guide | Current → new engine transition | Document breaking changes, new file structure, removed features | All phases | + +**Checkpoint Gate**: +- [ ] Auto-approve hook allows autonomous session writes +- [ ] Progress events are emitted for wave start/complete and session start/complete +- [ ] task_log.md is populated with per-task results +- [ ] CLAUDE.md reflects the new architecture +- [ ] Migration guide covers all breaking changes + +## 10. Testing Strategy + +### 10.1 Test Approach + +Given that this is a Claude Code plugin (markdown-as-code), traditional unit testing doesn't apply. Testing focuses on scenario-based verification and dry-run validation. 
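Dry-run exercises the same plan construction as a live run, so the wave assignment from Step 3 is the core logic it validates. A minimal sketch, assuming each task carries an `id` and a `blocked_by` list:

```python
def build_waves(tasks):
    """Assign tasks to waves: wave 1 has no blockers; wave N's tasks are
    blocked only by tasks already placed in waves 1..N-1 (Step 3b)."""
    placed, waves = set(), []
    remaining = {t["id"]: t for t in tasks}
    while remaining:
        wave = [tid for tid, t in remaining.items()
                if all(b in placed for b in t.get("blocked_by", []))]
        if not wave:  # circular dependency: Step 3d would break the weakest link
            raise ValueError(f"cycle among tasks {sorted(remaining)}")
        waves.append(sorted(wave))
        placed.update(wave)
        for tid in wave:
            del remaining[tid]
    return waves

tasks = [{"id": "A"}, {"id": "B", "blocked_by": ["A"]}, {"id": "C"},
         {"id": "D", "blocked_by": ["B", "C"]}, {"id": "E", "blocked_by": ["D"]}]
print(build_waves(tasks))  # [['A', 'C'], ['B'], ['D'], ['E']]
```

Wave 1 is every task with no blockers; each later wave contains only tasks whose blockers have all been placed — the topological layering that the dry-run plan output displays.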
+ +| Level | Scope | Method | Coverage Target | +|-------|-------|--------|-----------------| +| Agent scenarios | Individual agent behavior | Execute agents in isolation with controlled inputs | All 3 agent types | +| Integration | Full wave lifecycle | Execute single-wave sessions with known task sets | Happy path + all failure modes | +| Regression | Multi-wave sessions | Execute multi-wave specs end-to-end | 5+ session runs without failure | +| Dry-run | Plan validation | `--dry-run` flag verifies plan without execution | All filter combinations | + +### 10.2 Test Scenarios + +#### Scenario: Happy Path (Single Wave) + +| Step | Action | Expected Result | +|------|--------|-----------------| +| 1 | Create 3 tasks with no dependencies | Tasks created | +| 2 | Run `/execute-tasks` | Plan shows 1 wave with 3 tasks | +| 3 | Confirm execution | Wave team spawned | +| 4 | Wait for completion | All 3 tasks pass, session summary generated | + +#### Scenario: Multi-Wave with Dependencies + +| Step | Action | Expected Result | +|------|--------|-----------------| +| 1 | Create 5 tasks: A, B (blocked by A), C, D (blocked by B, C), E (blocked by D) | Tasks created with dependency chain | +| 2 | Run `/execute-tasks` | Plan shows 4 waves: [A, C], [B], [D], [E] | +| 3 | Confirm and execute | Waves execute sequentially, context flows between waves | + +#### Scenario: Executor Failure + Retry + +| Step | Action | Expected Result | +|------|--------|-----------------| +| 1 | Create task with acceptance criteria that executor will fail on first attempt | Task created | +| 2 | Execute | Executor fails, wave-lead retries immediately | +| 3 | Observe retry | Either succeeds (task marked completed) or fails again (escalated to user) | + +#### Scenario: Wave-Lead Crash Recovery + +| Step | Action | Expected Result | +|------|--------|-----------------| +| 1 | Create tasks that will trigger a wave | Tasks created | +| 2 | Simulate wave-lead crash (agent timeout) | Orchestrator detects 
crash | +| 3 | Observe recovery | Tasks reset to pending, new wave team spawned | + +#### Scenario: Phase Filtering + +| Step | Action | Expected Result | +|------|--------|-----------------| +| 1 | Create tasks with spec_phase metadata (phases 1, 2, 3) | Tasks created | +| 2 | Run `/execute-tasks --phase 1` | Only phase 1 tasks appear in plan | +| 3 | Execute and verify | Phase 1 tasks execute, phases 2-3 remain pending | + +### 10.3 Dry-Run Validation + +The dry-run mode serves as a lightweight test harness: + +``` +/execute-tasks --dry-run +/execute-tasks --dry-run --phase 1 +/execute-tasks --dry-run --task-group auth-feature +``` + +Each invocation should display the plan without modifying any state or spawning agents. Use this to verify plan generation logic before running full executions. + +## 11. Deployment & Operations + +### 11.1 Deployment Strategy + +This is a plugin skill replacement — the new engine replaces the existing `execute-tasks` skill files within the `sdd-tools` plugin. + +**Deployment steps**: +1. Replace `skills/execute-tasks/SKILL.md` with new orchestration skill +2. Replace `skills/execute-tasks/references/orchestration.md` with new reference +3. Remove `skills/execute-tasks/scripts/` directory (shell scripts eliminated) +4. Remove `skills/execute-tasks/references/execution-workflow.md` (replaced by new architecture) +5. Keep `skills/execute-tasks/references/verification-patterns.md` (still relevant for executor verification) +6. Add new agents: `agents/wave-lead.md`, `agents/context-manager.md` +7. Replace `agents/task-executor.md` with revised version +8. Update `hooks/hooks.json` with new hook configuration +9. Replace hook scripts with new versions +10. Update plugin version in `plugin.json` + +**Rollback plan**: Previous skill files are committed in git. Rollback = `git checkout` the prior version of the `sdd-tools` plugin directory. 
+ +### 11.2 Hook Configuration + +```json +{ + "hooks": { + "PreToolUse": [{ + "matcher": "Write|Edit", + "hooks": [{ + "type": "command", + "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/auto-approve-session.sh", + "timeout": 5 + }] + }], + "PostToolUse": [{ + "matcher": "Write", + "hooks": [{ + "type": "command", + "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/progress-event.sh", + "timeout": 5 + }] + }] + } +} +``` + +### 11.3 Monitoring + +Progress events are emitted to the session directory and can be consumed by: +- **Task-manager dashboard**: Chokidar watches for progress event files +- **CLI output**: Orchestrator displays wave summaries between waves +- **Session logs**: `task_log.md` provides post-hoc debugging + +## 12. Dependencies + +### 12.1 Technical Dependencies + +| Dependency | Status | Risk if Unavailable | +|------------|--------|---------------------| +| Claude Code `TeamCreate` | Available | Critical — core architecture depends on this | +| Claude Code `SendMessage` | Available | Critical — all agent coordination uses this | +| Claude Code `TaskList`/`TaskUpdate` | Available | Critical — task state management | +| Claude Code `TaskOutput`/`TaskStop` | Available | High — crash detection and timeout enforcement | +| `.claude/agent-alchemy.local.md` | Optional | Low — defaults used if missing | + +### 12.2 Cross-Plugin Dependencies + +| Plugin | Dependency | Impact | +|--------|------------|--------| +| `create-tasks` (sdd-tools) | Task JSON format compatibility | Tasks must have same `blockedBy`, `metadata.task_group`, `metadata.spec_phase` structure | +| `core-tools` | No direct dependency | None (unlike current engine, no shell scripts to share) | +| `tdd-tools` | `execute-tdd-tasks` routes TDD tasks to different executor | TDD executor routing may need updates for team model | + +## 13. 
Risks & Mitigations + +| Risk | Impact | Likelihood | Mitigation Strategy | +|------|--------|------------|---------------------| +| TeamCreate API instability | High | Low | Test extensively; implement retry on team creation failure | +| SendMessage delivery delays | Medium | Medium | Wave-lead uses patient polling pattern; per-task timeouts catch stuck cases | +| Higher API cost from 3-tier agents | Medium | High | Default wave-lead to Opus (configurable to Sonnet); context manager uses Sonnet; monitor token usage per session | +| Context Manager produces poor summaries | Medium | Medium | Context manager uses Sonnet (strong summarization); orchestrator also bridges context directly in wave-lead prompt as backup | +| Wave-lead agent prompt too complex | Medium | Medium | Keep wave-lead instructions focused; externalize complex logic into reference files | +| Rate limit issues with parallel agent spawning | Medium | High | Staggered spawning with backoff built into wave-lead | +| `execute-tdd-tasks` compatibility | Low | Medium | Update TDD execution skill to use new team model in a follow-up | + +## 14. Open Questions + +| # | Question | Owner | Resolution | +|---|----------|-------|------------| +| 1 | Should agents running with `bypassPermissions` eliminate the need for the auto-approve hook? | Implementation | Test during Phase 5 — if bypassPermissions covers session writes, the hook is unnecessary | +| 2 | What is the maximum number of concurrent agents Claude Code supports in a single team? | Implementation | Test during Phase 2 — may affect max_parallel recommendations | +| 3 | How does `execute-tdd-tasks` adapt to the team model? | Follow-up | TDD skill routes TDD tasks to tdd-executor and non-TDD to task-executor — needs update for team-based dispatch | + +## 15. 
Appendix + +### 15.1 Glossary + +| Term | Definition | +|------|------------| +| Wave | A group of tasks that can execute in parallel (same topological sort level) | +| Wave Lead | The team-lead agent responsible for managing all executors within a single wave | +| Context Manager | A specialized team-member agent responsible for distributing and collecting execution context within a wave | +| Task Executor | A team-member agent that implements a single task using a 4-phase workflow | +| Orchestrator | The skill running in the user's conversation context that coordinates waves sequentially | +| Structured Protocol | The defined message format for inter-agent communication via SendMessage | +| Session | A single execution run covering one or more waves, producing session artifacts | +| Escalation | The process of reporting a persistent failure to the user for manual resolution | + +### 15.2 References + +- Current orchestration engine deep-dive: `internal/docs/sdd-orchestration-deep-dive-2026-02-22.md` +- Current execute-tasks skill: `claude/sdd-tools/skills/execute-tasks/` +- Current task-executor agent: `claude/sdd-tools/agents/task-executor.md` +- Claude Code Agent Team documentation: TeamCreate, SendMessage, TaskOutput tools + +### 15.3 Change Log + +| Version | Date | Author | Changes | +|---------|------|--------|---------| +| 1.0 | 2026-02-23 | Stephen Sequenzia | Initial version | + +--- + +*Document generated by SDD Tools* diff --git a/internal/specs/sdd-execute-tasks-rewrite-clean-SPEC.md b/internal/specs/sdd-execute-tasks-rewrite-clean-SPEC.md new file mode 100644 index 0000000..58c4f0b --- /dev/null +++ b/internal/specs/sdd-execute-tasks-rewrite-clean-SPEC.md @@ -0,0 +1,1510 @@ +# SDD Run-Tasks Engine PRD + +**Version**: 1.0 +**Author**: Stephen Sequenzia +**Date**: 2026-02-23 +**Status**: Draft +**Spec Type**: New Feature +**Spec Depth**: Full Technical Documentation +**Description**: A brand new execution engine skill (`/run-tasks`) for the sdd-tools 
plugin that replaces the current `execute-tasks` skill. Uses Claude Code's native Agent Team system (TeamCreate/SendMessage) with a 3-tier agent hierarchy, eliminating all file-based signaling, shell scripts, and complex merge pipelines. + +--- + +## 1. Executive Summary + +The `/run-tasks` skill is a new, independent execution engine for the Spec-Driven Development pipeline. It takes a set of tasks with dependency relationships (produced by `/create-tasks`) and executes them autonomously via parallel agent teams organized in waves. The engine uses Claude Code's native Agent Team primitives (`TeamCreate`, `SendMessage`, `TaskOutput`) for all coordination, replacing the current engine's unreliable file-based signaling architecture with message-passing. A 3-tier agent hierarchy (Orchestrator → Wave Lead → Context Manager + Task Executors) provides clean separation of concerns: the orchestrator plans and presents, wave-leads coordinate execution, context managers handle knowledge flow, and executors implement code. + +## 2. Problem Statement + +### 2.1 The Problem + +The current SDD execution engine (`/execute-tasks`, version 0.3.1) is too complex and unstable for production use. Three root causes drive this instability: + +1. **File-based signaling unreliability**: The engine uses `fswatch`/`inotifywait` via shell scripts to detect when agents complete work by watching for result files. These filesystem watchers miss events, deliver duplicates, and suffer from platform-specific inconsistencies — causing silent hangs, partial wave completion, and cascading timeouts. + +2. **Architectural complexity**: The engine spans ~2,600 lines across 10+ files, including a 10-step orchestration loop (~1,235 lines), two shell scripts for completion detection (248 lines combined), a PostToolUse validation hook, a 6-section context merge pipeline with compaction and deduplication, and a 3-tier retry escalation system with batched processing. 
Any change requires understanding the interaction between all these components. + +3. **Context window pressure**: The entire orchestration loop runs in the user's conversation context, consuming the user's context window with verbose wave summaries, file reads during context merging, progress streaming, and session file manipulation across multi-wave executions. + +### 2.2 Current State + +The current engine operates as follows: + +- **Orchestration**: A 10-step loop defined across `SKILL.md` (293 lines) and `references/orchestration.md` (~1,235 lines) +- **Agent launching**: Task executors spawned as background `Task` agents with `run_in_background: true` +- **Completion detection**: `watch-for-results.sh` (116 lines, fswatch/inotifywait) with fallback to `poll-for-results.sh` (134 lines, adaptive 5s-30s polling) +- **Result protocol**: Each agent writes `result-task-{id}.md` (~18 lines); a PostToolUse hook (`validate-result.sh`, 101 lines) validates format on write, renames malformed files to `.invalid` +- **Context sharing**: Per-task `context-task-{id}.md` files merged into shared `execution_context.md` using a 6-section structured schema with section-based parsing, deduplication, compaction at 10+ entries, and post-merge validation with auto-repair +- **Retry**: 3-tier escalation (Standard → Context Enrichment → User Escalation) with batched processing +- **Concurrency**: `.lock` file prevents concurrent sessions; file conflict detection defers tasks modifying the same files + +**Key files being replaced:** + +| File | Lines | Role | +|------|-------|------| +| `skills/execute-tasks/SKILL.md` | 293 | Skill entry point | +| `skills/execute-tasks/references/orchestration.md` | ~1,235 | 10-step orchestration loop | +| `skills/execute-tasks/references/execution-workflow.md` | ~381 | Execution workflow reference | +| `agents/task-executor.md` | 414 | Opus-tier task agent | +| `skills/execute-tasks/scripts/watch-for-results.sh` | 116 | Event-driven completion 
detection | +| `skills/execute-tasks/scripts/poll-for-results.sh` | 134 | Polling fallback | +| `hooks/auto-approve-session.sh` | 75 | PreToolUse auto-approval | +| `hooks/validate-result.sh` | 101 | PostToolUse result validation | + +### 2.3 Impact Analysis + +The instability of the execution engine directly blocks the SDD pipeline: + +- **Silent hangs**: The orchestrator waits indefinitely for result files that were written but not detected by fswatch +- **Partial wave completion**: Some agents' results are detected, others are missed, causing inconsistent state +- **Cascading timeouts**: The 8-minute Bash timeout for detection scripts triggers complex recovery paths +- **Context corruption**: Failed merges or partial writes to `execution_context.md` degrade context quality for subsequent waves +- **Maintenance burden**: Any change requires understanding the interaction between the 10-step loop, shell scripts, hook validation, and file-based protocols — a cognitive load that inhibits iteration + +### 2.4 Business Value + +The execution engine is the terminal artifact in the SDD pipeline (`/create-spec` → spec → `/create-tasks` → tasks → `/run-tasks` → code). If execution is unreliable, the entire pipeline's value proposition — autonomous code generation from specifications — is undermined. A stable, simpler engine enables confident multi-wave execution of complex specs, which is the primary use case for the SDD tools plugin. + +## 3. Goals & Success Metrics + +### 3.1 Primary Goals + +1. **Replace file-based signaling with message-based coordination** using Claude Code's native Agent Team system (`TeamCreate`, `SendMessage`, `TaskOutput`) +2. **Reduce architectural complexity** by eliminating shell scripts, file-based protocols, and the 6-section merge pipeline +3. **Improve resilience** with a 3-tier retry model, automatic wave-lead crash recovery, per-task timeouts, and graceful degradation under API rate limits +4. 
**Reduce context window pressure** by delegating wave management to dedicated wave-lead agents rather than running everything in the user's context +5. **Maintain functional parity** for the end user — task filtering, session artifacts, and execution flow remain familiar + +### 3.2 Success Metrics + +| Metric | Current Baseline | Target | Measurement Method | +|--------|------------------|--------|-------------------| +| Completion detection reliability | Intermittent failures (fswatch misses) | 100% (message-based) | Execute 10 multi-wave sessions without detection failure | +| Shell script dependencies | 2 scripts (248 lines) | 0 scripts | File inventory | +| Wave execution success rate | ~80% (estimated from retry patterns) | > 95% first-attempt pass rate | Session logs across 20 executions | +| New failure recovery modes | 0 (no wave-lead crash handling) | 3 (wave-lead crash retry + per-task timeout + context-enriched retry) | Feature verification | + +### 3.3 Non-Goals + +- **Changing the task format**: Tasks produced by `/create-tasks` remain compatible — same JSON structure, same `blockedBy` relationships, same metadata fields +- **Changing the spec format**: The input spec format is untouched — this affects execution only +- **TDD task routing**: The `/execute-tdd-tasks` skill has its own execution pipeline; adapting it to the new engine is a separate follow-up spec +- **Task Manager dashboard compatibility**: The dashboard update is a separate follow-up +- **Real-time per-task streaming**: Wave-level progress events are sufficient; per-line code generation streaming is out of scope +- **Multi-session concurrency**: Only one execution session at a time per project (same as current) + +## 4. 
User Research + +### 4.1 Target Users + +#### Primary Persona: SDD Pipeline User + +- **Role/Description**: Developer using the full SDD pipeline (`/create-spec` → `/create-tasks` → `/run-tasks`) to generate code from specifications +- **Goals**: Execute a set of tasks autonomously with minimal intervention, verify results, and iterate +- **Pain Points**: Execution hangs on completion detection, unclear error messages when waves fail, excessive session artifacts to debug, context window filling up during long sessions +- **Context**: Invokes `/run-tasks` after task generation, monitors progress, intervenes only on escalation +- **Technical Proficiency**: High — understands task dependencies, wave parallelism, and agent coordination + +#### Secondary Persona: Plugin Developer + +- **Role/Description**: Developer maintaining or extending the SDD tools plugin +- **Goals**: Modify orchestration behavior, add new features, debug execution issues +- **Pain Points**: Current architecture requires understanding 10+ files and the interaction between shell scripts, hooks, and file protocols +- **Context**: Reads and modifies skill files, agent definitions, and hook scripts + +### 4.2 User Journey Map + +``` +[Tasks created] --> [/run-tasks] --> [Review plan] --> [Confirm] --> [Monitor waves] --> [Handle escalations] --> [Review results] + | | | | | | | + v v v v v v v + Task JSON Load & plan Wave breakdown "Proceed?" 
Progress events Fix/skip/guide/abort Session summary +``` + +### 4.3 User Workflows + +#### Workflow 1: Standard Execution + +```mermaid +flowchart TD + classDef user fill:#E3F2FD,stroke:#1565C0,color:#000 + classDef system fill:#E8F5E9,stroke:#2E7D32,color:#000 + classDef decision fill:#FFF3E0,stroke:#E65100,color:#000 + + START[User: /run-tasks]:::user + PLAN[Orchestrator: Load, validate, plan]:::system + CONFIRM{User: Approve plan?}:::decision + DRYRUN{--dry-run?}:::decision + WAVE[Create wave team]:::system + LEAD[Wave-lead: Manage executors]:::system + RESULTS[Wave-lead: Report summary]:::system + MORE{More waves?}:::decision + SUMMARY[Session summary]:::system + + START --> PLAN --> CONFIRM + CONFIRM -->|Yes| DRYRUN + CONFIRM -->|No| END[Cancel]:::user + DRYRUN -->|Yes| DISPLAY[Display plan & exit]:::system + DRYRUN -->|No| WAVE + WAVE --> LEAD --> RESULTS --> MORE + MORE -->|Yes| WAVE + MORE -->|No| SUMMARY +``` + +#### Workflow 2: Failure Escalation + +```mermaid +flowchart TD + classDef agent fill:#E3F2FD,stroke:#1565C0,color:#000 + classDef user fill:#FFF3E0,stroke:#E65100,color:#000 + classDef recover fill:#E8F5E9,stroke:#2E7D32,color:#000 + + FAIL[Executor reports failure]:::agent + RETRY1[Tier 1: Wave-lead immediate retry]:::agent + CHECK1{Succeeded?}:::agent + RETRY2[Tier 2: Retry with enriched context]:::agent + CHECK2{Succeeded?}:::agent + ESCALATE[Tier 3: Orchestrator asks user]:::user + FIX[Fix manually + continue]:::recover + SKIP[Skip task]:::recover + GUIDE[Provide guidance → guided retry]:::recover + ABORT[Abort session]:::user + + FAIL --> RETRY1 --> CHECK1 + CHECK1 -->|Yes| DONE[Continue wave]:::recover + CHECK1 -->|No| RETRY2 --> CHECK2 + CHECK2 -->|Yes| DONE + CHECK2 -->|No| ESCALATE + ESCALATE --> FIX + ESCALATE --> SKIP + ESCALATE --> GUIDE + ESCALATE --> ABORT +``` + +## 5. 
Functional Requirements + +### 5.1 Feature: 3-Tier Agent Hierarchy + +**Priority**: P0 (Critical) +**Complexity**: High + +#### User Stories + +**US-001**: As an SDD pipeline user, I want the execution engine to use Claude Code's native team coordination so that execution doesn't depend on unreliable filesystem watching. + +**Acceptance Criteria**: +- [ ] Each wave spawns a dedicated Agent Team via `TeamCreate` +- [ ] Wave-lead agent coordinates task executors via `SendMessage` (no file-based signaling) +- [ ] Context Manager agent per wave handles execution context distribution and collection +- [ ] Task executor agents use the 4-phase workflow (Understand, Implement, Verify, Report) +- [ ] All inter-agent communication uses `SendMessage` with structured protocols +- [ ] No shell scripts are required for execution coordination + +**Technical Notes**: +- Agent hierarchy: Orchestrator (skill) → Wave Lead (team lead) → Context Manager + Task Executor × N (team members) +- The orchestrator runs in the user's conversation context; wave teams run as spawned agents +- Each wave team is independent — no cross-wave team membership + +**Edge Cases**: + +| Scenario | Expected Behavior | +|----------|-------------------| +| Wave with single task | Wave-lead still spawns context manager + one executor (consistent pattern) | +| Wave with 0 unblocked tasks after filtering | Skip wave, proceed to next (or finish) | +| All tasks in a wave fail | Wave-lead reports all failures; orchestrator presents batch escalation to user | + +**Error Handling**: + +| Error Condition | System Action | +|-----------------|---------------| +| TeamCreate fails | Orchestrator retries once; on second failure, marks wave tasks as failed and offers user the choice to retry or skip | +| SendMessage fails between agents | Agent retries delivery; on persistent failure, wave-lead logs the issue and marks affected task as failed | +| Task tool spawn fails | Wave-lead logs error, marks task as failed, 
continues with remaining executors | + +--- + +### 5.2 Feature: Wave Lead Agent + +**Priority**: P0 (Critical) +**Complexity**: High + +#### User Stories + +**US-002**: As an SDD pipeline user, I want each wave to be managed by an autonomous wave-lead agent so that wave execution is self-contained and recoverable. + +**Acceptance Criteria**: +- [ ] Wave-lead launches context manager as first team member +- [ ] Wave-lead launches task executor agents for each task in the wave +- [ ] Wave-lead manages pacing autonomously using `max_parallel` as a guideline (not a rigid cap) +- [ ] Wave-lead collects structured results from all executors via `SendMessage` +- [ ] Wave-lead manages TaskUpdate calls (marks tasks `in_progress`, `completed`, `failed`) as single source of truth +- [ ] Wave-lead handles Tier 1 retry (immediate) and Tier 2 retry (context-enriched) before escalating +- [ ] Wave-lead reports wave summary to orchestrator via `SendMessage` including: tasks passed, tasks failed, duration, key decisions +- [ ] Wave-lead implements staggered agent spawning with exponential backoff for rate limit protection +- [ ] Wave-lead model is configurable (default: Opus) + +**Technical Notes**: +- Wave-lead receives: task list for this wave, execution context snapshot, wave number, max_parallel hint, max_retries setting +- Wave-lead produces: wave summary message to orchestrator, TaskUpdate state changes +- Wave-lead lifecycle: created per wave, destroyed after wave completes (no persistent wave-leads) + +**Edge Cases**: + +| Scenario | Expected Behavior | +|----------|-------------------| +| Executor finishes before others | Wave-lead acknowledges result immediately; does not wait for batch | +| All executors fail | Wave-lead reports all failures to orchestrator for user escalation | +| Rate limit hit during agent spawning | Staggered spawning with exponential backoff; partial team formation handled gracefully | +| Wave-lead itself crashes | Orchestrator detects via 
TaskOutput, resets wave tasks to pending, spawns new wave team | + +--- + +### 5.3 Feature: Context Manager Agent + +**Priority**: P0 (Critical) +**Complexity**: High + +#### User Stories + +**US-003**: As an SDD pipeline user, I want a dedicated context manager per wave so that execution context is intelligently summarized, distributed, and collected without complex file-based merge pipelines. + +**Acceptance Criteria**: +- [ ] Context manager reads main `execution_context.md` at wave start +- [ ] Context manager derives a relevant summary of session context up to the current wave +- [ ] Context manager distributes summary to all task executors via `SendMessage` +- [ ] Context manager signals wave-lead that context distribution is complete +- [ ] Task executors send key decisions, insights, and patterns back to context manager during execution +- [ ] Context manager summarizes collected information at wave end +- [ ] Context manager updates main `execution_context.md` with new wave section +- [ ] Context manager provides enriched context to wave-lead on request (for Tier 2 retry) +- [ ] Context manager model is configurable (default: Sonnet) + +**Technical Notes**: +- `execution_context.md` is organized by waves (not the old 6-section schema) +- Context manager has Read/Write access to the session directory +- Context manager is a team member (not the team lead) — wave-lead coordinates its lifecycle +- Context distribution happens before task executors begin work +- Mid-wave real-time relay between executors is aspirational — SendMessage is async, so executors may not receive within-wave updates until their current turn ends. The primary value is cross-wave knowledge persistence. + +**Edge Cases**: + +| Scenario | Expected Behavior | +|----------|-------------------| +| Empty execution_context.md (first wave) | Context manager distributes minimal context: "This is the first wave. No prior context available." 
| +| Very large execution_context.md (many prior waves) | Context manager summarizes aggressively; includes only relevant patterns, decisions, and conventions | +| Context manager crashes | Wave-lead detects; executors proceed without distributed context; wave-lead writes a minimal context entry for the wave | +| Executor sends context update after context manager has already written | Context manager handles late arrivals if still alive; otherwise updates are lost (acceptable — not critical data) | + +--- + +### 5.4 Feature: Task Executor Agent (Revised) + +**Priority**: P0 (Critical) +**Complexity**: Medium + +#### User Stories + +**US-004**: As an SDD pipeline user, I want task executors to implement code changes using a 4-phase workflow and communicate results via structured messages so that execution quality is maintained without file-based protocols. + +**Acceptance Criteria**: +- [ ] Executors follow 4-phase workflow: Understand, Implement, Verify, Report +- [ ] Executors send structured result message to wave-lead via `SendMessage` +- [ ] Result message includes: status (PASS/PARTIAL/FAIL), summary, files_modified, verification_results, issues, context_contribution +- [ ] Executors send context contribution (decisions, patterns, insights) to context manager via separate `SendMessage` +- [ ] Executors use verification logic from `references/verification-patterns.md` +- [ ] Executor model is configurable (default: Opus) +- [ ] Executors operate with `bypassPermissions` mode for implementation autonomy + +**Technical Notes**: +- Executors are team members spawned by the wave-lead +- Each executor receives: task description, acceptance criteria, context summary (from context manager), and any relevant metadata +- The structured result protocol replaces the current `result-task-{id}.md` file format +- `verification-patterns.md` is copied from the existing engine (battle-tested logic for spec-generated vs. 
general task classification) + +**Edge Cases**: + +| Scenario | Expected Behavior | +|----------|-------------------| +| Executor exceeds per-task timeout | Wave-lead terminates executor via `TaskStop`, marks task as failed, triggers retry | +| Executor produces PARTIAL result | Wave-lead treats as failure for retry purposes but preserves partial work context | +| Executor modifies unexpected files | Accepted — verification phase should catch unintended changes | + +--- + +### 5.5 Feature: 7-Step Orchestration Loop + +**Priority**: P0 (Critical) +**Complexity**: Medium + +#### User Stories + +**US-005**: As an SDD pipeline user, I want a streamlined orchestration loop so that execution is predictable and the codebase is maintainable. + +**Acceptance Criteria**: +- [ ] Step 1 (Load & Validate): Load TaskList, apply `--task-group` and `--phase` filters, validate state (empty, all completed, no unblocked, circular dependencies) +- [ ] Step 2 (Configure & Plan): Read settings from `.claude/agent-alchemy.local.md`, build execution plan via topological sort, wave assignment, priority ordering within waves +- [ ] Step 3 (Confirm): Present execution plan to user via `AskUserQuestion`, get approval. If `--dry-run`: display plan details and exit. 
+- [ ] Step 4 (Initialize Session): Create session directory, handle interrupted session recovery (offer resume or fresh start via `AskUserQuestion`) +- [ ] Step 5 (Execute Waves): For each wave: create team → wave-lead manages → collect summary → process results → handle escalations +- [ ] Step 6 (Summarize & Archive): Generate session_summary.md, archive session to timestamped directory +- [ ] Step 7 (Finalize): Review execution_context.md for project-wide changes, update CLAUDE.md if warranted + +**Technical Notes**: +- Steps 1-4 and 6-7 run in the orchestrator skill's prompt (user's context) +- Step 5 delegates to wave teams — orchestrator waits for each wave-lead's summary via foreground `Task` +- The orchestrator passes accumulated `execution_context.md` content to each wave-lead's prompt as cross-wave context bridge + +**Edge Cases**: + +| Scenario | Expected Behavior | +|----------|-------------------| +| User cancels at Step 3 | Clean exit, no tasks modified | +| All tasks already completed | Report summary at Step 1, no execution | +| Circular dependencies detected | Break at weakest link (fewest blockers), warn user in plan | +| `--phase 1,2` filtering | Execute tasks in spec phases 1 and 2 only; tasks without `spec_phase` excluded | +| `--dry-run` mode | Complete Steps 1-3 only; display plan details (wave breakdown, task assignments, model tiers) and exit without spawning agents or creating session directory | + +--- + +### 5.6 Feature: 3-Tier Retry Model + +**Priority**: P1 (High) +**Complexity**: Medium + +#### User Stories + +**US-006**: As an SDD pipeline user, I want a graduated retry model so that transient failures are recovered automatically, context-starved failures get enriched information, and persistent failures are escalated to me promptly. 
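The graduated flow in this story can be sketched as a small decision ladder. The Python below is illustrative only (the real engine encodes this logic in the wave-lead's prompt, not in code); `run_executor` and `enrich_context` are hypothetical callables standing in for spawning an executor and querying the Context Manager.

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    tier: str
    passed: bool
    failure_context: str = ""

def retry_ladder(run_executor, enrich_context, max_retries=1):
    """Walk the 3-tier escalation: immediate retry -> enriched retry -> escalate."""
    first = run_executor(context="")
    if first.passed:
        return "completed", [first]
    attempts = [first]
    # Tier 1: immediate retry, carrying failure context from the original attempt
    for _ in range(max_retries):
        a = run_executor(context=attempts[-1].failure_context)
        attempts.append(a)
        if a.passed:
            return "completed", attempts
    # Tier 2: retry with context enriched by the Context Manager
    for _ in range(max_retries):
        a = run_executor(context=enrich_context(attempts[-1].failure_context))
        attempts.append(a)
        if a.passed:
            return "completed", attempts
    # Tier 3: hand the failure to the orchestrator for user escalation
    return "escalate", attempts
```

Note that the ladder returns as soon as any attempt passes, which mirrors the per-executor (not batched) retry behavior described in this feature.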
+ +**Acceptance Criteria**: +- [ ] Tier 1 (Immediate Retry): Wave-lead immediately retries a failed executor (1 attempt by default, configurable via `max_retries`) +- [ ] Retry includes failure context from the original attempt +- [ ] Tier 2 (Context-Enriched Retry): Wave-lead requests additional context from Context Manager (related task results, detailed project context) and retries with enriched prompt +- [ ] Tier 3 (User Escalation): After retry exhaustion, wave-lead reports failure to orchestrator +- [ ] Orchestrator presents failure to user via `AskUserQuestion` with 4 options: Fix manually, Skip, Provide guidance, Abort session +- [ ] "Provide guidance" option triggers a guided retry with user-supplied instructions +- [ ] Guided retry failures re-prompt the user (loop until resolution) +- [ ] "Fix manually" waits for user confirmation that the fix is done, then marks task as completed (manual) + +**Technical Notes**: +- Retry is immediate per executor (not batched) — as soon as an executor reports failure, the wave-lead can retry while other executors are still running +- Escalation flows: Executor → Wave-lead (Tier 1 retry) → Wave-lead (Tier 2 enriched retry) → Wave-lead (escalate via SendMessage) → Orchestrator (present to user) → Orchestrator (relay decision to wave-lead) → Wave-lead (act on decision) +- The wave-lead continues managing other running executors during the escalation round-trip + +**Edge Cases**: + +| Scenario | Expected Behavior | +|----------|-------------------| +| Multiple executors fail simultaneously | Each is retried independently and immediately through Tier 1 → Tier 2 | +| Tier 1 retry succeeds | Wave-lead marks task as completed, continues normally | +| Tier 2 retry succeeds | Wave-lead marks task as completed, continues normally | +| User selects "Abort session" | Orchestrator signals wave-lead to terminate remaining executors; all remaining tasks logged as failed | +| User selects "Fix manually" | Orchestrator waits for user 
confirmation; marks task as completed (manual) | + +--- + +### 5.7 Feature: Wave-Lead Crash Recovery + +**Priority**: P1 (High) +**Complexity**: Medium + +#### User Stories + +**US-007**: As an SDD pipeline user, I want the orchestrator to automatically recover when a wave-lead agent crashes so that a single agent failure doesn't require restarting the entire session. + +**Acceptance Criteria**: +- [ ] Orchestrator monitors wave-lead via `TaskOutput` with appropriate timeout +- [ ] On wave-lead crash or timeout, orchestrator resets wave tasks that are still `in_progress` to `pending` (via TaskUpdate) +- [ ] Orchestrator spawns a new wave team for the reset tasks +- [ ] Recovery is automatic — no user intervention required unless the retry also fails +- [ ] If second wave-lead also crashes, orchestrator escalates to user via `AskUserQuestion` +- [ ] Tasks that were already completed by executors before the crash retain their completed status + +**Technical Notes**: +- "Crash" includes: agent timeout, unexpected termination, malformed summary response +- Only `in_progress` or `pending` tasks within the wave are reset +- Completed tasks are preserved because the wave-lead calls TaskUpdate to mark them completed before the crash occurs + +--- + +### 5.8 Feature: Per-Task Timeout Management + +**Priority**: P1 (High) +**Complexity**: Medium + +*Agent Recommendation — accepted during interview.* + +#### User Stories + +**US-008**: As an SDD pipeline user, I want per-task timeouts so that stuck executors are proactively terminated rather than blocking the entire wave. 
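The timeout resolution this feature describes can be sketched in a few lines. This is illustrative Python, not plugin code; the threshold table and the `metadata.timeout_minutes` override follow this feature's acceptance criteria, and the task dict shape is an assumption.

```python
# Complexity-based defaults proposed for Feature 5.8 (minutes).
DEFAULT_TIMEOUTS_MIN = {"XS": 5, "S": 5, "M": 10, "L": 20, "XL": 20}

def timeout_minutes(task: dict) -> int:
    meta = task.get("metadata", {})
    # A per-task metadata.timeout_minutes override wins over the default
    override = meta.get("timeout_minutes")
    if override is not None:
        return int(override)
    # Unknown or missing complexity defaults to M (10 minutes)
    return DEFAULT_TIMEOUTS_MIN.get(meta.get("complexity"), 10)
```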
+ +**Acceptance Criteria**: +- [ ] Wave-lead monitors each executor's duration +- [ ] Default timeout is complexity-based: XS/S tasks: 5 min, M tasks: 10 min, L/XL tasks: 20 min +- [ ] Complexity classification reads `metadata.complexity` field; tasks without complexity default to M (10 min) +- [ ] Timeout triggers proactive termination via `TaskStop` +- [ ] Timed-out tasks are treated as failures and enter the retry flow (Tier 1) +- [ ] Timeout values can be overridden per task via task metadata (`metadata.timeout_minutes`) + +**Technical Notes**: +- The wave-lead tracks start time for each executor and checks against timeout threshold +- Complexity values come from `create-tasks` output: XS, S, M, L, XL + +--- + +### 5.9 Feature: Rate Limit Protection + +**Priority**: P1 (High) +**Complexity**: Low + +*Agent Recommendation — accepted during interview.* + +#### User Stories + +**US-009**: As an SDD pipeline user, I want the engine to handle API rate limits gracefully so that spawning many agents doesn't crash the wave. 
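The rate-limit handling in this story amounts to staggered launches, retry with exponential backoff, and tolerance for partial team formation. A hedged Python sketch, where `spawn` stands in for the Task tool call and a `RuntimeError` stands in for a rate-limit response:

```python
import time

def spawn_with_backoff(spawn, task_ids, stagger_s=1.5, max_attempts=4):
    """Staggered spawning with exponential backoff on rate-limit errors.

    Returns (spawned, failed) id lists so a partially formed team can
    proceed with the executors that launched while the rest are retried.
    """
    spawned, failed = [], []
    for tid in task_ids:
        delay = stagger_s  # backoff base tied to the stagger delay (a sketch choice)
        for attempt in range(max_attempts):
            try:
                spawn(tid)
                spawned.append(tid)
                break
            except RuntimeError:
                if attempt == max_attempts - 1:
                    failed.append(tid)
                else:
                    time.sleep(delay)
                    delay *= 2  # exponential backoff between retries
        time.sleep(stagger_s)  # brief delay between launches
    return spawned, failed
```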
+ +**Acceptance Criteria**: +- [ ] Wave-lead implements staggered agent spawning (brief delay between launches) +- [ ] Rate limit errors during agent creation trigger retry with exponential backoff +- [ ] Partial team formation is handled — if some executors fail to spawn, wave-lead proceeds with those that succeeded and retries spawning the rest +- [ ] Spawning failures are logged to the wave summary + +**Technical Notes**: +- Default stagger delay: 1-2 seconds between spawns +- The Claude Code Task tool handles some rate limiting internally, but rapid parallel spawns of Opus-tier agents can still trigger limits + +--- + +### 5.10 Feature: Dry-Run Mode + +**Priority**: P2 (Medium) +**Complexity**: Low + +*Agent Recommendation — accepted during interview.* + +#### User Stories + +**US-010**: As an SDD pipeline user, I want a dry-run mode that doubles as a test harness so that I can validate the execution plan and team structure without spawning real agents. + +**Acceptance Criteria**: +- [ ] `--dry-run` flag causes the orchestrator to exit after Step 3 (Confirm) +- [ ] Dry-run output shows: wave breakdown, task assignments per wave, priority ordering, agent model tiers, estimated team composition per wave +- [ ] No tasks are modified (no TaskUpdate calls) +- [ ] No session directory is created +- [ ] Dry-run completes in seconds (no agent spawning) +- [ ] Dry-run validates the full plan generation pipeline (load, filter, validate, topological sort, wave assignment) as a lightweight test harness + +--- + +### 5.11 Feature: Session Management + +**Priority**: P1 (High) +**Complexity**: Low + +#### User Stories + +**US-011**: As an SDD pipeline user, I want simple session management with recovery so that interrupted sessions can be resumed without complex cleanup logic. 
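Because this design drops the `.lock` file, interrupted-session detection reduces to checking for a non-empty live session directory, and a fresh start is a single rename. An illustrative Python sketch (paths follow the session layout in this spec; the function names are hypothetical):

```python
import datetime
from pathlib import Path

LIVE = Path(".claude/sessions/__live_session__")

def detect_interrupted_session() -> bool:
    # No .lock file: a live session is simply a non-empty __live_session__/
    return LIVE.is_dir() and any(LIVE.iterdir())

def archive_interrupted(now=None) -> Path:
    """Fresh start: move the live session to interrupted-{timestamp}/."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    dest = LIVE.parent / f"interrupted-{now:%Y%m%d-%H%M%S}"
    LIVE.rename(dest)
    return dest
```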
+ +**Acceptance Criteria**: +- [ ] Session ID generated from task-group + timestamp (e.g., `auth-feature-20260223-143022`) +- [ ] Session directory: `.claude/sessions/__live_session__/` +- [ ] Session artifacts: 5 files — `execution_context.md`, `task_log.md`, `session_summary.md`, `execution_plan.md`, `progress.jsonl` +- [ ] On interrupted session detection: offer user choice to resume or start fresh via `AskUserQuestion` +- [ ] Resume: reset `in_progress` tasks to pending, continue from next unblocked wave +- [ ] Fresh start: archive interrupted session to `.claude/sessions/interrupted-{timestamp}/`, create new session +- [ ] No `.lock` file — detection is based on presence of `__live_session__/` with content + +--- + +### 5.12 Feature: Auto-Approve Hook + +**Priority**: P2 (Medium) +**Complexity**: Low + +#### User Stories + +**US-012**: As an SDD pipeline user, I want session directory writes to be auto-approved so that context manager updates don't trigger permission prompts. + +**Acceptance Criteria**: +- [ ] PreToolUse hook auto-approves Write/Edit operations to the session directory (`*/.claude/sessions/*`) +- [ ] Hook covers `execution_context.md`, `task_log.md`, `session_summary.md`, `execution_plan.md`, `progress.jsonl` +- [ ] Hook never exits non-zero (safe error handling via `trap 'exit 0' ERR`) +- [ ] Hook has debug logging capability via environment variable + +**Technical Notes**: +- This is a safety net alongside `bypassPermissions` mode on agents +- Open question: if `bypassPermissions` covers all session writes for all agent types, this hook may be unnecessary — test during Phase 3 + +--- + +### 5.13 Feature: Configuration System + +**Priority**: P2 (Medium) +**Complexity**: Low + +#### User Stories + +**US-013**: As an SDD pipeline user, I want execution behavior to be configurable so that I can tune agent tiers and retry behavior for my project. 
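Settings loading for this story can be sketched as a merge of frontmatter values over defaults. This Python sketch is illustrative and assumes flat `run-tasks.<key>: value` lines in the frontmatter (rather than nested YAML, which would need a real parser); the key names and defaults match this feature's settings list.

```python
import re
from pathlib import Path

DEFAULTS = {
    "max_parallel": 5,
    "max_retries": 1,
    "wave_lead_model": "opus",
    "context_manager_model": "sonnet",
    "executor_model": "opus",
}

def load_run_tasks_settings(path=".claude/agent-alchemy.local.md") -> dict:
    """Merge run-tasks.* frontmatter settings over defaults.

    A missing settings file is not an error: defaults are used.
    """
    settings = dict(DEFAULTS)
    p = Path(path)
    if not p.is_file():
        return settings
    m = re.match(r"^---\n(.*?)\n---", p.read_text(), re.DOTALL)
    if not m:
        return settings
    for line in m.group(1).splitlines():
        kv = re.match(r"\s*run-tasks\.([\w_]+):\s*(\S+)", line)
        if kv and kv.group(1) in settings:
            val = kv.group(2)
            # Integers (max_parallel, max_retries) vs model-name strings
            settings[kv.group(1)] = int(val) if val.isdigit() else val
    return settings
```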
+ +**Acceptance Criteria**: +- [ ] Configuration read from `.claude/agent-alchemy.local.md` YAML frontmatter +- [ ] Configurable settings: + - `run-tasks.max_parallel` (default: 5) — hint to wave-lead for pacing + - `run-tasks.max_retries` (default: 1) — autonomous retries per tier before escalation + - `run-tasks.wave_lead_model` (default: `opus`) — model for wave-lead agents + - `run-tasks.context_manager_model` (default: `sonnet`) — model for context manager agents + - `run-tasks.executor_model` (default: `opus`) — model for task executor agents +- [ ] Missing settings file is not an error — defaults are used + +--- + +### 5.14 Feature: Progress Events + +**Priority**: P2 (Medium) +**Complexity**: Low + +#### User Stories + +**US-014**: As a developer monitoring execution, I want structured progress events so that external tools can track execution status. + +**Acceptance Criteria**: +- [ ] Progress events written to `progress.jsonl` in the session directory as JSON lines +- [ ] Events emitted: session_start, wave_start, wave_complete, task_complete, session_complete +- [ ] Each event includes: timestamp, event_type, and event-specific data (wave number, task ID, status, duration) +- [ ] Progress event writing is best-effort — failures do not affect execution +- [ ] Orchestrator writes session-level events; wave-lead writes wave-level events (if it has access) + +**Technical Notes**: +- JSON Lines format enables append-only writes without read-modify-write +- The task-manager dashboard can consume these events in a future update +- If wave-lead writes to the session directory, the auto-approve hook must cover those writes + +--- + +## 6. 
Non-Functional Requirements + +### 6.1 Performance Requirements + +| Metric | Requirement | Measurement Method | +|--------|-------------|-------------------| +| Wave setup time | < 30 seconds from wave start to all executors launched | Timestamp comparison in wave summary | +| Context distribution time | < 15 seconds from context manager start to all executors receiving context | Wave-lead tracking | +| Orchestrator overhead per wave | < 60 seconds (plan review, team creation, summary processing) | Session log timestamps | +| Total execution overhead | < 10% of total wall time spent on coordination vs. actual implementation | Session summary analysis | + +### 6.2 Reliability Requirements + +| Metric | Requirement | +|--------|-------------| +| Completion detection | 100% — message-based delivery eliminates detection failures | +| Wave-lead crash recovery | Automatic retry within 60 seconds of crash detection | +| Per-task timeout enforcement | Stuck executors terminated within 30 seconds of timeout | +| Session recovery | Resume from any interruption point without data loss | + +### 6.3 Scalability Requirements + +| Metric | Requirement | +|--------|-------------| +| Max tasks per session | 100+ (limited by API rate limits, not architecture) | +| Max tasks per wave | Limited only by `max_parallel` hint and API rate limits | +| Max waves per session | Unlimited (determined by dependency graph depth) | +| Context file growth | Linear with wave count; context manager summarizes to prevent unbounded growth | + +### 6.4 Maintainability Requirements + +| Metric | Requirement | +|--------|-------------| +| Shell script count | 0 (all coordination via Claude Code primitives) | +| Agent definition count | 3 new agents (wave-lead, context-manager, task-executor) | +| Hook count | 1 (auto-approve; progress is best-effort via direct writes) | + +## 7. 
Technical Architecture + +### 7.1 System Overview + +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ User's Conversation Context │ +│ ┌─────────────────────────────────────────────────────────────┐ │ +│ │ Orchestrator Skill (/run-tasks) │ │ +│ │ Steps 1-4: Load, Configure, Confirm, Init Session │ │ +│ │ Step 5: Spawn wave teams (sequential) │ │ +│ │ Steps 6-7: Summarize, Finalize │ │ +│ └─────────────────────────┬───────────────────────────────────┘ │ +└────────────────────────────┼────────────────────────────────────────┘ + │ TeamCreate + SendMessage + ▼ +┌─────────────────────────────────────────────────────────────────────┐ +│ Wave Team (per wave) │ +│ │ +│ ┌───────────────────────────────────────────┐ │ +│ │ Wave Lead Agent (Opus) │ │ +│ │ - Launches context manager + executors │ │ +│ │ - Collects results via SendMessage │ │ +│ │ - Handles Tier 1 + Tier 2 retries │ │ +│ │ - Manages TaskUpdate state changes │ │ +│ │ - Reports wave summary to orchestrator │ │ +│ └────────┬──────────────────┬────────────────┘ │ +│ │ │ │ +│ ┌──────▼──────┐ ┌──────▼──────────────────────────┐ │ +│ │ Context │ │ Task Executors (Opus) × N │ │ +│ │ Manager │◄──►│ - 4-phase workflow │ │ +│ │ (Sonnet) │ │ - Structured result protocol │ │ +│ │ │ │ - Context contribution to CM │ │ +│ └──────────────┘ └─────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────┐ +│ Session Directory │ +│ .claude/sessions/__live_session__/ │ +│ ├── execution_context.md (cross-wave learning, grouped by wave) │ +│ ├── task_log.md (per-task status, duration, tokens) │ +│ ├── session_summary.md (final execution report) │ +│ ├── execution_plan.md (wave breakdown for debugging) │ +│ └── progress.jsonl (structured progress events) │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +### 7.2 Tech Stack + +| Layer | 
Technology | Justification | +|-------|------------|---------------| +| Orchestration | Claude Code Skill (markdown-as-code) | Existing plugin system; runs in user's context | +| Agent coordination | `TeamCreate` / `SendMessage` / `TaskOutput` | Native Claude Code primitives; message-passing replaces file-based signaling | +| Task state | `TaskList` / `TaskUpdate` / `TaskGet` | Native Claude Code task management; replaces custom state tracking | +| Agent spawning | `Task` tool with `team_name` parameter | Team-aware agent spawning | +| Agent termination | `TaskStop` | Per-task timeout enforcement | +| Session storage | Local filesystem (`.claude/sessions/`) | Persistent session artifacts for history and debugging | +| Configuration | YAML frontmatter in `.claude/agent-alchemy.local.md` | Existing settings convention | + +### 7.3 Agent Definitions + +#### Agent: Wave Lead (`agents/wave-lead.md`) + +```yaml +--- +model: opus # configurable via run-tasks.wave_lead_model +tools: + - Task + - TaskList + - TaskGet + - TaskUpdate + - TaskStop + - SendMessage + - Read + - Glob + - Grep +--- +``` + +**Responsibilities**: +1. Receive wave assignment (task list, max_parallel hint, max_retries, wave number) from orchestrator +2. Launch Context Manager agent as first team member +3. Wait for Context Manager to signal readiness (context distributed) +4. Launch Task Executor agents with staggered spawning (rate limit protection) +5. Mark each task `in_progress` via TaskUpdate before launching its executor +6. Monitor executor progress via SendMessage (collect structured results) +7. Handle Tier 1 retry (immediate) for failed executors +8. Handle Tier 2 retry (request enriched context from Context Manager) for persistent failures +9. Mark tasks `completed` or `failed` via TaskUpdate based on results +10. After all executors complete: signal Context Manager to finalize context, collect wave metrics +11. Send structured wave summary to orchestrator via SendMessage +12. 
Handle shutdown request from orchestrator + +#### Agent: Context Manager (`agents/context-manager.md`) + +```yaml +--- +model: sonnet # configurable via run-tasks.context_manager_model +tools: + - Read + - Write + - SendMessage + - Glob + - Grep +--- +``` + +**Responsibilities**: +1. Read `execution_context.md` from session directory +2. Derive a concise, relevant summary of all prior wave learnings +3. Distribute context summary to all task executors via SendMessage +4. Signal wave-lead that context distribution is complete +5. Receive context contributions from executors during execution (decisions, patterns, insights, issues) +6. On Tier 2 retry request from wave-lead: provide enriched context for the failing task (include related task results, detailed project context) +7. On wave completion signal from wave-lead: summarize all collected contributions +8. Append new wave section to `execution_context.md` +9. Handle shutdown request + +#### Agent: Task Executor (`agents/task-executor.md`) + +```yaml +--- +model: opus # configurable via run-tasks.executor_model +tools: + - Read + - Write + - Edit + - Glob + - Grep + - Bash + - SendMessage +--- +``` + +**Responsibilities** (4-phase workflow): +1. **Understand**: Read task description, acceptance criteria, and distributed context. Analyze requirements. Explore codebase if needed. +2. **Implement**: Make code changes (Write, Edit, Bash). Follow project conventions from context. +3. **Verify**: Check acceptance criteria using `references/verification-patterns.md` logic. Run tests if applicable. Classify result as PASS/PARTIAL/FAIL. +4. **Report**: Send structured result to wave-lead via SendMessage. Send context contribution to context manager via SendMessage. + +### 7.4 Communication Protocols + +All inter-agent communication uses `SendMessage` with explicit schemas. These schemas are defined in `references/communication-protocols.md`. 
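Because the schemas are plain structured text, agent-side tooling or a test harness can parse them with simple line handling. As an illustration, a hedged Python sketch that parses the `TASK RESULT` schema defined later in this section (the function is hypothetical, not part of the plugin):

```python
def parse_task_result(message: str) -> dict:
    """Parse a TASK RESULT message into a dict.

    Single-value fields (Task, Status, Summary) are `Key: value` lines;
    list fields (Files Modified, Verification, Issues) collect the
    `- ` bullets under their header.
    """
    single = {"Task", "Status", "Summary"}
    lists = {"Files Modified": "files_modified",
             "Verification": "verification",
             "Issues": "issues"}
    result = {"files_modified": [], "verification": [], "issues": []}
    current = None  # list field currently collecting bullets
    for line in message.splitlines():
        line = line.strip()
        if line.startswith("- ") and current:
            result[current].append(line[2:])
        elif ":" in line:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if key in single:
                result[key.lower()] = value
                current = None
            elif key in lists:
                current = lists[key]
    return result
```

A schema this regular keeps the parsing trivial on both sides, which is part of the appeal of explicit message templates over ad-hoc free text.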
+ +#### Orchestrator → Wave Lead (via Task prompt) + +``` +WAVE ASSIGNMENT +Wave: {N} of {total} +Max Parallel: {max_parallel} +Max Retries: {max_retries} +Session Dir: {session_dir_path} + +TASKS: +- Task #{id}: {subject} + Description: {description} + Acceptance Criteria: {criteria} + Priority: {priority} + Complexity: {complexity} + Metadata: {metadata} + +CROSS-WAVE CONTEXT: +{Summary of execution_context.md content for context bridge} +``` + +#### Wave Lead → Orchestrator (via SendMessage) + +``` +WAVE SUMMARY +Wave: {N} +Duration: {total_wave_duration} +Tasks Passed: {count} +Tasks Failed: {count} +Tasks Skipped: {count} + +RESULTS: +- Task #{id}: {status} ({duration}) + Summary: {brief} + Files: {file_list} + +FAILED TASKS (for escalation): +- Task #{id}: {failure_reason} + Tier 1 Retry: {attempted/skipped} → {outcome} + Tier 2 Retry: {attempted/skipped} → {outcome} + +CONTEXT UPDATES: +{Summary of new learnings from this wave} +``` + +#### Task Executor → Wave Lead (via SendMessage) + +``` +TASK RESULT +Task: #{id} +Status: PASS | PARTIAL | FAIL +Summary: {what was accomplished} +Files Modified: +- {path} (created|modified|deleted) +Verification: +- [PASS|FAIL] {criterion} +Issues: +- {issue description, if any} +``` + +#### Task Executor → Context Manager (via SendMessage) + +``` +CONTEXT CONTRIBUTION +Task: #{id} +Decisions: +- {key decision made during implementation} +Patterns: +- {pattern discovered or followed} +Insights: +- {useful information for other tasks} +Issues: +- {problems encountered, workarounds applied} +``` + +#### Context Manager → Task Executors (via SendMessage) + +``` +SESSION CONTEXT +Wave: {N} + +PROJECT SETUP: +{summarized tech stack, build commands, environment} + +CONVENTIONS: +{coding style, naming, import patterns discovered in prior waves} + +KEY DECISIONS: +{architecture choices from prior waves} + +KNOWN ISSUES: +{problems encountered, workarounds to be aware of} +``` + +#### Context Manager → Wave Lead (on Tier 2 
enrichment request) + +``` +ENRICHED CONTEXT +Task: #{id} +Original Failure: {failure reason from Tier 1} + +ADDITIONAL CONTEXT: +{Detailed project context relevant to this task's failure} +{Related task results if available} +{Conventions or patterns that may help} +``` + +### 7.5 Session Directory Layout + +``` +.claude/sessions/__live_session__/ +├── execution_context.md # Cross-wave learning (grouped by wave) +├── task_log.md # Per-task status table +├── execution_plan.md # Wave breakdown (written in Step 4) +├── progress.jsonl # Structured progress events (JSON Lines) +└── session_summary.md # Final report (written in Step 6) + +.claude/sessions/{session-id}/ # Archived completed sessions +├── execution_context.md +├── task_log.md +├── execution_plan.md +├── progress.jsonl +└── session_summary.md + +.claude/sessions/interrupted-{timestamp}/ # Archived interrupted sessions +├── (same files as above) +``` + +#### execution_context.md Format + +```markdown +# Execution Context + +## Wave 1 +**Completed**: 2026-02-23T14:30:22Z +**Tasks**: #1 (PASS), #2 (PASS), #3 (FAIL) + +### Learnings +- Runtime: Node.js 22 with pnpm +- Tests: `__tests__/{name}.test.ts` alongside source +- Imports: Named exports, barrel files for public API + +### Key Decisions +- [Task #1] Used Zod for runtime validation over io-ts +- [Task #2] Placed shared types in `src/types/` directory + +### Issues +- Vitest mock.calls behavior differs from Jest — reset between tests + +--- + +## Wave 2 +**Completed**: 2026-02-23T14:45:10Z +**Tasks**: #4 (PASS), #5 (PASS) + +### Learnings +- API routes follow `src/api/{resource}/route.ts` pattern + +### Key Decisions +- [Task #4] Used middleware pattern for auth validation + +### Issues +- None +``` + +#### task_log.md Format + +```markdown +# Task Log + +| Task | Subject | Wave | Status | Attempts | Duration | +|------|---------|------|--------|----------|----------| +| #1 | Create data models | 1 | PASS | 1 | 2m 10s | +| #2 | Implement API handler | 1 | 
PASS | 1 | 3m 01s | +| #3 | Add validation | 1 | FAIL | 3 | 4m 12s | +| #4 | Create auth middleware | 2 | PASS | 1 | 2m 45s | +``` + +#### execution_plan.md Format + +```markdown +# Execution Plan + +**Task Group**: auth-feature +**Total Tasks**: 8 +**Total Waves**: 3 +**Max Parallel**: 5 +**Generated**: 2026-02-23T14:28:00Z + +## Wave 1 (4 tasks) +| Task | Subject | Priority | Complexity | +|------|---------|----------|------------| +| #1 | Create data models | critical | M | +| #2 | Set up config | high | S | +| #3 | Create interfaces | high | S | +| #4 | Add shared types | medium | XS | + +## Wave 2 (3 tasks) +| Task | Subject | Priority | Complexity | Blocked By | +|------|---------|----------|------------|------------| +| #5 | Implement API handler | critical | L | #1, #3 | +| #6 | Create service layer | high | M | #1 | +| #7 | Add middleware | medium | M | #3 | + +## Wave 3 (1 task) +| Task | Subject | Priority | Complexity | Blocked By | +|------|---------|----------|------------|------------| +| #8 | Integration tests | high | L | #5, #6, #7 | +``` + +#### progress.jsonl Format + +```jsonl +{"ts":"2026-02-23T14:28:00Z","event":"session_start","task_group":"auth-feature","total_tasks":8,"total_waves":3} +{"ts":"2026-02-23T14:28:30Z","event":"wave_start","wave":1,"task_count":4} +{"ts":"2026-02-23T14:35:22Z","event":"task_complete","wave":1,"task_id":"1","status":"PASS","duration_s":130} +{"ts":"2026-02-23T14:36:10Z","event":"task_complete","wave":1,"task_id":"2","status":"PASS","duration_s":181} +{"ts":"2026-02-23T14:38:45Z","event":"wave_complete","wave":1,"tasks_passed":3,"tasks_failed":1,"duration_s":615} +{"ts":"2026-02-23T14:50:00Z","event":"session_complete","total_passed":7,"total_failed":1,"total_duration_s":1320} +``` + +### 7.6 Orchestration Loop Detail + +#### Step 1: Load & Validate + +``` +Input: TaskList + CLI args (--task-group, --phase) +Output: Filtered, validated task set +Exit: If no tasks match filters, all completed, or no unblocked 
tasks + +Procedure: +1a. Read TaskList +1b. Apply filters: + - --task-group → match metadata.task_group + - --phase → match metadata.spec_phase (comma-separated integers) + - Tasks without spec_phase excluded when --phase is active +1c. Validate: + - Empty task list → suggest /create-tasks + - All completed → report summary + - No unblocked tasks → report blocking chains + - Circular dependencies → detect, break at weakest link (fewest blockers), warn user +``` + +#### Step 2: Configure & Plan + +``` +Input: Filtered task set, .claude/agent-alchemy.local.md +Output: Execution plan with wave assignments + +Procedure: +2a. Read settings (defaults if file missing): + - max_parallel (default: 5) + - max_retries (default: 1) + - wave_lead_model (default: opus) + - context_manager_model (default: sonnet) + - executor_model (default: opus) +2b. Topological wave assignment: + - Wave 1: tasks with no blockedBy (or all blockers completed) + - Wave N: tasks whose ALL blockedBy are in waves 1..N-1 or already completed +2c. Within-wave priority sort: + 1. critical > high > medium > low > unprioritized + 2. Ties: "unblocks most others" first +``` + +#### Step 3: Confirm + +``` +Input: Execution plan +Output: User confirmation or --dry-run exit + +Display via AskUserQuestion: +- Total task count, wave count +- Per-wave breakdown with task subjects and priorities +- Agent model tiers +- Estimated team composition per wave (1 wave-lead + 1 context-mgr + N executors) + +If --dry-run: display plan details and exit (no TaskUpdate, no session dir) +``` + +#### Step 4: Initialize Session + +``` +Input: Task group, timestamp +Output: Session directory with initial files + +Procedure: +4a. Generate session ID: {task-group}-{YYYYMMDD}-{HHMMSS} +4b. Check for existing __live_session__/ content: + - If found: offer resume or fresh start via AskUserQuestion + - Resume: reset in_progress tasks to pending, continue + - Fresh start: archive to .claude/sessions/interrupted-{timestamp}/ +4c. 
Create __live_session__/ with: + - execution_context.md (empty template with "# Execution Context" header) + - task_log.md (header row only) + - execution_plan.md (populated from Step 2 plan) + - progress.jsonl (session_start event) +``` + +#### Step 5: Execute Waves + +``` +For each wave: + 5a. Identify unblocked tasks (refresh via TaskList) + 5b. Write wave_start event to progress.jsonl + 5c. Create wave team via TeamCreate + 5d. Spawn wave-lead agent (foreground Task) with: + - Wave assignment (task list, max_parallel, max_retries, wave number) + - Cross-wave context (summary of execution_context.md) + 5e. Wait for wave-lead summary via SendMessage + 5f. Process wave summary: + - Update task_log.md with results + - Write wave_complete event to progress.jsonl + - Handle failed tasks requiring user escalation (Tier 3) + - For "Provide guidance": relay guidance to wave-lead for guided retry + - For "Fix manually": wait for user confirmation + - For "Skip": mark task as skipped + - For "Abort": terminate session + 5g. Delete wave team via TeamDelete + 5h. Repeat until no more unblocked tasks +``` + +#### Step 6: Summarize & Archive + +``` +Input: task_log.md, execution_context.md +Output: session_summary.md, archived session + +Summary includes: +- Total pass/fail/partial/skipped counts +- Total execution time +- Per-wave breakdown +- Failed task list with reasons +- Key decisions made during execution + +Write session_complete event to progress.jsonl +Archive: Move __live_session__/ contents to .claude/sessions/{session-id}/ +``` + +#### Step 7: Finalize + +``` +Input: execution_context.md +Output: CLAUDE.md edits (if warranted) + +Only update if meaningful project-wide changes occurred: +- New dependencies added +- New patterns established +- Architecture decisions made +- New commands or build steps discovered +``` + +### 7.7 Task Format Compatibility + +The new engine consumes tasks produced by `/create-tasks` without any format changes. 
Key fields: + +| Field | Location | Used For | +|-------|----------|----------| +| `subject` | Task top-level | Display in plan and logs | +| `description` | Task top-level | Executor receives as implementation instructions | +| `blockedBy` | Task relationship | Dependency graph for wave assignment | +| `metadata.task_group` | Task metadata | `--task-group` filtering, session ID generation | +| `metadata.spec_phase` | Task metadata | `--phase` filtering | +| `metadata.spec_phase_name` | Task metadata | Display in execution plan | +| `metadata.priority` | Task metadata | Within-wave priority sorting | +| `metadata.complexity` | Task metadata | Per-task timeout determination | +| `metadata.spec_path` | Task metadata | Task classification (spec-generated vs. general) | +| `metadata.feature_name` | Task metadata | Display and grouping | +| `metadata.task_uid` | Task metadata | Ignored (used by create-tasks merge mode only) | +| `produces_for` | Task top-level | Silently ignored — Context Manager handles info flow | + +### 7.8 Codebase Context + +#### Existing Architecture + +The new `run-tasks` skill lives at `claude/sdd-tools/skills/run-tasks/` alongside the existing `execute-tasks` skill. During development, both coexist. After the new skill is proven, `execute-tasks` will be deleted. 
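The `--task-group` and `--phase` filters of Step 1b operate purely on the metadata fields in the 7.7 table above. A minimal sketch, with the task dicts and the `filter_tasks` helper invented here for illustration:

```python
def filter_tasks(tasks, task_group=None, phases=None):
    """Step 1b filtering sketch (hypothetical helper, not plugin code).

    Each task dict mirrors the field table: `metadata.task_group` backs
    --task-group and `metadata.spec_phase` backs --phase. Tasks without
    spec_phase are excluded whenever --phase is active.
    """
    selected = []
    for task in tasks:
        meta = task.get("metadata", {})
        if task_group and meta.get("task_group") != task_group:
            continue
        # A missing spec_phase returns None, which is never in `phases`
        if phases is not None and meta.get("spec_phase") not in phases:
            continue
        selected.append(task)
    return selected

tasks = [
    {"subject": "Create data models",
     "metadata": {"task_group": "auth-feature", "spec_phase": 1}},
    {"subject": "Integration tests",
     "metadata": {"task_group": "auth-feature", "spec_phase": 3}},
    {"subject": "Unrelated chore",
     "metadata": {"task_group": "cleanup"}},
]
phase_one = filter_tasks(tasks, task_group="auth-feature", phases={1})
```

Note how the sketch reproduces the Step 1b rule that a task lacking `spec_phase` drops out of any `--phase` run while still matching `--task-group`-only runs.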
+ +#### Integration Points + +| File/Module | Purpose | How This Feature Connects | +|------------|---------|---------------------------| +| `create-tasks` skill | Produces task JSON | `run-tasks` consumes tasks via `TaskList`/`TaskGet` — format compatibility maintained | +| `.claude/agent-alchemy.local.md` | User settings | New settings under `run-tasks.*` namespace | +| `.claude/sessions/` directory | Session storage | Shared location for session artifacts | +| `execute-tdd-tasks` (tdd-tools) | TDD execution | No direct dependency — TDD adaptation is a follow-up | + +#### Patterns to Follow + +- **Skill/Reference split**: SKILL.md for high-level steps + `references/orchestration.md` for detailed procedures — used in current `execute-tasks` +- **AskUserQuestion for all user interaction**: All prompts routed through AskUserQuestion, never plain text — used across all SDD skills +- **YAML frontmatter for skill metadata**: Standardized frontmatter with `name`, `description`, `argument-hint`, `allowed-tools` — used by all skills +- **`${CLAUDE_PLUGIN_ROOT}` for path references**: Same-plugin references use `${CLAUDE_PLUGIN_ROOT}/`, cross-plugin use `/../{dir-name}/` — standard convention + +#### Key Dependencies + +- **Claude Code Team APIs**: `TeamCreate`, `SendMessage`, `TaskOutput`, `TaskStop` — these are the foundation. Any undocumented limitations could impact the design. +- **`create-tasks` task format**: The task JSON structure with `blockedBy`, `metadata.*` fields is an immutable input contract. +- **Session directory convention**: `.claude/sessions/__live_session__/` is the standard location. 
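One consequence of leaning on live Team APIs is throttling during rapid agent spawning. The staggered-spawning-with-exponential-backoff behavior assigned to the wave-lead can be sketched as follows; `RateLimitError` and `spawn_fn` are hypothetical stand-ins for the real Task tool's failure mode and call, and the timing values would be tuned during Phase 1 testing:

```python
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for a throttled Task tool call."""

def spawn_with_backoff(spawn_fn, executors, stagger_s=2.0, max_attempts=4):
    """Launch executors one at a time, backing off when throttled (sketch)."""
    launched = []
    for executor in executors:
        delay = stagger_s or 0.1  # first backoff interval
        for attempt in range(1, max_attempts + 1):
            try:
                launched.append(spawn_fn(executor))
                break
            except RateLimitError:
                if attempt == max_attempts:
                    raise  # wave-lead would escalate this executor
                time.sleep(delay)
                delay *= 2  # exponential backoff between retries
        time.sleep(stagger_s)  # stagger successive spawns (rate limit protection)
    return launched

# Usage with a flaky spawner that is throttled exactly once:
calls = {"count": 0}

def flaky_spawn(name):
    calls["count"] += 1
    if calls["count"] == 1:
        raise RateLimitError()
    return f"agent:{name}"

agents = spawn_with_backoff(flaky_spawn, ["exec-1", "exec-2"], stagger_s=0.01)
```

The stagger between successful spawns and the doubling delay on failures are independent knobs; the sketch keeps both so a wave-lead can respect `max_parallel` pacing even when no throttling occurs.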
+ +### 7.9 Technical Constraints + +| Constraint | Impact | Mitigation | +|------------|--------|------------| +| Claude Code API rate limits | Rapid agent spawning may be throttled | Staggered spawning with exponential backoff in wave-lead | +| TeamCreate is relatively new | Potential undocumented limitations | Graceful fallback patterns; test extensively in Phase 1 | +| SendMessage delivery is async | Small delays between agent sends; mid-wave relay unlikely | Wave-lead uses patient collection pattern; per-task timeouts catch stuck cases | +| Agent context window limits | Large tasks may exceed context | Context Manager provides concise summaries; task descriptions should be bounded | +| Max concurrent agents | Platform may limit total active agents | Wave-lead respects max_parallel hint; orchestrator runs waves sequentially | + +## 8. Scope Definition + +### 8.1 In Scope + +- New independent skill `run-tasks` at `claude/sdd-tools/skills/run-tasks/` +- New agent definitions: `wave-lead.md`, `context-manager.md`, `task-executor.md` +- 7-step orchestration loop with reference documentation +- 3-tier retry model (immediate → context-enriched → user escalation) +- Wave-lead crash recovery (auto-recover once) +- Per-task complexity-based timeouts +- Rate limit protection (staggered spawning with backoff) +- Session management (5 files, interrupted session recovery) +- Configuration system (5 settings) +- Auto-approve hook (simplified) +- Dry-run mode (`--dry-run`) +- Phase filtering (`--phase`) and task group filtering (`--task-group`) +- Communication protocol schemas in reference file +- Verification patterns reference (copied from existing engine) +- Progress events (`progress.jsonl`) +- Backwards compatibility with task JSON format from `/create-tasks` + +### 8.2 Out of Scope + +- **Changes to `/create-spec` or `/create-tasks`**: These skills are untouched +- **Changes to task JSON format**: Tasks use existing structure +- **TDD task routing**: `execute-tdd-tasks` 
adaptation is a separate follow-up spec +- **Task Manager dashboard compatibility**: Dashboard update is a separate follow-up +- **`produces_for` upstream injection**: Dropped — Context Manager handles information flow +- **File conflict detection**: Dropped — wave-lead coordinates via messages, executors work independently +- **Concurrent session support**: Still single-session per project +- **Per-task streaming progress**: Wave-level events only +- **Modifying or deleting the old `execute-tasks` skill**: It remains untouched during development + +### 8.3 Future Considerations + +- **TDD engine adaptation**: Update `execute-tdd-tasks` to use the new team model (wave-lead accepts `task_type` routing metadata) +- **Task Manager dashboard integration**: Update dashboard to consume `progress.jsonl` events and the new session layout +- **Cross-wave-lead communication**: For very large specs, wave-leads could share learnings directly +- **Adaptive model tiering**: Automatically downgrade executor model for simple tasks based on complexity +- **Persistent context manager**: A single context manager across waves, maintaining state without file I/O +- **Parallel wave execution**: Run independent wave branches concurrently +- **Deletion of old `execute-tasks`**: Remove after new engine is proven stable across 10+ multi-wave sessions + +## 9. Implementation Plan + +### 9.1 Phase 1: Core Engine + +**Completion Criteria**: The engine can load tasks, build a plan, present it to the user, create a session, execute single and multi-wave sessions via team-based coordination, and produce a session summary. Dry-run mode works end-to-end. 
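The "build a plan" behavior in these criteria rests on the topological wave assignment of Step 2b. A minimal sketch, assuming already-completed blockers have been removed beforehand and each task is reduced to an id with its `blockedBy` set:

```python
def assign_waves(blocked_by):
    """Step 2b topological wave assignment (minimal sketch).

    `blocked_by` maps task id -> set of blocking task ids; blockers
    that are already completed are assumed to be pruned beforehand.
    """
    waves = []
    placed = set()
    remaining = dict(blocked_by)
    while remaining:
        # Wave N: every task whose blockers all sit in waves 1..N-1
        wave = sorted(t for t, blockers in remaining.items() if blockers <= placed)
        if not wave:
            # Step 1c breaks cycles at the weakest link; this sketch only reports
            raise ValueError(f"circular dependency among: {sorted(remaining)}")
        waves.append(wave)
        placed.update(wave)
        for t in wave:
            del remaining[t]
    return waves

# Dependency chain from the multi-wave test scenario: A, B→A, C, D→B+C, E→D
deps = {"A": set(), "B": {"A"}, "C": set(), "D": {"B", "C"}, "E": {"D"}}
waves = assign_waves(deps)  # [["A", "C"], ["B"], ["D"], ["E"]]
```

Running this on the multi-wave scenario's chain yields four waves, [A, C], [B], [D], [E], since E can only start once D has cleared the wave containing B and C. The within-wave priority sort of Step 2c would then reorder each wave list independently.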
+ +| Deliverable | Description | Technical Tasks | Dependencies | +|-------------|-------------|-----------------|--------------| +| Orchestration skill | New `skills/run-tasks/SKILL.md` with 7-step loop summary | Write skill entry point with argument parsing, step summaries | None | +| Orchestration reference | New `skills/run-tasks/references/orchestration.md` | Document all 7 step procedures in detail | SKILL.md structure | +| Wave-lead agent | New `agents/wave-lead.md` | Define agent prompt, model, tools, wave management behavior | None | +| Task executor agent | New `agents/task-executor.md` | Revised 4-phase workflow with SendMessage result protocol | None | +| Communication protocols | New `references/communication-protocols.md` | Define all message schemas between agent pairs | Agent definitions | +| Verification patterns | Copy `references/verification-patterns.md` | Copy from existing execute-tasks, adapt if needed | None | +| Session management | Init, recovery detection, archival | Create/archive session dirs, interrupted session handling | SKILL.md | +| Dry-run mode | `--dry-run` flag implementation | Complete Steps 1-3 only, display plan, exit | Orchestration skill | +| Wave dispatch | Orchestrator Step 5 implementation | TeamCreate per wave, wave-lead prompt construction, summary reception | All agents | +| Task state management | Wave-lead TaskUpdate integration | Wave-lead marks tasks in_progress/completed/failed | Wave-lead agent | + +**Checkpoint Gate**: +- [ ] `--dry-run` works end-to-end (load → filter → plan → display → exit) +- [ ] Single-wave execution works (spawn team → executors implement → results collected → summary reported) +- [ ] Multi-wave execution works (sequential waves with dependency ordering) +- [ ] Session directory is created with correct structure +- [ ] Interrupted session is detected and user is prompted +- [ ] task_log.md populated with per-task results + +--- + +### 9.2 Phase 2: Intelligence & Resilience + +**Completion 
Criteria**: Context is distributed to executors, collected, and persisted. The engine handles failures gracefully with 3-tier retry, wave-lead crash recovery, and per-task timeouts. + +| Deliverable | Description | Technical Tasks | Dependencies | +|-------------|-------------|-----------------|--------------| +| Context manager agent | New `agents/context-manager.md` | Define agent prompt, model, tools | Phase 1 | +| Context distribution | Context manager → executor flow | Read execution_context.md, summarize, distribute via SendMessage | Context manager agent | +| Context collection | Executor → context manager flow | Receive contributions during wave, aggregate | Context manager agent | +| Context persistence | Write to execution_context.md | Wave-grouped format, append new wave section | Context distribution | +| Cross-wave bridge | Orchestrator passes context to wave-leads | Include execution_context.md summary in wave-lead prompt | Phase 1 + context persistence | +| 3-tier retry | Immediate + context-enriched + user escalation | Wave-lead Tier 1/2 retry logic, orchestrator Tier 3 escalation flow | Phase 1 + context manager | +| Wave-lead crash recovery | Automatic detection and retry | TaskOutput monitoring, task reset, new team spawn | Phase 1 | +| Per-task timeouts | Complexity-based timeout management | Wave-lead tracks executor duration, terminates via TaskStop on timeout | Phase 1 | +| Rate limit handling | Staggered spawning with backoff | Wave-lead implements spawn delays, retry on rate limit errors | Phase 1 | + +**Checkpoint Gate**: +- [ ] Context manager distributes session summary to executors before they begin work +- [ ] Executors send context contributions to context manager during execution +- [ ] `execution_context.md` is updated with wave-grouped learnings after each wave +- [ ] Later waves receive context from earlier waves via context manager +- [ ] Tier 1 retry (immediate) works for failed executors +- [ ] Tier 2 retry (context-enriched) 
works with additional context from Context Manager +- [ ] Tier 3 escalation presents 4 options to user +- [ ] Simulated wave-lead crash triggers automatic recovery +- [ ] Executor exceeding timeout is terminated and retried +- [ ] Rate limit during spawning triggers backoff (not crash) + +--- + +### 9.3 Phase 3: Polish & Integration + +**Completion Criteria**: Hooks work, progress events are emitted, configuration is fully functional, and documentation is complete. + +| Deliverable | Description | Technical Tasks | Dependencies | +|-------------|-------------|-----------------|--------------| +| Auto-approve hook | Simplified session write approval | Write `hooks/auto-approve-session.sh` for session directory writes | Phase 1 | +| Hook configuration | New `hooks/hooks.json` | Configure PreToolUse auto-approve hook | Auto-approve hook | +| Progress events | Progress.jsonl writing | Orchestrator writes session/wave events to progress.jsonl | Phase 1 | +| Configuration system | Full 5-setting support | Parse YAML frontmatter for all settings, apply to agent model selection | Phase 1 | +| execution_plan.md | Persist wave plan | Write plan to session directory in Step 4 | Phase 1 | +| Documentation | Updated CLAUDE.md entries | Document new architecture, agents, configuration, session layout | All phases | +| Migration guide | Transition documentation | Document relationship to old execute-tasks, what changed, new CLI | All phases | + +**Checkpoint Gate**: +- [ ] Auto-approve hook allows autonomous session writes +- [ ] Progress events are written to progress.jsonl for session/wave lifecycle +- [ ] All 5 configuration settings are read and applied +- [ ] execution_plan.md is populated in session directory +- [ ] CLAUDE.md reflects the new architecture +- [ ] Migration guide documents the transition from execute-tasks to run-tasks + +## 10. 
Testing Strategy
+
+### 10.1 Test Approach
+
+Given that this is a Claude Code plugin (markdown-as-code), traditional unit testing doesn't apply. Testing focuses on scenario-based verification and dry-run validation.
+
+| Level | Scope | Method | Coverage Target |
+|-------|-------|--------|-----------------|
+| Agent scenarios | Individual agent behavior | Execute agents in isolation with controlled inputs | All 3 agent types |
+| Integration | Full wave lifecycle | Execute single-wave sessions with known task sets | Happy path + all failure modes |
+| Regression | Multi-wave sessions | Execute multi-wave specs end-to-end | 5+ session runs without failure |
+| Dry-run | Plan validation | `--dry-run` flag verifies plan without execution | All filter combinations |
+
+### 10.2 Test Scenarios
+
+#### Scenario: Happy Path (Single Wave)
+
+| Step | Action | Expected Result |
+|------|--------|-----------------|
+| 1 | Create 3 tasks with no dependencies | Tasks created |
+| 2 | Run `/run-tasks` | Plan shows 1 wave with 3 tasks |
+| 3 | Confirm execution | Wave team spawned, context distributed |
+| 4 | Wait for completion | All 3 tasks pass, session summary generated |
+
+#### Scenario: Multi-Wave with Dependencies
+
+| Step | Action | Expected Result |
+|------|--------|-----------------|
+| 1 | Create 5 tasks: A, B→A, C, D→B+C, E→D | Tasks created with dependency chain |
+| 2 | Run `/run-tasks` | Plan shows 4 waves: [A, C], [B], [D], [E] |
+| 3 | Confirm and execute | Waves execute sequentially, context flows between waves |
+
+#### Scenario: 3-Tier Retry Escalation
+
+| Step | Action | Expected Result |
+|------|--------|-----------------|
+| 1 | Create task likely to fail | Task created |
+| 2 | Execute | Executor fails |
+| 3 | Tier 1 | Wave-lead retries immediately with failure context |
+| 4 | Tier 2 | Wave-lead requests enriched context from Context Manager, retries |
+| 5 | Tier 3 | Failure escalated to user with 4 options |
+
+#### Scenario: Wave-Lead Crash 
Recovery + +| Step | Action | Expected Result | +|------|--------|-----------------| +| 1 | Create tasks that trigger a wave | Tasks created | +| 2 | Simulate wave-lead crash (agent timeout) | Orchestrator detects crash | +| 3 | Observe recovery | Tasks reset to pending, new wave team spawned | +| 4 | Second crash | Escalated to user | + +#### Scenario: Per-Task Timeout + +| Step | Action | Expected Result | +|------|--------|-----------------| +| 1 | Create an XS task (5 min timeout) | Task created | +| 2 | Execute with a task that would exceed timeout | Executor is terminated via TaskStop | +| 3 | Observe retry | Timed-out task enters Tier 1 retry flow | + +#### Scenario: Phase Filtering + +| Step | Action | Expected Result | +|------|--------|-----------------| +| 1 | Create tasks with spec_phase 1, 2, 3 | Tasks created | +| 2 | Run `/run-tasks --phase 1` | Only phase 1 tasks in plan | +| 3 | Execute | Phase 1 tasks execute, phases 2-3 remain pending | + +#### Scenario: Interrupted Session Recovery + +| Step | Action | Expected Result | +|------|--------|-----------------| +| 1 | Start execution, interrupt mid-wave | `__live_session__/` exists with partial state | +| 2 | Run `/run-tasks` again | Prompted to resume or start fresh | +| 3 | Resume | in_progress tasks reset to pending, execution continues | + +### 10.3 Dry-Run Validation + +The dry-run mode serves as a lightweight test harness: + +``` +/run-tasks --dry-run +/run-tasks --dry-run --phase 1 +/run-tasks --dry-run --task-group auth-feature +``` + +Each invocation validates the full plan generation pipeline (load, filter, validate, topological sort, wave assignment) without modifying state or spawning agents. + +## 11. Deployment & Operations + +### 11.1 Deployment Strategy + +This is a new skill addition to the sdd-tools plugin. The old `execute-tasks` remains during development. + +**Deployment steps**: +1. Create `skills/run-tasks/SKILL.md` with 7-step orchestration +2. 
Create `skills/run-tasks/references/orchestration.md` with step details +3. Create `skills/run-tasks/references/communication-protocols.md` with message schemas +4. Copy `skills/run-tasks/references/verification-patterns.md` from existing engine +5. Create `agents/wave-lead.md` +6. Create `agents/context-manager.md` +7. Create `agents/task-executor.md` (new version, alongside old one initially) +8. Create `hooks/` directory with simplified auto-approve hook and hooks.json +9. Update marketplace.json version for sdd-tools +10. Test extensively before removing old execute-tasks + +**Rollback plan**: Delete the `run-tasks` skill directory and agent files. The old `execute-tasks` remains functional throughout. + +### 11.2 Hook Configuration + +```json +{ + "hooks": { + "PreToolUse": [{ + "matcher": "Write|Edit", + "hooks": [{ + "type": "command", + "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/auto-approve-session.sh", + "timeout": 5 + }] + }] + } +} +``` + +### 11.3 File Inventory + +| File | Purpose | +|------|---------| +| `skills/run-tasks/SKILL.md` | Skill entry point with 7-step summary | +| `skills/run-tasks/references/orchestration.md` | Detailed step procedures | +| `skills/run-tasks/references/communication-protocols.md` | Inter-agent message schemas | +| `skills/run-tasks/references/verification-patterns.md` | Task verification logic (copied) | +| `agents/wave-lead.md` | Wave coordination agent | +| `agents/context-manager.md` | Context distribution/collection agent | +| `agents/task-executor.md` | Code implementation agent (revised) | +| `hooks/hooks.json` | Hook configuration | +| `hooks/auto-approve-session.sh` | Session write auto-approval | + +## 12. 
Dependencies + +### 12.1 Technical Dependencies + +| Dependency | Status | Risk if Unavailable | +|------------|--------|---------------------| +| Claude Code `TeamCreate` | Available | Critical — core architecture depends on this | +| Claude Code `SendMessage` | Available | Critical — all agent coordination uses this | +| Claude Code `TaskList`/`TaskUpdate`/`TaskGet` | Available | Critical — task state management | +| Claude Code `TaskOutput`/`TaskStop` | Available | High — crash detection and timeout enforcement | +| `.claude/agent-alchemy.local.md` | Optional | Low — defaults used if missing | + +### 12.2 Cross-Plugin Dependencies + +| Plugin | Dependency | Impact | +|--------|------------|--------| +| `create-tasks` (sdd-tools) | Task JSON format compatibility | Tasks must have same `blockedBy`, `metadata.task_group`, `metadata.spec_phase` structure | +| `tdd-tools` | No direct dependency | TDD adaptation is a separate follow-up; the new engine does not reference tdd-tools | + +## 13. 
Risks & Mitigations
+
+| Risk | Impact | Likelihood | Mitigation Strategy |
+|------|--------|------------|---------------------|
+| TeamCreate API instability | High | Low | Test extensively in Phase 1; implement retry on team creation failure |
+| SendMessage delivery delays | Medium | Medium | Wave-lead uses patient collection pattern; per-task timeouts catch stuck cases |
+| Higher API cost from 3-tier agents | Medium | High | All agent models configurable; context manager defaults to Sonnet (cheaper) |
+| Context Manager produces poor summaries | Medium | Medium | Context manager uses Sonnet (strong summarization); orchestrator also bridges context directly in wave-lead prompt as backup |
+| Wave-lead agent prompt too complex | Medium | Medium | Keep wave-lead instructions focused; externalize complex logic into orchestration reference |
+| Rate limit issues with parallel agent spawning | Medium | High | Staggered spawning with exponential backoff built into wave-lead |
+| Max concurrent agent limit unknown | Medium | Medium | Test during Phase 1; wave-lead respects max_parallel hint |
+| Old execute-tasks and new run-tasks agent name collision | Low | Medium | wave-lead and context-manager are new, distinct names; the revised task-executor.md coexists with the old one during development, with final naming settled by Open Question #3; old task-executor remains functional for execute-tdd-tasks |
+
+## 14. Open Questions
+
+| # | Question | Owner | Resolution |
+|---|----------|-------|------------|
+| 1 | Does `bypassPermissions` cover session directory writes for the Context Manager agent? | Implementation | Test during Phase 3 — if yes, auto-approve hook is unnecessary |
+| 2 | What is the maximum number of concurrent agents Claude Code supports in a single team? | Implementation | Test during Phase 1 — may affect max_parallel recommendations |
+| 3 | Should the new `task-executor.md` agent coexist with the old one during development, or use a different name? 
| Implementation | Resolve before Phase 1 — naming collision risk with execute-tdd-tasks | + +## 15. Appendix + +### 15.1 Glossary + +| Term | Definition | +|------|------------| +| Wave | A group of tasks that can execute in parallel (same topological sort level) | +| Wave Lead | The team-lead agent responsible for managing all executors within a single wave | +| Context Manager | A specialized team-member agent responsible for distributing and collecting execution context within a wave | +| Task Executor | A team-member agent that implements a single task using a 4-phase workflow | +| Orchestrator | The skill running in the user's conversation context that coordinates waves sequentially | +| Structured Protocol | The defined message format for inter-agent communication via SendMessage | +| Session | A single execution run covering one or more waves, producing session artifacts | +| Escalation | The process of reporting a persistent failure to the user for manual resolution | +| Tier 1 Retry | Immediate retry by wave-lead with failure context from original attempt | +| Tier 2 Retry | Retry with enriched context from Context Manager (related results, detailed project context) | +| Tier 3 Escalation | User-facing prompt with 4 resolution options (Fix/Skip/Guide/Abort) | + +### 15.2 References + +- Current orchestration engine: `claude/sdd-tools/skills/execute-tasks/` +- Current task-executor agent: `claude/sdd-tools/agents/task-executor.md` +- Orchestration deep-dive: `internal/docs/sdd-orchestration-deep-dive-2026-02-22.md` +- Original rewrite spec (v1): `internal/specs/sdd-execute-tasks-rewrite-SPEC.md` +- Claude Code Agent Team tools: TeamCreate, SendMessage, TaskOutput, TaskStop + +### 15.3 Change Log + +| Version | Date | Author | Changes | +|---------|------|--------|---------| +| 1.0 | 2026-02-23 | Stephen Sequenzia | Initial version — rebuilt from scratch via adaptive interview | + +--- + +*Document generated by SDD Tools*