DDD Session Evaluation

Do not change any code, specs, commands, or files. This is a read-only analysis task.

Reflect on your usage of DDD commands, the DDD Usage Guide, and the claude-commands ecosystem in THIS session. Produce a structured report using real examples — not hypothetical ones.

Accuracy rules:

Only report what actually happened this session. Do not invent findings to fill sections.
If a section has no relevant observations, write "N/A — not applicable this session." Empty sections are better than padded ones.
Token counts and overhead percentages are estimates — label them as such (use "~" prefix). Do not fabricate precision.
For the non-DDD command table: only fill rows where you genuinely used the command or genuinely wished you had it during a specific moment. Leave other rows blank.
Speculative questions (marked with "↳") are clearly opinion — answer honestly, including "I don't know" if you don't have a basis for the answer.

This evaluation produces two outputs:

docs/ddd-session-eval-{YYYY-MM-DD}.md — full human-readable report (all findings)
docs/ddd-session-shortfalls-{YYYY-MM-DD}.yaml — machine-readable subset for /ddd-evolve (only qualifying spec/framework gaps)

Do not modify any other files.
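
A minimal sketch of how the two output paths could be derived from the session date. The helper name `output_paths` and the `docs_dir` parameter are hypothetical; only the file name patterns come from the list above.

```python
from datetime import date
from pathlib import Path


def output_paths(day: date, docs_dir: str = "docs") -> tuple[Path, Path]:
    """Return (markdown report path, YAML shortfalls path) for the given day."""
    stamp = day.isoformat()  # ISO format is YYYY-MM-DD, matching the placeholder
    base = Path(docs_dir)
    return (
        base / f"ddd-session-eval-{stamp}.md",
        base / f"ddd-session-shortfalls-{stamp}.yaml",
    )
```

Deriving both names from one date value keeps the report and the shortfalls file paired even if the session runs past midnight.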

Before You Begin

Fetch the /ddd-create command to learn the shortfalls.yaml format:

gh api repos/cybersoloss/claude-commands/contents/ddd-create.md --jq '.content' | base64 -d

Find Step 16 (the --shortfalls section), which defines the shortfalls.yaml structure. This is the format you must use for the YAML output. Understand the sections (missing_node_types, inadequate_existing_nodes, missing_spec_fields, connection_limitations, layer_gaps, workarounds, cross_cutting_gaps, ui_shortfalls, pillar_balance, summary) and the content rules (shortfalls are DDD framework limitations, NOT project scope decisions or spec quality issues).

Also check for prior eval reports (docs/ddd-session-eval-*.md) and shortfalls files (docs/ddd-session-shortfalls-*.yaml) in the project. Note their finding IDs for cross-referencing in Section 12.
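
One way to collect prior finding IDs, sketched under assumptions: the function name `prior_finding_ids` is hypothetical, and the regular expression assumes the project-short part is lowercase alphanumeric with no hyphen and that the sequence has at least three digits, as in the examples in the Finding IDs section.

```python
import re
from pathlib import Path

# Assumed ID shape: {project-short}-{YYYYMMDD}-{seq}, e.g. cc-20260313-001
FINDING_ID = re.compile(r"\b[a-z][a-z0-9]*-\d{8}-\d{3,}\b")


def prior_finding_ids(docs_dir: str = "docs") -> dict[str, list[str]]:
    """Map each prior eval/shortfalls file name to the sorted finding IDs it contains."""
    docs = Path(docs_dir)
    found: dict[str, list[str]] = {}
    for pattern in ("ddd-session-eval-*.md", "ddd-session-shortfalls-*.yaml"):
        for path in sorted(docs.glob(pattern)):
            text = path.read_text(encoding="utf-8")
            # De-duplicate: an ID may be mentioned several times in one report.
            found[path.name] = sorted(set(FINDING_ID.findall(text)))
    return found
```

The result only lists which IDs exist; whether a prior finding is confirmed or fixed still depends on what was actually re-encountered this session.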

Finding IDs

Every finding in Sections 10a–10d and Section 11 must have a unique ID. Format: {project-short}-{YYYYMMDD}-{seq} where {project-short} is a short project identifier (e.g., cc for content-curator, ddt for ddd-tool) and {seq} is a zero-padded sequence number starting at 001.

Examples: cc-20260313-001, cc-20260313-002, ddt-20260315-001

Use these IDs in:

The markdown report (prefix each finding row/bullet with its ID)
The YAML shortfalls (add an eval_id field to each entry that qualifies)
Section 12 (reference prior findings by ID)
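
The ID format described above can be sketched as follows. The function names are hypothetical, and the three-digit padding width is inferred from the examples rather than stated as a rule.

```python
from datetime import date


def finding_id(project_short: str, day: date, seq: int) -> str:
    """Format one finding ID as {project-short}-{YYYYMMDD}-{seq}."""
    if seq < 1:
        raise ValueError("seq starts at 001")
    # Width 3 is an assumption taken from the examples (001, 002, ...).
    return f"{project_short}-{day:%Y%m%d}-{seq:03d}"


def assign_ids(project_short: str, day: date, findings: list[str]) -> dict[str, str]:
    """Give each finding, in report order, the next sequence number."""
    return {
        finding_id(project_short, day, n): text
        for n, text in enumerate(findings, start=1)
    }
```

Numbering findings in a single pass across Sections 10a–10d and 11 is what keeps the sequence unique within one report.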

1. Session Profile

Project: (name and brief description)
Date:
Session type: (new feature / bug fix / refactor / investigation / mixed)
Pillars touched: (Logic / Data / Interface / Infrastructure)
DDD commands invoked (list each with count):
Non-DDD commands invoked (e.g., /code-review, /security-scan):
Total change-history entries in the project:

2. Spec-Driven Coding Impact

The core question: did reading specs before coding improve outcomes?

For each significant task this session:

Task	Read Spec First?	First-Try Correct?	Iterations	Root Cause of Iterations	Spec Would Have Helped?
example: Added retry logic to publish flow	Yes	Yes	0	—	N/A — spec showed all error paths
example: Fixed validation toast	No, went to code	No	1	Missing error branch	Yes — spec had the branch

Summary:

Tasks where reading specs improved correctness or prevented rework:
Tasks where skipping specs had no impact:
Tasks where specs would have helped but were skipped:
Edge cases or error paths that specs revealed but code alone wouldn't:

Methodology efficiency:

When was reading specs redundant or wasteful? (spec added nothing the code didn't already tell you)
When were specs wrong or stale, and code was the actual truth?
When did spec-driven workflow add overhead without improving the outcome?
↳ If you redid this session without DDD at all, what would you lose? What would be easier?

3. Error Prevention Analysis

List bugs, rework, or missed requirements from this session. For each:

Issue	Caught By	Could Spec Have Prevented It?	How
example: Missing null check on API response	Runtime error	Yes	Flow spec shows decision node for empty response
example: Wrong env var name	Test failure	No	Runtime config, not in spec

4. Usage Guide Reference Patterns

For each DDD command that references the Usage Guide:

Command Invocation	Fetched Guide?	Sections Actually Used	Could Have Skipped Fetch?	Reason
example: `/ddd-update` adding cache node	Yes	Section 6 (cache node spec)	No	First time using cache
example: `/ddd-update` adding process node	No	None	Yes	Familiar node type

Total fetch events:
Necessary fetches:
Unnecessary fetches:
Estimated tokens wasted on unnecessary fetches: (Usage Guide is ~5,200 lines / ~18,000 tokens per full fetch)
If you fetched the guide at least once but skipped it for later commands, why? (e.g., internalized the relevant sections, node types were familiar, guide was too large for the value it provided, fetched section wasn't relevant to the task)
If you never fetched the guide this session, why not? (e.g., all node types familiar, no command required it, consciously skipped to save tokens)
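
The wasted-token estimate above is simple arithmetic. A minimal sketch, assuming every unnecessary fetch pulled the full guide; the function name is hypothetical and the per-fetch figure is the ~18,000-token estimate quoted in this section.

```python
# ~18,000 tokens per full Usage Guide fetch is the estimate quoted above.
TOKENS_PER_FULL_FETCH = 18_000


def wasted_fetch_tokens(unnecessary_fetches: int) -> str:
    """Return the estimate as a '~'-prefixed string, since it is not a measurement."""
    return f"~{unnecessary_fetches * TOKENS_PER_FULL_FETCH:,}"
```

For example, `wasted_fetch_tokens(2)` returns `"~36,000"`.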

5. Command Prompt Utilization

For each DDD command executed, estimate how much of the command prompt was useful vs overhead:

Command	Prompt Size (est. lines)	Sections Referenced	Sections Ignored	Est. Overhead %	What Was Overhead
example: `/ddd-update` single flow	~320	Scope resolution, change-history format	Infra, schema, UI integrity, add-domain	~85%	All non-Logic pillar instructions, Usage Guide fetch instructions, add-domain/add-page scope tables
example: `/ddd-update` question	~320	None	Entire prompt	~98%	User asked "why is X happening?" — entire spec-editing pipeline loaded for a question that needed zero command structure

Some sections influence behavior implicitly — note those in the "Sections Referenced" column even if you didn't actively consult them.

Session total: Estimated tokens of command prompt overhead across all invocations: ___

↳ What patterns do you see in the overhead? Look across all rows. Is the overhead concentrated in specific commands? Specific types of invocations (e.g., questions vs changes)? Specific unused pillar instructions? What structural change to commands or the Usage Guide would eliminate the most waste?
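
The session total can be approximated from the table rows. This is a rough sketch, not a measurement method: the names are hypothetical, and the tokens-per-line ratio is an assumption borrowed from the Usage Guide figures in Section 4 (~18,000 tokens over ~5,200 lines), which may not hold for command prompts.

```python
from dataclasses import dataclass

# Assumption: prompt lines cost about the same as Usage Guide lines,
# i.e. ~18,000 tokens / ~5,200 lines (figures quoted in Section 4).
TOKENS_PER_LINE = 18_000 / 5_200


@dataclass
class Invocation:
    command: str
    prompt_lines: int    # estimated prompt size in lines
    overhead_pct: float  # estimated share of the prompt that went unused, 0-100


def overhead_tokens(inv: Invocation) -> float:
    """Estimated overhead tokens for a single command invocation."""
    return inv.prompt_lines * TOKENS_PER_LINE * inv.overhead_pct / 100


def session_overhead(invocations: list[Invocation]) -> str:
    """Sum the per-invocation estimates and format them with a '~' prefix."""
    total = sum(overhead_tokens(i) for i in invocations)
    # Round to the nearest hundred so the figure does not suggest false precision.
    return f"~{round(total, -2):,.0f}"
```

With the two example rows from the table (320 lines at 85% and 320 lines at 98%), `session_overhead` returns `"~2,000"`.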

6. Spec Freshness Observations

Did you encounter specs that were out of sync with code?

Spec File	Drift Type	Impact	How Discovered
example: auth/login.yaml	Code ahead (new field added in code, not in spec)	Low — cosmetic	Noticed during `/ddd-sync`
example: schemas/user.yaml	Spec ahead (field defined but never implemented)	High — generated wrong test	Test failure

7. Workflow: Planned vs Actual

DDD lifecycle suggests:

e.g. /ddd-update → /ddd-implement → /ddd-test

What actually happened:

e.g. User asked about a bug → read code directly → fixed it →
     committed → later ran /ddd-update to sync spec with fix

Friction points:

Where did you break out of the DDD command flow? Why?
Where did the command pipeline feel natural?
Where did a command load context you didn't need?
↳ Was there a task that NO DDD command could help with? What would that command look like?
Did the user invoke a DDD command for something it wasn't designed for? What was the user's actual intent vs the command's purpose?

Human-AI collaboration through DDD:

How well did DDD serve as a collaboration layer between you and the user this session? Cite specific moments.

Did the user struggle to pick the right DDD command for their intent? What were they trying to do and what did they reach for?
Did the user make a design decision during conversation (verbally or by approving a fix) that should have been captured in specs but wasn't? What fell through the cracks?
Did DDD's pipeline (update spec → implement → test) add ceremony the user didn't need, or did it give useful structure they wouldn't have had otherwise?
Were there moments where you and the user were aligned on what to do, but DDD's structure got in the way? Or moments where DDD helped you stay aligned?
↳ What changes to DDD commands, Usage Guide, or workflow would improve how the user communicates intent and reviews your work? (new commands, new modes, guide sections, scope changes)

8. Node Type Usage

Used this session:
Needed reference for:
Knew from prior context:
Never used (of 30 available):
Did you use a process node as a workaround for something a structured node type should handle? (If yes, describe what the process node does and what node type you wish existed; this feeds into the shortfalls YAML.)

9. Cross-Command Observations

9a. Duplication Analysis

List specific content you actually noticed repeated across commands you executed this session. Only report duplication you encountered firsthand — do not read other command prompts just to find duplication:

Duplicated Content	Where It Appears	Times Loaded This Session	Could Be Shared?
example: "Read project context" file list	`/ddd-update`, `/ddd-implement`, `/ddd-test`	3	Yes — identical across all commands
example: Cross-cutting patterns explanation	`/ddd-update`, `/ddd-implement`	2	Yes — same ~30 lines

↳ What would you extract or deduplicate? If you could restructure the commands and Usage Guide to eliminate the repetition you observed, what would you change? (e.g., shared preamble file, reference-instead-of-reproduce, conditional loading)

9b. Non-DDD Command Integration

For each command below, note whether you used it or genuinely wished you had at a specific moment this session. Leave rows blank if you have no real observation — do not fill rows speculatively:

Non-DDD Command	Used?	Relationship to DDD	Observation
`/code-review`		Could verify code matches spec
`/security-scan`		Overlaps with architecture.yaml security patterns
`/test-coverage`		Overlaps with `/ddd-test`
`/spec-verify`		Could use mapping.yaml for verification
`/perf-review`		Independent
`/pre-deploy`		Could include `/ddd-status` drift check
`/simplify`		Could check if simplifications break spec compliance
`/trust-verify`		Could use spec-to-code mapping as trust evidence
(other)

↳ Integration architecture questions (only answer if grounded in observations above):

Which non-DDD commands should be DDD-aware (read specs, mapping, or architecture.yaml)?
Which DDD commands could delegate work to non-DDD commands instead of reimplementing?
Are there workflow combinations that should be a single command?

10. DDD Framework Feedback

Report problems, gaps, and inconsistencies you encountered in the DDD commands and Usage Guide during this session. Be specific — cite the command name, step number, or Usage Guide section.

10a. Command Problems

Issues encountered while executing DDD slash commands:

ID	Command	Step/Section	Problem Type	Description
cc-20260313-001	`/ddd-implement`	Step 11 (cross-cutting)	Unclear guidance	Says "apply matching patterns" but doesn't say what to do when two patterns conflict
cc-20260313-002	`/ddd-sync`	Step 6 (drift analysis)	Missing instruction	No guidance for handling a flow that exists in mapping.yaml but spec file was deleted

Problem types: unclear guidance, missing instruction, wrong instruction, ambiguous step, silent failure, missing error handling, missing scope support

10b. Usage Guide Problems

Issues with the DDD Usage Guide content:

ID	Section	Problem Type	Description
cc-20260313-003	Section 6, `smart_router` node	Incomplete spec	Lists fields but no example YAML — unclear how to set `routes`
cc-20260313-004	Section 8, connection patterns	Missing pattern	No example for connecting `parallel` node outputs to a `collection` node

Problem types: incomplete spec, wrong information, missing example, unclear explanation, outdated content, missing section

10c. Inconsistencies

Conflicting or mismatched content between commands, or between commands and the Usage Guide:

ID	Location A	Location B	Inconsistency
cc-20260313-005	`/ddd-implement` step 5	`/ddd-update` step 6	Different node ID format conventions (one says 8-char, other says 6-char)
cc-20260313-006	`/ddd-sync` next steps	Usage Guide Section 12.1	Different remediation order for `diverged` findings

10d. Missing Capabilities

Things you needed to express or do but DDD didn't support:

cc-20260313-007: "Needed to spec a WebSocket reconnection strategy but no node type or spec field supports it"
cc-20260313-008: "Wanted to mark a flow as deprecated in the spec but there's no status or lifecycle field"
cc-20260313-009: "Command doesn't support --dry-run — would have been useful to preview changes"

11. Recommendations

11a. Tactical Fixes

Specific fixes for issues found in Sections 10a–10d:

ID	What to Change	Category	Evidence From This Session	Estimated Impact	Priority
cc-20260313-010		quality / efficiency / workflow / integration / safety		e.g., "~5,000 tokens saved per invocation" or "prevents class of bugs"	high / medium / low

11b. Systemic Improvements

Step back from individual findings. Based on the overhead data (Section 5), duplication analysis (Section 9a), workflow friction (Section 7), and integration gaps (Section 9b), what structural changes to the DDD framework would have the highest impact? Only propose changes grounded in your actual observations — do not generate generic optimization ideas.

Think about:

How should the Usage Guide be structured for AI consumption? (monolithic file vs split sections, navigation aids, caching strategies)
How should command prompts be organized to reduce overhead? (shared preambles, conditional loading, fast paths for simple operations)
What recurring patterns across commands should be extracted or deduplicated?
What new commands or command modes would eliminate friction you observed?
How should DDD commands integrate with non-DDD commands in the ecosystem?

For each proposal:

ID	Structural Change	What It Eliminates	Estimated Token/Workflow Impact
cc-20260313-015	e.g., Split Usage Guide into fetchable sections by topic	~15,000 tokens of unnecessary content per fetch	~60-80% reduction per Usage Guide reference
cc-20260313-016	e.g., Add fast-path mode to `/ddd-update` for single-flow changes	Pillar instructions, cross-domain analysis, scope tables for unused scopes	~1,500 tokens saved for simple changes

12. Prior Reports

If prior eval reports exist in the project's docs/ directory (ddd-session-eval-*.md or ddd-session-shortfalls-*.yaml), scan their finding IDs. Only report on prior findings whose scenario you actually re-encountered this session:

Confirmed: You hit the same issue again this session (reference by ID, e.g., "Confirms cc-20260310-003 — same inconsistency still present")
Fixed: You encountered the same scenario but the problem was gone (e.g., "cc-20260310-001 — I used /ddd-update for a question and it correctly detected non-spec intent")
Patterns: Recurring themes you observe across reports you've read

Do NOT guess whether prior findings are fixed if you didn't encounter their scenario this session. Silence is honest; guessing is not. Do NOT copy prior findings into this report. Reference by ID only.

If no prior reports exist, skip this section.

13. DDD Usage Statistics

Provide project-level usage context that makes findings credible across reports:

Change-history entries by pillar: Logic: ___ | Data: ___ | Interface: ___ | Infrastructure: ___
Most common DDD operations this session: (e.g., /ddd-update single flow 45%, /ddd-implement pending 25%)
Node types actively used in project: ___ of 30
Node types used this session: ___ of 30
Usage Guide fetches this session vs total command invocations: ___ / ___ (fetch rate: ___%)
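
The fetch rate is fetches divided by command invocations. A small sketch with a hypothetical helper name, guarding against a session with no command invocations:

```python
def fetch_rate(guide_fetches: int, command_invocations: int) -> str:
    """Format 'fetches / invocations (fetch rate: N%)' for the statistics line."""
    if command_invocations == 0:
        # No commands ran, so a percentage would be meaningless.
        return "0 / 0 (fetch rate: N/A)"
    rate = round(100 * guide_fetches / command_invocations)
    return f"{guide_fetches} / {command_invocations} (fetch rate: {rate}%)"
```

For example, 2 fetches across 8 invocations gives `"2 / 8 (fetch rate: 25%)"`.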

Output

Output 1: Human-readable report

Write the full report (all sections above) to docs/ddd-session-eval-{YYYY-MM-DD}.md.

Output 2: Machine-readable shortfalls for `/ddd-evolve`

Review your findings from Sections 8, 10b, and 10d. If ANY qualify as DDD framework gaps (not project decisions, not spec quality issues, not command/workflow problems), write them to docs/ddd-session-shortfalls-{YYYY-MM-DD}.yaml using the exact shortfalls.yaml format from /ddd-create Step 16 (fetched in the "Before You Begin" step).

Mapping guide — which eval findings go into the YAML:

Eval Finding	shortfalls.yaml Section	Condition
10d: Missing node type	`missing_node_types`	Needed a node type that doesn't exist
10d: Missing spec field	`missing_spec_fields`	Needed a field on a node/connection/flow that doesn't exist
10d: Connection limitation	`connection_limitations`	Couldn't express an edge behavior or routing pattern
10d: Missing UI capability	`ui_shortfalls.*`	Couldn't express a UI component, interaction, or form pattern
10b: Inadequate node spec	`inadequate_existing_nodes`	A node type exists but lacks needed capabilities (not just missing docs)
8: Used process node as workaround	`workarounds`	Used a generic node because no structured type fit
8: Node type usage stats	`summary.feature_coverage`	Used vs available counts

What does NOT go into the YAML:

Command instruction problems (10a) — these are command quality issues, not spec gaps
Inconsistencies between commands/guide (10c) — these are documentation issues
Workflow friction (Section 7) — these are process observations
Usage Guide documentation issues (10b, when the node spec is adequate but docs are unclear)
Recommendations (Section 11) — these are general suggestions
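
The routing rules above can be read as a lookup table. The sketch below transcribes the mapping guide; the "kind" labels and the function name are hypothetical shorthand, and anything not in the table is treated as markdown-only.

```python
from typing import Optional

# (eval section, finding kind) -> shortfalls.yaml section, transcribed from the mapping guide.
# The "kind" labels are hypothetical shorthand for this sketch.
YAML_SECTION = {
    ("10d", "missing node type"): "missing_node_types",
    ("10d", "missing spec field"): "missing_spec_fields",
    ("10d", "connection limitation"): "connection_limitations",
    ("10d", "missing ui capability"): "ui_shortfalls",
    ("10b", "inadequate node spec"): "inadequate_existing_nodes",
    ("8", "process node workaround"): "workarounds",
    ("8", "node type usage stats"): "summary.feature_coverage",
}


def yaml_section_for(section: str, kind: str) -> Optional[str]:
    """Return the target YAML section, or None when the finding stays markdown-only."""
    return YAML_SECTION.get((section, kind.lower()))
```

Findings from 10a, 10c, Section 7, and Section 11 fall through to `None`, which matches the exclusion list above.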

If no findings qualify for YAML, skip the YAML output entirely. An absent file is better than an empty one.

Use the shortfalls.yaml header fields:

project: "{project-name}"
generated: "{ISO timestamp}"
ddd_version: "1.0"
source: "ddd-session-eval"  # distinguishes from /ddd-create --shortfalls output

Add an eval_id field to each shortfall entry so it can be traced back to the markdown report:

missing_node_types:
  - eval_id: "cc-20260313-007"  # links to markdown finding
    name: "reconnection-strategy"
    severity: medium
    # ... rest of standard shortfalls.yaml fields
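
Before writing the file, it may help to confirm that every entry carries a traceable eval_id. A minimal sketch, assuming the YAML has already been parsed into a Python mapping and that the checked sections are lists of entries; the function name and the ID pattern (lowercase project-short without hyphens, at least three sequence digits) are assumptions.

```python
import re

# Assumed ID shape: {project-short}-{YYYYMMDD}-{seq}, e.g. cc-20260313-007
FINDING_ID = re.compile(r"^[a-z][a-z0-9]*-\d{8}-\d{3,}$")


def entries_missing_eval_id(shortfalls: dict, sections: list[str]) -> list[str]:
    """Return 'section[index]' labels for entries lacking a well-formed eval_id."""
    problems = []
    for section in sections:
        # A missing or null section contributes no entries.
        for index, entry in enumerate(shortfalls.get(section) or []):
            eval_id = entry.get("eval_id", "") if isinstance(entry, dict) else ""
            if not FINDING_ID.match(str(eval_id)):
                problems.append(f"{section}[{index}]")
    return problems
```

An empty result means every checked entry links back to a finding in the markdown report.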

Comprehensive YAML requirement: If you produce a YAML file, it must include ALL standard sections, not just findings from this session. Do a lightweight project-level audit:

pillar_balance — count the project's flows, schemas, UI pages, and infrastructure services by scanning specs/. Include pages_without_specs and imbalance_warnings. This is fast (directory listing + file count).
summary.feature_coverage — compute the full Feature Usage Matrix (node_types, trigger_types, collection_operations, crypto_operations, parse_formats, data_store_types, connection_behaviors, ui_component_types, form_field_types, schema_index_types, etc.) by scanning existing flow specs. Report used/available ratios. Include unused_but_applicable for features the project could benefit from.
ui_shortfalls — if UI specs exist (specs/ui/), scan them for missing component types, inadequate components, form limitations, and interaction gaps — even if UI wasn't touched this session. Use empty arrays for sub-sections with no findings.
layer_gaps — populate elements_used for all layers where specs exist. For missing_elements and invisible_information, only report issues you can identify from the specs.
Empty sections — include all sections with [] when no issues found. An explicit empty array signals "audited, no gaps" vs an omitted section which signals "not checked."

This ensures every session-eval YAML is complete enough for /ddd-evolve to produce meaningful analysis, even when the session itself was narrow in scope.
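
The "audited, no gaps" versus "not checked" distinction can be verified mechanically. The sketch below assumes the parsed YAML is a Python mapping; the function name is hypothetical, and the section names are the ones listed under "Before You Begin", with /ddd-create Step 16 remaining the authoritative list.

```python
# Top-level section names as listed in "Before You Begin"; the authoritative
# list is whatever /ddd-create Step 16 defines.
STANDARD_SECTIONS = (
    "missing_node_types", "inadequate_existing_nodes", "missing_spec_fields",
    "connection_limitations", "layer_gaps", "workarounds", "cross_cutting_gaps",
    "ui_shortfalls", "pillar_balance", "summary",
)


def omitted_sections(shortfalls: dict) -> list[str]:
    """Sections absent from the document (None counts as absent; [] counts as audited)."""
    return [name for name in STANDARD_SECTIONS if shortfalls.get(name) is None]
```

A section set to `[]` passes the check, while a section that is missing entirely is reported.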

After writing

Tell the user the file path(s) and summarize:

How many findings went into the markdown report (tactical fixes in 11a + systemic proposals in 11b)
How many (if any) qualified as shortfalls for the YAML output
Highlight the top systemic proposal from 11b if one stands out
Suggested next step: if YAML was produced, suggest /ddd-evolve docs/ddd-session-shortfalls-{date}.yaml
