Skip to content

TST-21: Expand demo director reporting, assertions, presets, and soak#500

Merged
Chris0Jeky merged 7 commits intomainfrom
test/demo-director-reporting-and-soak
Mar 29, 2026
Merged

TST-21: Expand demo director reporting, assertions, presets, and soak#500
Chris0Jeky merged 7 commits intomainfrom
test/demo-director-reporting-and-soak

Conversation

@Chris0Jeky
Copy link
Copy Markdown
Owner

Summary

Closes #331

  • HTML demo report generator (demo-report-html.mjs): Generates self-contained static HTML reports from demo artifact bundles. Reports include inline styles, step-by-step trace tables with pass/fail badges, and embedded base64 screenshots. No external dependencies.
  • Snapshot & trace assertions (demo-trace-assertions.mjs): Assertion utilities supporting exact match, structural/shape match, step ordering validation, error detection, and required event checks. Integrates with vitest for test-time validation.
  • Director narrative presets (demo-director-presets.mjs): Four built-in presets (happy-path-capture, review-approve-flow, error-recovery-demo, soak-baseline) with scenario config, timing, and trace expectations. Loadable by name with override merging.
  • Soak mode (demo-soak.mjs): Long-run loop runner with configurable iteration count and duration limit. Tracks cumulative metrics: pass rate, timing drift, min/max/avg iteration time, and memory indicators.
  • 63 new tests covering all four modules plus an integration test that runs a preset through the full assertion and reporting pipeline.

Test plan

  • All 63 new tests pass (npx vitest --run)
  • Full test suite (1102 tests across 120 files) passes with no regressions
  • npm run build succeeds (typecheck + vite build)
  • Verify presets load correctly when invoked from demo-director CLI
  • Verify HTML report renders correctly in a browser from real artifact bundles

Creates demo-report-html.mjs with utilities to generate self-contained
static HTML reports from demo director artifacts (run-summary.json,
trace.ndjson, screenshots). Reports include inline styles, step-by-step
trace table, pass/fail badges, and embedded screenshot images.
Creates demo-trace-assertions.mjs with exact match, structural match,
step ordering, error detection, and required events assertions. Supports
both strict comparison and shape-only validation modes for comparing
demo traces against known-good snapshots.
Creates demo-director-presets.mjs with four built-in presets:
happy-path-capture, review-approve-flow, error-recovery-demo, and
soak-baseline. Each preset defines scenario, director args, and trace
expectations. Supports loading by name, merging with overrides, and
runtime registration of custom presets.
Creates demo-soak.mjs with configurable iteration count, duration limit,
and cooldown between runs. Tracks cumulative metrics including pass rate,
timing drift, and memory indicators. Builds a structured summary report
at completion.
63 tests covering: HTML report generation (escaping, status badges,
screenshots, XSS prevention), trace assertions (exact/structural match,
ordering, error detection), preset loading/merging/registration,
soak config validation and loop execution, and a full integration test
that runs a preset through the assertion and reporting pipeline.
@gemini-code-assist
Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@Chris0Jeky
Copy link
Copy Markdown
Owner Author

Adversarial Self-Review

Findings

  1. registerPreset mutates the shared PRESETS object -- the preset registered in demo-director-presets.spec.ts (test-custom-preset) persists across test files since vitest may share module state within the same worker. This is low-risk because the test key is unique and not a built-in preset name, but if vitest parallelism changes or another test checks listPresetIds().length, it could be fragile. Acceptable for now since the test name is sufficiently unique.

  2. runSoak maxDurationMs check is racy by nature -- the shouldContinueSoak check happens at the top of the loop before runFn executes, so total wall time will always exceed maxDurationMs by at least one iteration's duration. This is documented/expected behavior for soak loops (graceful stop, not hard kill). The test accounts for this.

  3. generateReportFromArtifacts (the file-reading entrypoint) is not unit-tested -- only the pure generateHtmlReport function is. The file-reading wrapper requires real filesystem artifacts, which is more of an integration/E2E concern. The pure logic is well-covered.

  4. collectMemoryIndicators in soak summary -- the function reads process.memoryUsage() at summary-build time (end of soak), not per-iteration. This means it captures final memory state rather than peak or per-iteration trends. For soak drift detection, per-iteration memory snapshots would be more informative. This is a reasonable v1 -- adding per-iteration snapshots could be a follow-up if memory leak detection becomes a priority.

  5. No --preset CLI flag wired into demo-director-args.mjs -- the presets module is a loadable library, but the actual demo-director CLI does not yet accept --preset <name> as an argument. This is intentional scope control -- wiring it in is a small follow-up that touches existing code, and the presets are usable programmatically right now.

Verdict

All findings are known trade-offs or intentional scope boundaries. No blocking issues found.

@Chris0Jeky
Copy link
Copy Markdown
Owner Author

Adversarial Code Review of PR #500

I read every line of the diff across all 9 new files (4 source modules, 5 test files). Here are the findings, ordered by severity.


BUG: s.dataUrl not escaped in HTML src attribute (XSS vector)

File: scripts/demo-report-html.mjs:127

`<img src="${s.dataUrl}" alt="${escapeHtml(s.name)}" /></div>`,

The s.dataUrl value is injected raw into the src attribute. While in the normal flow this is a data:image/png;base64,... string produced by screenshotToDataUrl, the generateHtmlReport function is also a public export that accepts arbitrary input. A caller could pass " onerror="alert(1) as a dataUrl and break out of the attribute. The alt attribute is correctly escaped; src should be too, or at minimum validated as a data: URL prefix.

Recommended fix: Either escapeHtml(s.dataUrl) the src attribute, or validate it starts with data:image/.


BUG: validateSoakConfig unreachable negative check

File: scripts/demo-soak.mjs:63-69

if (maxIterations <= 0 && maxDurationMs <= 0) {
  throw new Error('...')  // line 63-66
}

if (maxIterations < 0 || !Number.isFinite(maxIterations)) {
  throw new Error(`Invalid maxIterations: ${maxIterations}`)  // line 69
}

The maxIterations < 0 check on line 69 is dead code. If maxIterations is negative, it would already be <= 0 and the first guard on line 63 would throw (assuming maxDurationMs is also <= 0). However, if maxDurationMs > 0, then a negative maxIterations would pass the first check and reach the second. So the logic partially works -- but the intent is unclear and the validation is fragile. If someone passes { maxIterations: -5, maxDurationMs: 10000 }, the negative check catches it. But { maxIterations: NaN, maxDurationMs: 10000 } also passes because NaN <= 0 is false (!) -- so NaN slips through the first guard, then Number.isFinite(NaN) catches it. The ordering is confusing but technically correct. Not a bug, but worth a code comment.


ISSUE: Soak mode stores all iterations in memory (unbounded array)

File: scripts/demo-soak.mjs:170

const iterations = []
// ... iterations.push({...}) each loop

When running soak with maxDurationMs only (no iteration limit), the iterations array grows unboundedly. For a long-duration soak (hours/days), this could consume significant memory. The buildSoakSummary then also calls Math.min(...durations) / Math.max(...durations) which spreads the entire array as function arguments -- this will throw a RangeError once the array exceeds ~100k-200k elements (V8 max call stack argument limit).

Recommended fix: Track min/max/sum incrementally instead of spreading, and consider a rolling window or periodic flush for very long soaks.


ISSUE: registerPreset mutates the module-level PRESETS object (test pollution)

File: scripts/demo-director-presets.mjs:174-178

export function registerPreset(preset) {
  if (!preset?.id) throw new Error('Preset must have an id')
  PRESETS[preset.id] = preset
}

The test demo-director-presets.spec.ts:1003-1016 calls registerPreset({ id: 'test-custom-preset', ... }) which permanently mutates the shared PRESETS object. Since vitest runs tests in the same process, this custom preset persists across all subsequent tests in the suite. If another test file calls listPresetIds() after this test runs, it will see 5 presets instead of 4.

The test does not clean up after itself. This is a test isolation bug that could cause flaky failures in the future.


ISSUE: Typo in type name: TraceAssertionResult (missing 'r')

File: scripts/demo-trace-assertions.mjs:12

@typedef {object} TraceAssertionResult

Should be TraceAssertionResult -> TraceAssertionResult ... wait, actually: "Assertion" vs "Assertion". The correct English spelling is Assertion. The typedef uses Assertion throughout (lines 12, 25, 66, 107, 199). This is a typo that will be confusing for anyone consuming or extending the API.


MINOR: stepRows falsy check is misleading

File: scripts/demo-report-html.mjs:185

<tbody>${stepRows || '<tr><td colspan="5">No trace events.</td></tr>'}</tbody>

stepRows is the result of .map(...).join('\n'). When traceEvents is an empty array, stepRows will be '' (empty string), which is falsy -- so the fallback works. But this is fragile: it relies on the implicit falsiness of an empty string. A comment or explicit steps.length === 0 check would be clearer.


MINOR: generateReportFromArtifacts crashes on missing run-summary.json

File: scripts/demo-report-html.mjs:210

const runSummary = JSON.parse(await fs.readFile(summaryPath, 'utf8'))

Unlike the trace and screenshots reads which are wrapped in try/catch, the run-summary.json read is unguarded. If the file is missing or contains invalid JSON, the function throws an unhandled error. This is arguably correct (summary is required), but it is inconsistent with the graceful degradation pattern used for trace and screenshots. Worth at least a descriptive error message.


MINOR: assertTrace does not validate events is an array

File: scripts/demo-trace-assertions.mjs:201

The individual assertion functions (assertTraceStepOrdering, assertNoUnexpectedErrors, etc.) all check if (!Array.isArray(events)) and return early. But assertTrace itself does not check before dispatching. If events is not an array and no requiredSequence or requiredEvents are set, only the assertNoUnexpectedErrors call would catch it. This is fine in practice, but a top-level guard would be more robust.


TEST QUALITY NOTES

  • Tests are generally well-structured and assert meaningful properties.
  • The soak maxDurationMs test (line 1485-1500) is timing-dependent and could be flaky on slow CI runners -- the 30ms sleep + 100ms budget is tight.
  • No test covers generateReportFromArtifacts (the file-system-dependent function). This is the most complex function in the HTML report module and has real error handling paths that are untested.
  • No test covers screenshotToDataUrl with an actual file (only indirectly via the report generator).

Summary

Severity Count Issues
Bug 1 XSS via unescaped dataUrl in HTML report
Design 1 Unbounded iterations array in soak mode (stack overflow on spread)
Quality 2 Test pollution via registerPreset; Typo in TraceAssertionResult
Minor 3 stepRows falsy check, unguarded summary read, missing top-level array check

The XSS issue and the Math.min(...durations) stack overflow on large soak runs are the two items I would fix before merge.

…ummary

- Escape s.dataUrl with escapeHtml() in screenshot img src attribute to
  prevent XSS via crafted dataUrl strings in generateHtmlReport
- Replace Math.min(...durations)/Math.max(...durations) spread with
  incremental tracking loop in buildSoakSummary to avoid RangeError
  when iterations array exceeds V8 max call stack argument limit
@Chris0Jeky Chris0Jeky merged commit c5d792e into main Mar 29, 2026
18 checks passed
@Chris0Jeky Chris0Jeky deleted the test/demo-director-reporting-and-soak branch March 29, 2026 03:50
@github-project-automation github-project-automation bot moved this from Pending to Done in Taskdeck Execution Mar 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

TST-21: Expand demo director reporting, assertions, presets, and soak follow-through

1 participant