Commit 6156122

garrytan and claude authored
test: E2E tests for plan review report and Codex offering (v0.11.15.0) (garrytan#449)
* chore: regen SKILL.md from template changes

Regenerated via `bun run gen:skill-docs` — was stale from prior template updates (Codex paths, preamble resolver).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add E2E tests for plan review report and codex offering

- plan-review-report: verifies plan-eng-review writes ## GSTACK REVIEW REPORT to the bottom of the plan file
- codex-offered-{office-hours,ceo-review,design-review,eng-review}: verifies each skill has Codex availability check, user prompt, and fallback behavior (4 concurrent lightweight tests)
- Updated touchfiles and selection count assertion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add touchfiles to global touchfile list in CLAUDE.md

The touchfiles.ts file itself is a global touchfile that triggers all tests when changed, but was missing from the documented list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.11.15.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 3501f5d commit 6156122

6 files changed

+218
-4
lines changed

CHANGELOG.md

Lines changed: 13 additions & 0 deletions
@@ -1,5 +1,18 @@
 # Changelog
 
+## [0.11.15.0] - 2026-03-24 — E2E Test Coverage for Plan Reviews & Codex
+
+### Added
+
+- **E2E tests verify plan review reports appear at the bottom of plans.** The `/plan-eng-review` review report is now tested end-to-end — if it stops writing `## GSTACK REVIEW REPORT` to the plan file, the test catches it.
+- **E2E tests verify Codex is offered in every plan skill.** Four new lightweight tests confirm that `/office-hours`, `/plan-ceo-review`, `/plan-design-review`, and `/plan-eng-review` all check for Codex availability, prompt the user, and handle the fallback when Codex is unavailable.
+
+### For contributors
+
+- New E2E tests in `test/skill-e2e-plan.test.ts`: `plan-review-report`, `codex-offered-eng-review`, `codex-offered-ceo-review`, `codex-offered-office-hours`, `codex-offered-design-review`
+- Updated touchfile mappings and selection count assertions
+- Added `touchfiles` to the documented global touchfile list in CLAUDE.md
+
 ## [0.11.14.0] - 2026-03-24 — Windows Browse Fix
 
 ### Fixed
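
The first changelog entry says the report must land at the bottom of the plan. The test in this commit takes the text after the last `## GSTACK REVIEW REPORT` heading and checks it for the review table rows. A stricter position check could look roughly like the sketch below; `reportIsLastSection` is a hypothetical helper, not something in the repository, and it assumes the plan contains no fenced code with lines that start with `## `.

```typescript
// Hypothetical helper (not part of the commit): true only when the last
// level-2 heading in the plan is the review report heading.
const REPORT_HEADING = '## GSTACK REVIEW REPORT';

function reportIsLastSection(planContent: string): boolean {
  // "## " matches level-2 headings only; "### " has '#' in the third position.
  const h2Headings = planContent
    .split('\n')
    .filter((line) => /^## /.test(line))
    .map((line) => line.trim());
  return h2Headings.length > 0 && h2Headings[h2Headings.length - 1] === REPORT_HEADING;
}

// Made-up sample plans for illustration.
const planWithFooter = '# Plan\n\n## Context\ntext\n\n## GSTACK REVIEW REPORT\n| Review | Status |\n';
const planWithReportFirst = '# Plan\n\n## GSTACK REVIEW REPORT\n| Review | Status |\n\n## Context\ntext\n';

console.log(reportIsLastSection(planWithFooter)); // true
console.log(reportIsLastSection(planWithReportFirst)); // false
```

A check like this would fail if the report were written above existing plan sections, which the substring assertions alone would not catch.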

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ against the previous run.
 **Diff-based test selection:** `test:evals` and `test:e2e` auto-select tests based
 on `git diff` against the base branch. Each test declares its file dependencies in
 `test/helpers/touchfiles.ts`. Changes to global touchfiles (session-runner, eval-store,
-llm-judge, gen-skill-docs) trigger all tests. Use `EVALS_ALL=1` or the `:all` script
+llm-judge, gen-skill-docs, touchfiles) trigger all tests. Use `EVALS_ALL=1` or the `:all` script
 variants to force all tests. Run `eval:select` to preview which tests would run.
 
 ## Testing
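
The CLAUDE.md paragraph above describes diff-based selection: each test lists the files it depends on, and a change to a global touchfile selects everything. The sketch below illustrates that rule under stated assumptions. The `matchesPattern` and `selectTests` functions here are hypothetical stand-ins written for this example (the real logic lives in `test/helpers/touchfiles.ts` and may differ), and the glob handling is deliberately minimal.

```typescript
// Sketch only: hypothetical stand-ins, not the repository's implementation.
type Touchfiles = Record<string, string[]>;

// Minimal glob support (assumption): "dir/**" matches any path under dir/,
// any other pattern must equal the changed path exactly.
function matchesPattern(pattern: string, file: string): boolean {
  if (pattern.endsWith('/**')) {
    return file.startsWith(pattern.slice(0, -2));
  }
  return pattern === file;
}

function selectTests(
  changedFiles: string[],
  touchfiles: Touchfiles,
  globalTouchfiles: string[],
): { selected: string[]; skipped: string[] } {
  const entries = Object.entries(touchfiles);
  // A change to any global touchfile selects every test.
  if (changedFiles.some((file) => globalTouchfiles.includes(file))) {
    return { selected: entries.map(([name]) => name), skipped: [] };
  }
  const selected: string[] = [];
  const skipped: string[] = [];
  for (const [name, patterns] of entries) {
    const hit = patterns.some((p) => changedFiles.some((f) => matchesPattern(p, f)));
    (hit ? selected : skipped).push(name);
  }
  return { selected, skipped };
}

// Three entries copied from the touchfiles.ts hunk in this commit.
const SAMPLE_TOUCHFILES: Touchfiles = {
  'plan-eng-review': ['plan-eng-review/**'],
  'plan-review-report': ['plan-eng-review/**', 'scripts/gen-skill-docs.ts'],
  'codex-offered-office-hours': ['office-hours/**', 'scripts/gen-skill-docs.ts'],
};
// Assumed path for the "touchfiles" global that CLAUDE.md now documents.
const SAMPLE_GLOBALS = ['test/helpers/touchfiles.ts'];

// Selects plan-eng-review and plan-review-report, skips codex-offered-office-hours.
console.log(selectTests(['plan-eng-review/SKILL.md'], SAMPLE_TOUCHFILES, SAMPLE_GLOBALS));
// Global touchfile changed: all three sample tests are selected.
console.log(selectTests(['test/helpers/touchfiles.ts'], SAMPLE_TOUCHFILES, SAMPLE_GLOBALS));
```

Under this rule, `selected.length + skipped.length` always equals the number of declared tests, which is the relationship the selection count assertions in `test/touchfiles.test.ts` rely on.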

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.11.14.0
+0.11.15.0

test/helpers/touchfiles.ts

Lines changed: 7 additions & 0 deletions
@@ -68,6 +68,13 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
   'plan-ceo-review-benefits': ['plan-ceo-review/**', 'scripts/gen-skill-docs.ts'],
   'plan-eng-review': ['plan-eng-review/**'],
   'plan-eng-review-artifact': ['plan-eng-review/**'],
+  'plan-review-report': ['plan-eng-review/**', 'scripts/gen-skill-docs.ts'],
+
+  // Codex offering verification
+  'codex-offered-office-hours': ['office-hours/**', 'scripts/gen-skill-docs.ts'],
+  'codex-offered-ceo-review': ['plan-ceo-review/**', 'scripts/gen-skill-docs.ts'],
+  'codex-offered-design-review': ['plan-design-review/**', 'scripts/gen-skill-docs.ts'],
+  'codex-offered-eng-review': ['plan-eng-review/**', 'scripts/gen-skill-docs.ts'],
 
   // Ship
   'ship-base-branch': ['ship/**', 'bin/gstack-repo-mode'],

test/skill-e2e-plan.test.ts

Lines changed: 193 additions & 0 deletions
@@ -535,6 +535,199 @@ Write your summary to ${benefitsDir}/benefits-summary.md`,
   }, 180_000);
 });
 
+// --- Plan Review Report E2E ---
+// Verifies that plan-eng-review writes a "## GSTACK REVIEW REPORT" section
+// to the bottom of the plan file (the living review status footer).
+
+describeIfSelected('Plan Review Report E2E', ['plan-review-report'], () => {
+  let planDir: string;
+
+  beforeAll(() => {
+    planDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-review-report-'));
+    const run = (cmd: string, args: string[]) =>
+      spawnSync(cmd, args, { cwd: planDir, stdio: 'pipe', timeout: 5000 });
+
+    run('git', ['init', '-b', 'main']);
+    run('git', ['config', 'user.email', 'test@test.com']);
+    run('git', ['config', 'user.name', 'Test']);
+
+    fs.writeFileSync(path.join(planDir, 'plan.md'), `# Plan: Add Notifications System
+
+## Context
+We're building a real-time notification system for our SaaS app.
+
+## Changes
+1. WebSocket server for push notifications
+2. Notification preferences API
+3. Email digest fallback for offline users
+4. PostgreSQL table for notification storage
+
+## Architecture
+- WebSocket: Socket.io on Express
+- Queue: Bull + Redis for email digests
+- Storage: PostgreSQL notifications table
+- Frontend: React toast component
+
+## Open questions
+- Retry policy for failed WebSocket delivery?
+- Max notifications stored per user?
+`);
+
+    run('git', ['add', '.']);
+    run('git', ['commit', '-m', 'add plan']);
+
+    // Copy plan-eng-review skill
+    fs.mkdirSync(path.join(planDir, 'plan-eng-review'), { recursive: true });
+    fs.copyFileSync(
+      path.join(ROOT, 'plan-eng-review', 'SKILL.md'),
+      path.join(planDir, 'plan-eng-review', 'SKILL.md'),
+    );
+  });
+
+  afterAll(() => {
+    try { fs.rmSync(planDir, { recursive: true, force: true }); } catch {}
+  });
+
+  test('/plan-eng-review writes GSTACK REVIEW REPORT to plan file', async () => {
+    const result = await runSkillTest({
+      prompt: `Read plan-eng-review/SKILL.md for the review workflow.
+
+Read plan.md — that's the plan to review. This is a standalone plan document, not a codebase — skip any codebase exploration steps.
+
+Proceed directly to the full review. Skip any AskUserQuestion calls — this is non-interactive.
+Skip the preamble bash block, lake intro, telemetry, and contributor mode sections.
+
+CRITICAL REQUIREMENT: plan.md IS the plan file for this review session. After completing your review, you MUST write a "## GSTACK REVIEW REPORT" section to the END of plan.md, exactly as described in the "Plan File Review Report" section of SKILL.md. If gstack-review-read is not available or returns NO_REVIEWS, write the placeholder table with all four review rows (CEO, Codex, Eng, Design). Use the Edit tool to append to plan.md — do NOT overwrite the existing plan content.
+
+This review report at the bottom of the plan is the MOST IMPORTANT deliverable of this test.`,
+      workingDirectory: planDir,
+      maxTurns: 20,
+      timeout: 360_000,
+      testName: 'plan-review-report',
+      runId,
+      model: 'claude-opus-4-6',
+    });
+
+    logCost('/plan-eng-review report', result);
+    recordE2E(evalCollector, '/plan-review-report', 'Plan Review Report E2E', result, {
+      passed: ['success', 'error_max_turns'].includes(result.exitReason),
+    });
+    expect(['success', 'error_max_turns']).toContain(result.exitReason);
+
+    // Verify the review report was written to the plan file
+    const planContent = fs.readFileSync(path.join(planDir, 'plan.md'), 'utf-8');
+
+    // Original plan content should still be present
+    expect(planContent).toContain('# Plan: Add Notifications System');
+    expect(planContent).toContain('WebSocket');
+
+    // Review report section must exist
+    expect(planContent).toContain('## GSTACK REVIEW REPORT');
+
+    // Report should be at the bottom of the file
+    const reportIndex = planContent.lastIndexOf('## GSTACK REVIEW REPORT');
+    const afterReport = planContent.slice(reportIndex);
+
+    // Should contain the review table with standard rows
+    expect(afterReport).toMatch(/\|\s*Review\s*\|/);
+    expect(afterReport).toContain('CEO Review');
+    expect(afterReport).toContain('Eng Review');
+    expect(afterReport).toContain('Design Review');
+
+    console.log('Plan review report found at bottom of plan.md');
+  }, 420_000);
+});
+
+// --- Codex Offering E2E ---
+// Verifies that Codex is properly offered (with availability check, user prompt,
+// and fallback) in office-hours, plan-ceo-review, plan-design-review, plan-eng-review.
+
+describeIfSelected('Codex Offering E2E', [
+  'codex-offered-office-hours', 'codex-offered-ceo-review',
+  'codex-offered-design-review', 'codex-offered-eng-review',
+], () => {
+  let testDir: string;
+
+  beforeAll(() => {
+    testDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-codex-offer-'));
+    const run = (cmd: string, args: string[]) =>
+      spawnSync(cmd, args, { cwd: testDir, stdio: 'pipe', timeout: 5000 });
+
+    run('git', ['init', '-b', 'main']);
+    run('git', ['config', 'user.email', 'test@test.com']);
+    run('git', ['config', 'user.name', 'Test']);
+    fs.writeFileSync(path.join(testDir, 'README.md'), '# Test Project\n');
+    run('git', ['add', '.']);
+    run('git', ['commit', '-m', 'init']);
+
+    // Copy all 4 SKILL.md files
+    for (const skill of ['office-hours', 'plan-ceo-review', 'plan-design-review', 'plan-eng-review']) {
+      fs.mkdirSync(path.join(testDir, skill), { recursive: true });
+      fs.copyFileSync(
+        path.join(ROOT, skill, 'SKILL.md'),
+        path.join(testDir, skill, 'SKILL.md'),
+      );
+    }
+  });
+
+  afterAll(() => {
+    try { fs.rmSync(testDir, { recursive: true, force: true }); } catch {}
+  });
+
+  async function checkCodexOffering(skill: string, testName: string, featureName: string) {
+    const result = await runSkillTest({
+      prompt: `Read ${skill}/SKILL.md. Search for ALL sections related to "codex", "outside voice", or "second opinion".
+
+Summarize the Codex/${featureName} integration — answer these specific questions:
+1. How is Codex availability checked? (what exact bash command?)
+2. How is the user prompted? (via AskUserQuestion? what are the options?)
+3. What happens when Codex is NOT available? (fallback to subagent? skip entirely?)
+4. Is this step blocking (gates the workflow) or optional (can be skipped)?
+5. What prompt/context is sent to Codex?
+
+Write your summary to ${testDir}/${testName}-summary.md`,
+      workingDirectory: testDir,
+      maxTurns: 8,
+      timeout: 120_000,
+      testName,
+      runId,
+    });
+
+    logCost(`/${skill} codex offering`, result);
+    recordE2E(evalCollector, `/${testName}`, 'Codex Offering E2E', result);
+    expect(result.exitReason).toBe('success');
+
+    const summaryPath = path.join(testDir, `${testName}-summary.md`);
+    expect(fs.existsSync(summaryPath)).toBe(true);
+
+    const summary = fs.readFileSync(summaryPath, 'utf-8').toLowerCase();
+    // All skills should have codex availability check
+    expect(summary).toMatch(/which codex/);
+    // All skills should have fallback behavior
+    expect(summary).toMatch(/fallback|subagent|unavailable|not available|skip/);
+    // All skills should show it's optional/non-blocking
+    expect(summary).toMatch(/optional|non.?blocking|skip|not.*required/);
+
+    console.log(`${skill}: Codex offering verified`);
+  }
+
+  testConcurrentIfSelected('codex-offered-office-hours', async () => {
+    await checkCodexOffering('office-hours', 'codex-offered-office-hours', 'second opinion');
+  }, 180_000);
+
+  testConcurrentIfSelected('codex-offered-ceo-review', async () => {
+    await checkCodexOffering('plan-ceo-review', 'codex-offered-ceo-review', 'outside voice');
+  }, 180_000);
+
+  testConcurrentIfSelected('codex-offered-design-review', async () => {
+    await checkCodexOffering('plan-design-review', 'codex-offered-design-review', 'design outside voices');
+  }, 180_000);
+
+  testConcurrentIfSelected('codex-offered-eng-review', async () => {
+    await checkCodexOffering('plan-eng-review', 'codex-offered-eng-review', 'outside voice');
+  }, 180_000);
+});
+
 // Module-level afterAll — finalize eval collector after all tests complete
 afterAll(async () => {
   await finalizeEvalCollector(evalCollector);
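
`checkCodexOffering` lowercases the model-written summary and applies three regular expressions to it. The sketch below runs those same three patterns, copied from the diff, against two made-up summaries to show which wording passes. The `failedChecks` helper and the sample strings are illustrative assumptions, not part of the commit.

```typescript
// The three patterns are copied from checkCodexOffering in this commit;
// the labels and the helper around them are hypothetical.
const SUMMARY_CHECKS: Array<[string, RegExp]> = [
  ['availability check', /which codex/],
  ['fallback behavior', /fallback|subagent|unavailable|not available|skip/],
  ['optional / non-blocking', /optional|non.?blocking|skip|not.*required/],
];

// Returns the labels of the checks a summary does not satisfy.
function failedChecks(rawSummary: string): string[] {
  const summary = rawSummary.toLowerCase(); // the test lowercases before matching
  return SUMMARY_CHECKS.filter(([, pattern]) => !pattern.test(summary)).map(([label]) => label);
}

// Made-up summaries for illustration.
const passing = 'Availability: runs `which codex`. If Codex is missing, the step is skipped.';
const failing = 'Codex is detected with `command -v codex` and the step is optional.';

console.log(failedChecks(passing)); // []
console.log(failedChecks(failing)); // [ 'availability check', 'fallback behavior' ]
```

Note that the word "skip" appears in both the fallback and the optional patterns, so a summary that only says the step is skipped satisfies both of those checks at once.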

test/touchfiles.test.ts

Lines changed: 3 additions & 2 deletions
@@ -80,8 +80,9 @@ describe('selectTests', () => {
     expect(result.selected).toContain('plan-ceo-review-selective');
     expect(result.selected).toContain('plan-ceo-review-benefits');
     expect(result.selected).toContain('autoplan-core');
-    expect(result.selected.length).toBe(4);
-    expect(result.skipped.length).toBe(Object.keys(E2E_TOUCHFILES).length - 4);
+    expect(result.selected).toContain('codex-offered-ceo-review');
+    expect(result.selected.length).toBe(5);
+    expect(result.skipped.length).toBe(Object.keys(E2E_TOUCHFILES).length - 5);
   });
 
   test('global touchfile triggers ALL tests', () => {
