Commit c79c260

Merge pull request #797 from Chris0Jeky/test/visual-regression-harness
Add visual regression harness for key UI surfaces
2 parents ca43946 + 3b03423 commit c79c260

14 files changed: +941 -1 lines changed

.github/workflows/ci-extended.yml

Lines changed: 10 additions & 0 deletions
@@ -107,6 +107,16 @@ jobs:
       dotnet-version: 8.0.x
       node-version: 24.13.1
 
+  visual-regression:
+    name: Visual Regression
+    if: github.event_name == 'workflow_dispatch' || (github.event_name == 'pull_request' && (contains(github.event.pull_request.labels.*.name, 'testing') || contains(github.event.pull_request.labels.*.name, 'visual')))
+    needs:
+      - backend-solution
+    uses: ./.github/workflows/reusable-visual-regression.yml
+    with:
+      dotnet-version: 8.0.x
+      node-version: 24.13.1
+
   load-concurrency-harness:
     name: Load and Concurrency Harness
     if: github.event_name == 'workflow_dispatch' || (github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'testing'))
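
The gating condition on the new `visual-regression` job is easier to check as a plain predicate. The sketch below is an illustration only: `shouldRunVisualRegression` and `VISUAL_LABELS` are hypothetical names, not part of this commit, and the case-insensitive comparison assumes GitHub's expression `contains()` ignores case when matching strings.

```typescript
// Hypothetical mirror of the job-level `if:` expression (not code from the commit).
// Labels that opt a pull request into the visual regression job.
const VISUAL_LABELS: readonly string[] = ["testing", "visual"];

function shouldRunVisualRegression(
  eventName: string,
  prLabels: readonly string[],
): boolean {
  // Manual dispatch always runs the job.
  if (eventName === "workflow_dispatch") {
    return true;
  }
  // Any other non-PR event (for example a push) never runs it.
  if (eventName !== "pull_request") {
    return false;
  }
  // Assumption: label matching is case-insensitive, like GitHub expression comparisons.
  return prLabels.some((label) => VISUAL_LABELS.includes(label.toLowerCase()));
}
```

Under this reading, a pull request labelled only `visual` triggers the visual job but not `load-concurrency-harness`, whose condition checks for `testing` alone.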

.github/workflows/reusable-visual-regression.yml

Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
+name: Reusable Visual Regression
+
+on:
+  workflow_call:
+    inputs:
+      dotnet-version:
+        description: .NET SDK version used for backend setup
+        required: false
+        default: "8.0.x"
+        type: string
+      node-version:
+        description: Node.js version used for frontend setup
+        required: false
+        default: "24.13.1"
+        type: string
+
+permissions:
+  contents: read
+
+env:
+  NUGET_PACKAGES: ${{ github.workspace }}/.nuget/packages
+
+jobs:
+  visual-regression:
+    name: Visual Regression
+    runs-on: ubuntu-latest
+    timeout-minutes: 20
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v6
+
+      - name: Setup .NET
+        uses: actions/setup-dotnet@v5
+        with:
+          dotnet-version: ${{ inputs.dotnet-version }}
+          cache: true
+          cache-dependency-path: |
+            backend/Taskdeck.sln
+            backend/**/*.csproj
+
+      - name: Setup Node
+        uses: actions/setup-node@v6
+        with:
+          node-version: ${{ inputs.node-version }}
+          cache: npm
+          cache-dependency-path: frontend/taskdeck-web/package-lock.json
+
+      - name: Restore backend
+        run: dotnet restore backend/Taskdeck.sln
+
+      - name: Install frontend dependencies
+        working-directory: frontend/taskdeck-web
+        run: npm ci
+
+      - name: Cache Playwright browsers
+        uses: actions/cache@v5
+        with:
+          path: ~/.cache/ms-playwright
+          key: ms-playwright-${{ runner.os }}-${{ hashFiles('frontend/taskdeck-web/package-lock.json') }}
+
+      - name: Install Playwright browser
+        working-directory: frontend/taskdeck-web
+        run: npx playwright install --with-deps chromium
+
+      - name: Remove stale visual E2E database
+        working-directory: frontend/taskdeck-web
+        run: node -e "require('fs').rmSync('taskdeck.e2e.visual.ci.db',{force:true});"
+
+      - name: Check for existing baselines
+        id: baselines
+        working-directory: frontend/taskdeck-web
+        run: |
+          if [ -d "tests/visual/__screenshots__" ] && [ "$(find tests/visual/__screenshots__ -name '*.png' 2>/dev/null | head -1)" ]; then
+            echo "exist=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "exist=false" >> "$GITHUB_OUTPUT"
+            echo "::warning::No baseline screenshots found. Running with --update-snapshots to generate initial baselines. Download the visual-regression-baselines artifact and commit them."
+          fi
+
+      - name: Run visual regression tests
+        timeout-minutes: 12
+        working-directory: frontend/taskdeck-web
+        env:
+          CI: "true"
+          TASKDECK_E2E_DB: taskdeck.e2e.visual.ci.db
+          TASKDECK_RUN_DEMO: "0"
+        run: |
+          if [ "${{ steps.baselines.outputs.exist }}" = "false" ]; then
+            npx playwright test --config playwright.visual.config.ts --update-snapshots --reporter=line
+          else
+            npx playwright test --config playwright.visual.config.ts --reporter=line
+          fi
+
+      - name: Upload generated baselines
+        if: steps.baselines.outputs.exist == 'false'
+        uses: actions/upload-artifact@v7
+        with:
+          name: visual-regression-baselines
+          path: frontend/taskdeck-web/tests/visual/__screenshots__/
+          if-no-files-found: warn
+          retention-days: 30
+
+      - name: Upload visual diff artifacts
+        if: failure()
+        uses: actions/upload-artifact@v7
+        with:
+          name: visual-regression-diffs
+          path: |
+            frontend/taskdeck-web/test-results/
+          if-no-files-found: ignore
+          retention-days: 14
+
+      - name: Upload Playwright HTML report
+        if: failure()
+        uses: actions/upload-artifact@v7
+        with:
+          name: visual-regression-report
+          path: frontend/taskdeck-web/playwright-report
+          if-no-files-found: ignore
+          retention-days: 14
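
The baseline check and the test step together form a small decision: if no PNG exists under the screenshot directory, run Playwright with `--update-snapshots`, otherwise run a normal comparison. A minimal sketch of that decision as pure functions follows. The names `hasBaselines` and `playwrightArgs` are hypothetical, and the file list is passed in rather than read from disk so the logic can be exercised without a checkout.

```typescript
// Hypothetical restatement of the "Check for existing baselines" and
// "Run visual regression tests" steps; not code from the commit.
// Paths are relative to frontend/taskdeck-web, as in the workflow.
const BASELINE_DIR = "tests/visual/__screenshots__/";

// True when at least one .png file sits anywhere under the baseline directory,
// which is what the `find ... -name '*.png' | head -1` test checks.
function hasBaselines(relativePaths: readonly string[]): boolean {
  return relativePaths.some(
    (path) => path.startsWith(BASELINE_DIR) && path.endsWith(".png"),
  );
}

// Builds the argument list handed to `npx`, adding --update-snapshots
// only when no baseline was found.
function playwrightArgs(baselinesExist: boolean): string[] {
  const args = ["playwright", "test", "--config", "playwright.visual.config.ts"];
  if (!baselinesExist) {
    args.push("--update-snapshots");
  }
  args.push("--reporter=line");
  return args;
}
```

One consequence worth noting: a first run with no committed baselines only generates images and cannot detect a regression, which is why the workflow emits a warning and uploads the `visual-regression-baselines` artifact for committing.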

docs/STATUS.md

Lines changed: 1 addition & 0 deletions
@@ -28,6 +28,7 @@ Current constraints are mostly hardening and consistency:
 - LLM flow now supports config-gated `OpenAI` and `Gemini` providers with deterministic `Mock` fallback for safe local/test posture; degraded provider responses are now structurally distinct (`messageType: "degraded"` + `degradedReason`) and the health endpoint supports opt-in probe verification (`?probe=true`); chat-to-proposal pipeline improvements delivered: `LlmIntentClassifier` now uses compiled regex patterns with word-distance matching, stemming/plurals, broader verb coverage, and negative context filtering for negations and other-tool questions (`#571`); parse failures now return structured hint payloads with closest-match suggestions and a frontend hint card with "try this instead" pre-fill (`#572`); dedicated classifier and chat-to-proposal integration test coverage added (`#577`); LLM-assisted instruction extraction now delivered (`#573`): OpenAI and Gemini providers request structured JSON output with a system prompt describing supported instruction patterns, parse the response into `LlmCompletionResult.Instructions`, and fall back to the static `LlmIntentClassifier` when structured parsing fails; `ChatService` iterates LLM-extracted instructions (supporting multiple proposals from a single message) and falls back to raw user message parsing when no instructions are extracted; Mock provider unchanged for deterministic test behavior; multi-instruction batch parsing now delivered (`#574`): `ParseBatchInstructionAsync` splits multiple natural-language instructions into individual planner calls, `ChatService` routes multi-instruction messages through batch parsing to generate multiple proposals from a single chat message; board-context LLM prompting now delivered (`#575`, expanded in `#617`): `BoardContextBuilder` constructs bounded board context (columns, card IDs, titles, labels) grouped per column and appends it to system prompts across OpenAI and Gemini providers via `LlmSystemPromptBuilder`; card IDs are included as first-8 hex chars so the LLM can generate `move card <id>` instructions; context budget increased to 4000 chars with single-query card fetch; **remaining gap**: conversational refinement (`#576`) remains undelivered; analysis at `docs/analysis/2026-03-29_chat_nlp_proposal_gap.md`
 - managed-key shared-token abuse-control strategy is now explicitly seeded in `#235` to `#240` before broad external exposure
 - testing-harness guardrail expansion from `#254` to `#260` is shipped; remaining work is normal follow-up hardening rather than the original wave
+- visual regression harness delivered (`#88`): Playwright-based screenshot comparison for 7 key UI surfaces (board empty/populated, command palette open/search, archive, inbox, home); separate `playwright.visual.config.ts` with fixed viewport (1280x720), animations disabled, 0.5% pixel tolerance; CI Extended integration via `reusable-visual-regression.yml` with diff artifact upload on failure; policy document at `docs/testing/VISUAL_REGRESSION_POLICY.md`
 - rigorous test expansion wave seeded 2026-04-03 (`#721` tracker, 22 issues `#699`–`#726`): systematic codebase audit identified 25+ untested infrastructure repositories, zero tests on the central worker, 6 controllers with untested HTTP surfaces, and no golden-path integration test for the capture → proposal → board pipeline; execution is tracked in `docs/TESTING_GUIDE.md`; first delivery: infrastructure repository integration tests (`#699`/`#730` — 77 tests across 7 repo classes against real SQLite); **major wave delivery 2026-04-04** (PRs `#732`–`#739`, 8 issues, ~300 new tests): SEC-20 ChangePassword fix (`#722`/`#732`), golden-path capture→board integration test (`#703`/`#735` — 7 tests proving full pipeline), cross-user data isolation tests (`#704`/`#733` — 38 tests across all major API boundaries), LlmQueueToProposalWorker integration tests (`#700`/`#734` — 24 tests, previously zero coverage), controller HTTP integration tests (`#702`/`#738` — 67 tests covering 6 untested controllers, found 2 pre-existing bugs), proposal lifecycle edge cases (`#708`/`#736` — 74 tests for state machine/expiry/race conditions), OAuth/auth edge cases (`#707`/`#737` — 44 tests, found and fixed `Substring` overflow bug in `ExternalLoginAsync`), MCP full resource/tool inventory (`#653`/`#739` — 9 resources + 11 tools with 42 tests, GP-06 compliant, user-scoping gap fixed during review); **second wave delivery 2026-04-04** (PRs `#740`–`#755`, 8 issues, ~586 new tests with two rounds of adversarial review, 47 review-fix commits): domain entity state machine exhaustive tests (`#701`/`#740` — 174 tests across 7 entities: CommandRun, ArchiveItem, ChatSession, UserPreference, NotificationPreference, CardLabel, CardCommentMention), SignalR hub and realtime integration tests (`#706`/`#751` — 19 tests covering auth, presence lifecycle, multi-user, authorization, edge cases), LLM provider abstraction and tool-calling edge cases (`#709`/`#747` — 101 tests across orchestrator, provider, classifier, registry), data export/import round-trip integrity tests (`#713`/`#752` — 64 tests covering JSON, CSV, GDPR, database, cross-format validation), API error contract regression and boundary validation (`#714`/`#753` — 57 tests across 7 endpoint families with GP-03 contract enforcement), archive and restore lifecycle integration tests (`#715`/`#755` — 74 tests: 45 domain + 29 API covering state machine, cross-user isolation, conflict detection, audit trail), board metrics and analytics accuracy verification (`#718`/`#749` — 61 tests: 51 service + 10 controller covering throughput, cycle time, WIP, blocked cards, done-column heuristic), notification delivery, deduplication, and preference filtering (`#719`/`#746` — 36 tests covering all 5 notification types, deduplication, preference filtering, cross-user isolation, batch operations)
 - MVP dogfooding flow now supports canonical checklist bootstrap in chat (proposal-first, board-scoped); broader template coverage remains future work
 - collaborative editing now includes board/card presence visibility and conflict-hinting guardrails for stale card writes
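
To put the "0.5% pixel tolerance" at a 1280x720 viewport into concrete numbers, the arithmetic can be sketched as below. This assumes the tolerance is the share of pixels in the frame allowed to differ (the kind of limit Playwright's `maxDiffPixelRatio` screenshot option expresses); the commit page does not show how `playwright.visual.config.ts` actually encodes it, and `maxDifferingPixels` is a hypothetical helper.

```typescript
// Hypothetical helper: converts a percentage tolerance into a pixel budget.
// Assumption: tolerance is measured against the total pixel count of the viewport.
function maxDifferingPixels(
  width: number,
  height: number,
  tolerancePercent: number,
): number {
  const totalPixels = width * height;
  // Multiply before dividing so 0.5% of 921,600 stays an exact integer.
  return Math.floor((totalPixels * tolerancePercent) / 100);
}

// 1280 * 720 = 921,600 pixels; 0.5% of that is 4,608 pixels.
const budget = maxDifferingPixels(1280, 720, 0.5);
```

Under that assumption, a screenshot may differ from its baseline in up to 4,608 pixels before the comparison fails, roughly a 68x68 pixel square.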

docs/TESTING_GUIDE.md

Lines changed: 30 additions & 0 deletions
@@ -596,6 +596,36 @@ cd frontend/taskdeck-web
npm run test:e2e:audit:headed
```

## Visual Regression Tests

Visual regression tests capture baseline screenshots of key UI surfaces and compare them against future renders to catch unintended layout changes.

**Policy document**: `docs/testing/VISUAL_REGRESSION_POLICY.md` (thresholds, false-positive mitigation, baseline management)

**Test location**: `frontend/taskdeck-web/tests/visual/`

**Config**: `frontend/taskdeck-web/playwright.visual.config.ts`

**Covered surfaces**: board view (empty + populated), command palette (open + search), archive view, inbox/capture view, home view

Run visual tests:

```bash
cd frontend/taskdeck-web
npm run test:visual
```

Update baselines after intentional UI changes:

```bash
cd frontend/taskdeck-web
npm run test:visual:update
```

Key settings: fixed viewport 1280x720, animations disabled, 0.5% pixel tolerance, platform-specific baselines (CI canonical platform: ubuntu-latest).

CI integration: runs in CI Extended pipeline with `testing` or `visual` PR labels. Diff artifacts uploaded on failure for review.

## Demo Tooling Policy

Default CI posture:
