Merged
24 commits
b9a6257
Add cross-browser and mobile device projects to Playwright config
Chris0Jeky Apr 9, 2026
b135caf
Add mobile viewport E2E tests for responsive behavior
Chris0Jeky Apr 9, 2026
be05a6a
Add cross-browser E2E tests for critical user journeys
Chris0Jeky Apr 9, 2026
de5c0d6
Add cross-browser E2E matrix CI workflow
Chris0Jeky Apr 9, 2026
565a093
Add flaky test policy with quarantine process and tagging strategy
Chris0Jeky Apr 9, 2026
42a589d
Update TESTING_GUIDE.md with cross-browser and mobile testing section
Chris0Jeky Apr 9, 2026
65267da
Update STATUS.md and IMPLEMENTATION_MASTERPLAN.md for cross-browser E2E
Chris0Jeky Apr 9, 2026
13ebebc
Pin smoke CI to chromium project and add quarantine exclusion
Chris0Jeky Apr 9, 2026
7a61784
Update CI topology comment to include cross-browser workflow
Chris0Jeky Apr 9, 2026
b0a9131
Fix review findings: deduplicate browser cache key and clarify quaran…
Chris0Jeky Apr 9, 2026
0264b00
Add shared UI-level board helpers to reduce test duplication
Chris0Jeky Apr 9, 2026
d5db85b
Use shared board helpers in cross-browser spec and document PR gate i…
Chris0Jeky Apr 9, 2026
435b278
Fix mobile spec: use shared helpers, rename misleading test, remove c…
Chris0Jeky Apr 9, 2026
8deb464
Document @cross-browser PR gate impact in chromium project config
Chris0Jeky Apr 9, 2026
9759af7
Fix IMPLEMENTATION_MASTERPLAN.md numbering for cross-browser entry
Chris0Jeky Apr 9, 2026
10890cc
Merge main to update branch
claude Apr 9, 2026
97f801c
Guard proposal decisions with EF concurrency
Chris0Jeky Apr 9, 2026
a1f7c3d
Remove dead mobile e2e helper import
Chris0Jeky Apr 9, 2026
d2099fc
Merge origin/main into test/e2e-cross-browser-mobile-matrix
Chris0Jeky Apr 11, 2026
5369c6a
Fix cross-browser test click target and preserve quarantine exclusion
Chris0Jeky Apr 12, 2026
9b1555d
Trigger CI
Chris0Jeky Apr 12, 2026
5b577e4
Merge main to resolve conflicts
Chris0Jeky Apr 12, 2026
c1af172
Fix cross-browser E2E test strict mode violation
Chris0Jeky Apr 12, 2026
f8351f3
Resolve merge conflict in STATUS.md for cross-browser E2E
Chris0Jeky Apr 12, 2026
10 changes: 10 additions & 0 deletions .github/workflows/ci-extended.yml
@@ -107,6 +107,16 @@ jobs:
      dotnet-version: 8.0.x
      node-version: 24.13.1

  e2e-cross-browser:
    name: E2E Cross-Browser Matrix
    if: github.event_name == 'workflow_dispatch' || (github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'testing'))
    needs:
      - backend-solution
    uses: ./.github/workflows/reusable-e2e-cross-browser.yml
    with:
      dotnet-version: 8.0.x
      node-version: 24.13.1

  visual-regression:
    name: Visual Regression
    if: github.event_name == 'workflow_dispatch' || (github.event_name == 'pull_request' && (contains(github.event.pull_request.labels.*.name, 'testing') || contains(github.event.pull_request.labels.*.name, 'visual')))
9 changes: 9 additions & 0 deletions .github/workflows/ci-nightly.yml
@@ -60,6 +60,15 @@ jobs:
      k6-duration: "90s"
      k6-user-pool: "6"

  e2e-cross-browser:
    name: E2E Cross-Browser Matrix
    needs:
      - backend-solution
    uses: ./.github/workflows/reusable-e2e-cross-browser.yml
    with:
      dotnet-version: 8.0.x
      node-version: 24.13.1

  container-images:
    name: Container Images Regression
    needs:
2 changes: 2 additions & 0 deletions .github/workflows/ci-required.yml
@@ -20,6 +20,7 @@
# ├── reusable-openapi-guardrail.yml
# ├── reusable-backend-solution.yml (label: testing)
# ├── reusable-e2e-smoke.yml (label: testing)
# ├── reusable-e2e-cross-browser.yml (label: testing)
# ├── reusable-demo-director-smoke.yml (label: automation)
# ├── reusable-load-concurrency-harness.yml (label: testing)
# └── reusable-container-integration.yml (label: testing) — Testcontainers PostgreSQL
@@ -28,6 +29,7 @@
# ├── reusable-openapi-guardrail.yml
# ├── reusable-backend-solution.yml
# ├── reusable-e2e-smoke.yml
# ├── reusable-e2e-cross-browser.yml
# ├── reusable-load-concurrency-harness.yml
# └── reusable-container-images.yml
#
106 changes: 106 additions & 0 deletions .github/workflows/reusable-e2e-cross-browser.yml
@@ -0,0 +1,106 @@
name: Reusable E2E Cross-Browser Matrix

on:
  workflow_call:
    inputs:
      dotnet-version:
        description: .NET SDK version used for E2E backend setup
        required: false
        default: "8.0.x"
        type: string
      node-version:
        description: Node.js version used for E2E frontend setup
        required: false
        default: "24.13.1"
        type: string

permissions:
  contents: read

env:
  NUGET_PACKAGES: ${{ github.workspace }}/.nuget/packages

jobs:
  e2e-cross-browser:
    name: E2E (${{ matrix.project }})
    runs-on: ubuntu-latest
    timeout-minutes: 30
    strategy:
      fail-fast: false
      matrix:
        include:
          - project: chromium
            browser: chromium
          - project: firefox
            browser: firefox
          - project: webkit
            browser: webkit
          - project: mobile-chrome
            browser: chromium
          - project: mobile-safari
            browser: webkit
    steps:
      - name: Checkout
        uses: actions/checkout@v6

      - name: Setup .NET
        uses: actions/setup-dotnet@v5
        with:
          dotnet-version: ${{ inputs.dotnet-version }}
          cache: true
          cache-dependency-path: |
            backend/Taskdeck.sln
            backend/**/*.csproj

      - name: Setup Node
        uses: actions/setup-node@v6
        with:
          node-version: ${{ inputs.node-version }}
          cache: npm
          cache-dependency-path: frontend/taskdeck-web/package-lock.json

      - name: Restore backend
        run: dotnet restore backend/Taskdeck.sln

      - name: Install frontend dependencies
        working-directory: frontend/taskdeck-web
        run: npm ci

      - name: Cache Playwright browsers
        uses: actions/cache@v5
        with:
          path: ~/.cache/ms-playwright
          key: ms-playwright-${{ runner.os }}-${{ hashFiles('frontend/taskdeck-web/package-lock.json') }}

Comment on lines +69 to +74
Copilot AI Apr 9, 2026

The Playwright browser cache key includes `matrix.project`, but the workflow installs the full Playwright browser set (`npx playwright install --with-deps`) for every job. This will create 5 separate caches with identical contents and reduce cache hit rates / waste storage. Consider removing `matrix.project` from the cache key (or adding a restore key) so all matrix jobs can reuse the same cached browsers.

      - name: Install Playwright browsers
        working-directory: frontend/taskdeck-web
        run: npx playwright install --with-deps ${{ matrix.browser }}

Comment on lines +75 to +78
Copilot AI Apr 9, 2026

Each matrix job installs all Playwright browsers (`npx playwright install --with-deps`) even though the job only runs a single `--project`. This increases runtime and network usage for the nightly/extended matrix. Consider installing only the required browser per project (e.g., chromium for chromium/mobile-chrome, firefox for firefox, webkit for webkit/mobile-safari).

      - name: Remove stale E2E database
        working-directory: frontend/taskdeck-web
        run: node -e "require('fs').rmSync('taskdeck.e2e.ci.db',{force:true});"

      - name: Run Playwright tests (${{ matrix.project }})
        timeout-minutes: 15
        working-directory: frontend/taskdeck-web
        env:
          CI: "true"
          TASKDECK_E2E_DB: taskdeck.e2e.ci.db
          TASKDECK_RUN_DEMO: "0"
        run: npx playwright test --project=${{ matrix.project }} --reporter=line

      - name: Upload Playwright report
        if: failure()
        uses: actions/upload-artifact@v7
        with:
          name: playwright-report-${{ matrix.project }}
          path: frontend/taskdeck-web/playwright-report
          if-no-files-found: ignore

      - name: Upload Playwright test results
        if: failure()
        uses: actions/upload-artifact@v7
        with:
          name: playwright-test-results-${{ matrix.project }}
          path: frontend/taskdeck-web/test-results
          if-no-files-found: ignore
2 changes: 1 addition & 1 deletion .github/workflows/reusable-e2e-smoke.yml
@@ -73,7 +73,7 @@ jobs:
           CI: "true"
           TASKDECK_E2E_DB: taskdeck.e2e.ci.db
           TASKDECK_RUN_DEMO: "0"
-        run: npx playwright test --reporter=line
+        run: npx playwright test --project=chromium --reporter=line

       - name: Upload Playwright report
         if: failure()
3 changes: 3 additions & 0 deletions docs/STATUS.md
@@ -882,6 +882,7 @@ Result:
- backend Playwright startup stays on deterministic `Mock` provider mode unless the run is an explicit demo flow that injects live-provider overrides.
- Investigation record remains at `docs/analysis/2026-02-25_frontend-gate-port-bind-and-cors-blockers.md`.
- 2026-03-26 manual audit confirmed the previously published raw API/E2E counts were stale; the next full end-to-end suite recertification should refresh discovery/pass totals rather than continuing to repeat the older 2026-03-06 figures.
- 2026-04-09 cross-browser and mobile E2E matrix delivered (`#87`): Playwright config now defines 5 projects (chromium, firefox, webkit, mobile-chrome/Pixel 7, mobile-safari/iPhone 14); tag-based filtering (`@cross-browser`, `@mobile`, `@quarantine`) controls which tests run per project; 5 cross-browser + 4 mobile viewport tests added; PR gate stays chromium-only; full matrix runs nightly and on `testing` label; flaky test policy documented at `docs/testing/FLAKY_TEST_POLICY.md`

### Demo Director Smoke

@@ -916,6 +917,7 @@ Extended/non-blocking workflow: `.github/workflows/ci-extended.yml`
- `dependency-review` (PR dependency risk check)
- label/manual-triggered backend solution + E2E smoke lanes (`testing` label or `workflow_dispatch`) for PRs that touch `.github/workflows/**`, `backend/**`, `frontend/**`, `deploy/**`, or `scripts/**`
- label/manual-triggered demo director smoke lane (`automation` label or `workflow_dispatch`) via `.github/workflows/reusable-demo-director-smoke.yml`; docs-only PRs still need manual dispatch because `ci-extended.yml` path filters do not watch `docs/**`
- label/manual-triggered E2E cross-browser matrix lane via `.github/workflows/reusable-e2e-cross-browser.yml` (`testing` label or `workflow_dispatch`); runs all 5 browser/device projects in parallel with `fail-fast: false`
- label/manual-triggered load/concurrency harness lane via `.github/workflows/reusable-load-concurrency-harness.yml`
- label/manual-triggered cross-browser E2E matrix lane via `.github/workflows/reusable-e2e-cross-browser.yml` (5-project parallel matrix: Chromium, Firefox, WebKit, mobile-chrome, mobile-safari)
- label/manual-triggered visual regression lane via `.github/workflows/reusable-visual-regression.yml` (Playwright `toHaveScreenshot()` with diff artifact upload; `testing`/`visual` label)
@@ -945,6 +947,7 @@ Nightly workflow: `.github/workflows/ci-nightly.yml`

- scheduled/manual backend solution regression
- scheduled/manual E2E smoke (reuses `.github/workflows/reusable-e2e-smoke.yml`)
- scheduled/manual E2E cross-browser matrix (reuses `.github/workflows/reusable-e2e-cross-browser.yml`; 5 projects: chromium, firefox, webkit, mobile-chrome, mobile-safari)
- scheduled/manual load/concurrency harness (reuses `.github/workflows/reusable-load-concurrency-harness.yml`)
- scheduled/manual container image regression

74 changes: 74 additions & 0 deletions docs/TESTING_GUIDE.md
@@ -596,6 +596,80 @@ cd frontend/taskdeck-web
npm run test:e2e:audit:headed
```

## Cross-Browser and Mobile E2E Testing

### Browser Projects

The Playwright config defines five projects:

| Project | Device Descriptor | When It Runs |
|---------|------------------|--------------|
| `chromium` | Desktop Chrome | Every PR (ci-required), nightly, manual |
| `firefox` | Desktop Firefox | Nightly, manual dispatch, `testing` label |
| `webkit` | Desktop Safari | Nightly, manual dispatch, `testing` label |
| `mobile-chrome` | Pixel 7 | Nightly, manual dispatch, `testing` label |
| `mobile-safari` | iPhone 14 | Nightly, manual dispatch, `testing` label |

### Test Tagging

Tests use tag annotations in their title strings to control which projects run them:

- **(no tag)** or `@smoke` — runs on chromium only (PR gate default)
- `@cross-browser` — runs on chromium, firefox, and webkit
- `@mobile` — runs on mobile-chrome and mobile-safari only
- `@quarantine` — excluded from all CI (see `docs/testing/FLAKY_TEST_POLICY.md`)
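
The routing rules above can be modelled as a small lookup. The following is a minimal sketch, not the project's actual `playwright.config.ts`: the per-project regexes, `projectFilters`, and `projectsForTitle` are hypothetical names and values inferred from the list above. It mimics how Playwright's `grep` and `grepInvert` options filter test titles.

```typescript
// Hypothetical model of per-project tag filtering. The regexes are assumptions
// inferred from the tagging rules, not copied from playwright.config.ts.
type ProjectName = 'chromium' | 'firefox' | 'webkit' | 'mobile-chrome' | 'mobile-safari';

interface ProjectFilter {
  grep?: RegExp;       // title must match for the test to run
  grepInvert?: RegExp; // title must NOT match for the test to run
}

// Assumed top-level exclusion applied to every project.
const QUARANTINE = /@quarantine/;

const projectFilters: Record<ProjectName, ProjectFilter> = {
  chromium: { grepInvert: /@mobile/ },
  firefox: { grep: /@cross-browser/ },
  webkit: { grep: /@cross-browser/ },
  'mobile-chrome': { grep: /@mobile/ },
  'mobile-safari': { grep: /@mobile/ },
};

// Returns the projects that would pick up a test with the given title.
function projectsForTitle(title: string): ProjectName[] {
  if (QUARANTINE.test(title)) return [];
  return (Object.keys(projectFilters) as ProjectName[]).filter((name) => {
    const { grep, grepInvert } = projectFilters[name];
    if (grep && !grep.test(title)) return false;
    if (grepInvert && grepInvert.test(title)) return false;
    return true;
  });
}
```

Under these assumed filters, an untagged title resolves to `chromium` only, while a `@quarantine` title resolves to no project at all.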

### Running Cross-Browser Tests Locally

Install all browsers (one-time):

```bash
cd frontend/taskdeck-web
npx playwright install --with-deps
```

Run a specific project:

```bash
npx playwright test --project=firefox --reporter=line
npx playwright test --project=mobile-safari --reporter=line
```

Run all projects:

```bash
npx playwright test --reporter=line
```

Run only cross-browser tagged tests across all desktop browsers:

```bash
npx playwright test --grep="@cross-browser" --reporter=line
```

Run only mobile tests:

```bash
npx playwright test --grep="@mobile" --reporter=line
```

### CI Configuration

- **PR gate** (`ci-required.yml`): calls `reusable-e2e-smoke.yml`, which installs and runs chromium only. This keeps PR feedback fast (~12 min timeout).
- **Nightly** (`ci-nightly.yml`): calls `reusable-e2e-cross-browser.yml`, which runs all 5 projects in a matrix with `fail-fast: false`.
- **Extended/manual** (`ci-extended.yml`): calls `reusable-e2e-cross-browser.yml` on the `testing` label or manual dispatch.

### Writing New E2E Tests

1. **Default tests** (no tag): run on chromium in the PR gate. Use for most new tests.
2. **Critical journeys** that must work cross-browser: add the `@cross-browser` tag. These also run on chromium in the PR gate.
3. **Mobile-specific behavior** (viewport responsiveness, touch targets, overflow): add `@mobile` tag. These only run on mobile projects.
4. **Flaky or unstable tests**: add `@quarantine` tag and file an issue. See `docs/testing/FLAKY_TEST_POLICY.md`.

### Flaky Test Policy

See `docs/testing/FLAKY_TEST_POLICY.md` for the full quarantine/remediation process, SLA timelines, and prevention guidelines.

## Visual Regression Tests

Visual regression tests capture baseline screenshots of key UI surfaces and compare them against future renders to catch unintended layout changes.
125 changes: 125 additions & 0 deletions docs/testing/FLAKY_TEST_POLICY.md
@@ -0,0 +1,125 @@
# Flaky Test Policy

Last Updated: 2026-04-09

## Purpose

This document defines how flaky E2E tests are identified, quarantined, and remediated in the Taskdeck test suite. The goal is to maintain CI signal quality: a red build should always mean a real problem.

## Definition

A test is **flaky** when it produces inconsistent pass/fail results across runs without any code change. Common causes:

- Timing-dependent waits or race conditions
- Test isolation failures (shared state between tests or browser profiles)
- Browser-specific rendering timing (especially cross-browser matrix)
- Network/server startup non-determinism

## Tagging Strategy

Taskdeck E2E tests use Playwright tag annotations in test titles:

| Tag | Purpose | Runs in CI |
|-----|---------|------------|
| (no tag) | Default smoke tests | PR gate (chromium only) |
| `@smoke` | Explicit smoke designation | PR gate (chromium only) |
| `@cross-browser` | Critical journeys across all desktop browsers | PR gate (chromium) + nightly + manual (`testing` label) |
| `@mobile` | Mobile viewport responsive tests | Nightly + manual (`testing` label) |
| `@quarantine` | Known flaky, excluded from CI | Never (local debug only) |

### How to tag a test

Add the tag to the test title string:

```typescript
test('@cross-browser board creation workflow', async ({ page }) => {
// ...
})

test('@mobile card editing on small screen', async ({ page }) => {
// ...
})
```

Multiple tags can be combined:

```typescript
test('@cross-browser @mobile responsive navigation', async ({ page }) => {
// ...
})
```

## CI Matrix Strategy

| CI Lane | Trigger | Projects Run | Tag Filter |
|---------|---------|-------------|------------|
| `ci-required.yml` (PR gate) | Every PR/push | chromium only | All tests except `@mobile` and `@quarantine` |
| `ci-extended.yml` | `testing` label or manual | All 5 projects | Per-project grep (see config) |
| `ci-nightly.yml` | Daily 03:25 UTC | All 5 projects | Per-project grep (see config) |

## Quarantine Process

### Step 1: Identify

When a test fails intermittently (2+ inconsistent results in nightly or PR runs):

1. File a GitHub issue with label `flaky-test` and link the failing test file/line
2. Include failure logs, trace artifacts, and which browser(s) are affected
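
One way to make "inconsistent results" concrete is to flag commits where the same test both passed and failed. This is a minimal sketch under that assumption; `RunResult` and `flakyCommits` are hypothetical names, and the policy itself does not prescribe this exact check.

```typescript
type Outcome = 'pass' | 'fail';

// Hypothetical record of one CI run of a single test.
interface RunResult {
  commit: string;   // commit SHA the run executed against
  outcome: Outcome;
}

// Returns the commits on which the test both passed and failed,
// i.e. the result changed without any code change.
function flakyCommits(runs: RunResult[]): string[] {
  const seen: Record<string, { pass: boolean; fail: boolean }> = {};
  for (const run of runs) {
    const entry = seen[run.commit] ?? (seen[run.commit] = { pass: false, fail: false });
    entry[run.outcome] = true;
  }
  return Object.keys(seen).filter((commit) => seen[commit].pass && seen[commit].fail);
}
```

A test that fails on every run of a commit is not reported here, since a consistent failure points to a real regression rather than flakiness.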

### Step 2: Quarantine

Add `@quarantine` tag to the test title:

```typescript
test('@quarantine @cross-browser flaky board reload test', async ({ page }) => {
// ...
})
```

The Playwright config excludes `@quarantine` from all CI projects via a top-level `grepInvert` in `playwright.config.ts`. The test still runs locally for debugging (pass `--grep="@quarantine"` explicitly to override).

The top-level exclusion is already configured:

```typescript
// playwright.config.ts (top level)
grepInvert: /@quarantine/,
```

Review comment (medium):

In Playwright, `grepInvert` is a top-level configuration property, not a property within the `use` block. The documentation should be corrected to reflect the actual configuration structure.

Suggested change
The top-level exclusion is already configured:
To add quarantine exclusion to all projects, add this to playwright.config.ts as a top-level property:

Comment on lines +79 to +85
Copilot AI Apr 9, 2026

The doc suggests adding `grepInvert` in the top-level `use` block, but `grepInvert` is a Playwright config option (like `grep`) and is not part of the `use` options object. Update the guidance to say to add it at the top level of `defineConfig({...})` (or per-project), matching how it’s configured in `playwright.config.ts`.

### Step 3: Investigate

The issue assignee must:

1. Reproduce locally (run the specific test with `--repeat-each=5`)
2. Check for timing issues (missing `waitFor`, race conditions)
3. Check for test isolation issues (shared state, database leaks)
4. Check for browser-specific behavior (compare across projects)

### Step 4: Fix and Un-quarantine

1. Fix the root cause
2. Verify stability: run `npx playwright test --project=<affected> --grep="test name" --repeat-each=10`
3. Remove the `@quarantine` tag
4. Close the issue with a link to the fix PR

## Remediation Timeline

| Severity | SLA | Escalation |
|----------|-----|------------|
| Blocks PR gate (chromium smoke) | Fix within 24 hours or quarantine | Immediate team notification |
| Nightly cross-browser failure | Fix within 1 week | Review in next standup |
| Nightly mobile-only failure | Fix within 2 weeks | Track in sprint backlog |
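
The SLA table can be turned into a due-date calculation. A minimal sketch, assuming the clock starts when the failure is first detected; the `Severity` labels and `slaDeadline` are hypothetical names introduced for illustration.

```typescript
// Hypothetical severity labels, one per row of the remediation table.
type Severity = 'pr-gate' | 'nightly-cross-browser' | 'nightly-mobile';

// SLA windows in hours: 24 hours, 1 week, 2 weeks.
const SLA_HOURS: Record<Severity, number> = {
  'pr-gate': 24,
  'nightly-cross-browser': 7 * 24,
  'nightly-mobile': 14 * 24,
};

// Returns the time by which the failure should be fixed (or quarantined).
function slaDeadline(severity: Severity, detectedAt: Date): Date {
  return new Date(detectedAt.getTime() + SLA_HOURS[severity] * 60 * 60 * 1000);
}
```

For example, `slaDeadline('pr-gate', detectedAt)` returns a timestamp exactly 24 hours after `detectedAt`.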

## Prevention Guidelines

1. **Use explicit waits**: Always `await expect(locator).toBeVisible()` before interacting
2. **Avoid fixed timeouts**: Use `waitForResponse` / `waitForURL` instead of `page.waitForTimeout`
3. **Isolate test state**: Each test gets a fresh user via `registerAndAttachSession`
4. **Use unique names**: Include `Date.now()` in board/card/column names to prevent collisions
5. **Test deterministically**: Avoid tests that depend on animation timing or CSS transitions
6. **Keep browser profiles independent**: Never share cookies, localStorage, or database state across browser projects
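
Guideline 4 can be sketched as a small helper. `uniqueName` is a hypothetical function, not an existing Taskdeck helper. It adds a per-process counter and a random suffix on top of `Date.now()`, on the assumption that two names may be generated within the same millisecond or by parallel workers.

```typescript
// Per-process counter, so two calls in the same millisecond still differ.
let nameCounter = 0;

// Hypothetical helper: builds a collision-resistant name for boards, cards, or columns.
function uniqueName(prefix: string): string {
  nameCounter += 1;
  // The random suffix reduces the chance of collisions across parallel worker processes.
  const randomSuffix = Math.random().toString(36).slice(2, 8);
  return `${prefix}-${Date.now()}-${nameCounter}-${randomSuffix}`;
}
```

A test might call `uniqueName('Board')` when creating a board, so that reruns against the same database do not collide with earlier data.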

## Monitoring

- Nightly CI results are reviewed daily for new failures
- Flaky test issues are prioritized alongside regular bugs
- A test that has been quarantined for more than 30 days without progress should be escalated or removed
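
The 30-day rule could be checked mechanically. This is a minimal sketch, assuming quarantine dates are tracked somewhere (for example, on the linked issue); `QuarantinedTest` and `overdueQuarantines` are hypothetical names, and the "without progress" condition is not modelled.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;
const MAX_QUARANTINE_DAYS = 30; // threshold taken from the monitoring rule

// Hypothetical record of a quarantined test.
interface QuarantinedTest {
  title: string;
  quarantinedAt: Date;
}

// Returns tests that have been quarantined for more than MAX_QUARANTINE_DAYS.
function overdueQuarantines(tests: QuarantinedTest[], now: Date): QuarantinedTest[] {
  return tests.filter(
    (t) => (now.getTime() - t.quarantinedAt.getTime()) / MS_PER_DAY > MAX_QUARANTINE_DAYS,
  );
}
```

The result is only a list of candidates; whether to escalate or remove each test remains a manual decision.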