This is the active testing guide for Taskdeck.
Last Updated: 2026-04-13
Companion Active Docs:
- docs/STATUS.md
- docs/IMPLEMENTATION_MASTERPLAN.md
- docs/TESTING_GUIDE.md
- docs/MANUAL_TEST_CHECKLIST.md
- docs/GOLDEN_PRINCIPLES.md
- Backend: ~4,479+ passing (estimated after the PRs #821–#826 supplementary wave)
- Domain: ~833+ (77 prior FsCheck + 93 new property tests for ChatSession/ChatMessage/Notification/KnowledgeDocument/WebhookSubscription + 11 ApiKey + 15 OAuthAuthCode + 8 MfaCredential + NoteImport domain)
- Application: ~1799+ (29 prior JSON fuzz + 19 new chat/notification DTO fuzz + 21 metrics export + 32 forecasting + 22 clarification detector + 7 ChatService clarification + 38 NoteImportService + 25 TelemetryEventService + 21 MfaService + 8 WorkspaceService calendar)
- API integration: ~1135+ (8 metrics export + 80 prior adversarial + 50 new adversarial input + 20 API key + 13 prior concurrency + 22 new concurrency stress + 3 queue resilience + 13 LLM provider resilience + 9 telemetry + 4 telemetry API + 13 OIDC/auth + 9 OAuth token lifecycle)
- CLI contract: 4
- Architecture boundaries: 8
- Frontend unit: ~2,454+ passing (estimated after PRs #821–#826; ~200+ test files)
- New store integration: 88 tests (chat, board, queue, session, notification, workspace)
- New view/component coverage: 107 tests (Archive, Metrics, Board, Review, Chat, CardItem, BoardCanvas, BoardActionRail)
- New resilience: 14 tests (slow API, corrupted storage, loading states)
- Frontend E2E (smoke + automation/ops + capture loop + starter-pack fixtures + concurrency harness + error recovery/multi-board/edge journeys + cross-browser matrix + onboarding/review/capture/keyboard/dark-mode): default required lane passing; +20 new scenarios in PRs #821–#826
- Combined automated total: ~6,950+ passing (backend ~4,479 + frontend unit ~2,454 + E2E)
Verification note:
- backend total of 4,279 recertified 2026-04-12 via `dotnet test backend/Taskdeck.sln -c Release --list-tests 2>&1 | grep -c "^ "` on `main` after merging PRs #800–#820
- frontend total of 2,245 recertified 2026-04-12 via `npx vitest --run --reporter=verbose 2>&1 | grep -c "✓"` on `main` after merging PRs #800–#820
- supplementary wave (PRs #821–#826) adds ~429 new tests; totals estimated pending merge and full-suite recertification
- significant test growth in 2026-04-04 wave 1: ChangePassword fix (5 tests), golden-path integration (7), cross-user isolation (38), worker integration (24), controller HTTP (67), proposal lifecycle (74), OAuth/auth edge cases (44), MCP full inventory (42)
- significant test growth in 2026-04-04 wave 2: domain state machines (174), SignalR integration (19), LLM tool-calling edge cases (101), export/import round-trip (64), API error contract (57), archive lifecycle (74), board metrics accuracy (61), notification delivery (36); all 8 PRs received two rounds of adversarial review with 47 review-fix commits addressing false-positive tests, weak assertions, and missing edge cases
- significant test growth in 2026-04-04 wave 3 (PRs #741–#756, 9 issues): webhook HMAC verification (11 backend tests, #726/#750), webhook SSRF/delivery reliability (78 total webhook tests across 9 files including pre-existing, #710/#756), frontend regression suite expansion (+96 tests: #744 +3, #754 +4, #745 +7, #742 +20, #748 +route/workspace tests, #743 +21)
- significant test growth in 2026-04-04 wave 4 (PRs #765–#770, #776, 7 issues): OAuth token lifecycle integration (19 backend tests, #723/#769), tool argument replay (6 backend tests, #673/#770), streaming chat token usage (4 backend tests, #763/#768), DataExport exception logging (3 backend tests, #759/#766), Agent API 500 fix (2 un-skipped tests, #758/#776), frontend HTTP interceptor + router auth guard tests (33 new tests, #725/#765); all 7 PRs received two rounds of adversarial review with review-fix commits addressing CI failures, performance bugs, resource leaks, misleading test names, and weak assertions
- significant test growth in 2026-04-04 wave 5 (PRs #771–#779, 8 issues, ~258 new tests): tool-calling Phase 3 refinements (17 backend tests, #651/#773), export streaming (15 backend tests, #670/#774), resilience/degraded-mode (34 tests: 18 backend + 16 frontend, #720/#778), frontend view vitest coverage (83 tests across 6 views, #716/#775), Pinia store integration (91 tests across 6 stores, #711/#777), E2E error state expansion (25 Playwright scenarios, #712/#772), accessibility lint (105 warnings → 0, #762/#779), vendored dependency cleanup (#761/#771); all 8 PRs received two rounds of adversarial review
The feature and security expansion wave (PRs #806–#813) added ~231+ new tests across 8 PRs. Each PR received two rounds of adversarial review (self + independent cold review); the independent round caught 9 CRITICAL and 11 HIGH issues — all fixed.
New test categories:
- Calendar endpoint: 8 backend tests covering date range validation, board-access scoping, overdue/blocked status, empty results
- Note import: 38 backend unit tests for markdown section splitting, web clip intake, validation, provenance; 6 frontend API client tests
- Agent surfaces: 42 frontend tests across agentStore (15), AgentsView (8), AgentRunsView (8), AgentRunDetailView (11)
- Telemetry/observability: 25 backend unit tests (opt-in enforcement, event validation, property allowlist) + 13 backend integration tests (DI, endpoints) + 25 frontend tests (consent, store buffering, API)
- OAuth PKCE/account linking: 24+ backend tests covering DB-backed auth codes, atomic consumption, PKCE, account linking conflicts
- SSO/OIDC/MFA: 30+ backend tests covering TOTP validation, email collision, cross-provider isolation, username deduplication, MFA policy, recovery codes
- Staged deployment: smoke test script with 9 automated checks (health, API, auth, frontend, SignalR, static assets, security headers, container restart)
Storybook (non-test tooling): npm run storybook runs 17 Td* primitive stories; npm run storybook:build produces static output.
~429 new tests across 6 PRs. Each PR received two rounds of adversarial review (self-review + independent cold review). Key review findings and fixes:
22 backend tests across 7 files in backend/tests/Taskdeck.Api.Tests/Concurrency/:
- Queue claim races (4): double-claim prevention, stale timestamp, batch processing, two-worker different items
- Card update conflicts (5): concurrent moves, stale-write 409, last-writer-wins, column reorder, concurrent creation
- Proposal approval races (4): double-approve, approve+expire, approve+reject, double-execute
- Webhook delivery concurrency (2), board presence (2), rate limiting (3), cross-user isolation (2)
- Uses `SemaphoreSlim` barriers for true simultaneous execution; SQLite serialization limitations documented
Running:
dotnet test backend/Taskdeck.sln -c Release --filter "FullyQualifiedName~Concurrency"
88 frontend tests across 6 files in frontend/taskdeck-web/src/tests/store/:
- chatApi integration (22), boardStore column reorder/conflict (11), queueStore polling (12)
- sessionStore OIDC/SSO (14), notificationStore realtime (15), workspaceStore mode persistence (14)
- Mocks HTTP layer (not API modules) to test full store → API → HTTP chain
20 Playwright scenarios across 5 spec files:
- `onboarding.spec.ts` (5): fresh user empty states, setup dialog, starter pack structure
- `review-proposals.spec.ts` (3): board-scoped filtering, multiple proposals, show completed toggle
- `capture-edge-cases.spec.ts` (4): empty/whitespace rejection, Escape dismiss, board-linked capture
- `keyboard-navigation.spec.ts` (4): keyboard board creation, command palette arrows, `?` help toggle
- `dark-mode.spec.ts` (4): persistence across views, toggle-off restore, system `prefers-color-scheme`
107 tests across 8 files covering previously untested views and components:
- ArchiveView (11), MetricsView (16), BoardView (12), ReviewView (10)
- AutomationChatView (16), CardItem (21), BoardCanvas (12), BoardActionRail (9)
162 tests across 8 files:
- Domain property tests (93): ChatSession, ChatMessage, Notification, KnowledgeDocument, WebhookSubscription
- Application fuzz tests (19): JSON round-trip for chat/notification DTOs with adversarial content
- API adversarial tests (50): raw JSON with float/overflow positions, XSS/injection payloads, unicode blocks, extra unknown fields
30 tests across 3 files:
- LLM provider resilience (13): garbage/empty/429/timeout for OpenAI/Gemini, probe unhealthy
- Queue accumulation resilience (3): accumulation without corruption, rapid concurrent captures
- Frontend slow-API/storage resilience (14): loading states, throttle dedup, corrupted localStorage/token
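As a quick cross-check, the per-category counts listed above for the supplementary wave can be summed and compared with the ~429 figure and with the gap between the recertified and estimated totals. This is only a sketch: the split into backend, frontend, and E2E buckets and every identifier name are assumptions made for illustration; only the numbers come from this guide.

```typescript
// Counts copied from the supplementary-wave sections of this guide.
// Bucket and variable names are illustrative, not repository identifiers.
interface Category {
  name: string;
  tests: number;
}

const backendCategories: Category[] = [
  { name: "concurrency stress", tests: 22 },
  { name: "property + adversarial input", tests: 162 }, // 93 domain + 19 application + 50 API
  { name: "LLM provider resilience", tests: 13 },
  { name: "queue accumulation resilience", tests: 3 },
];

const frontendUnitCategories: Category[] = [
  { name: "store integration", tests: 88 },
  { name: "view/component coverage", tests: 107 },
  { name: "slow-API/storage resilience", tests: 14 },
];

const e2eScenarios = 20;

function sumTests(categories: Category[]): number {
  return categories.reduce((total, category) => total + category.tests, 0);
}

const backendDelta = sumTests(backendCategories);
const frontendDelta = sumTests(frontendUnitCategories);
const waveTotal = backendDelta + frontendDelta + e2eScenarios;

// Recertified totals (2026-04-12) plus the wave deltas give the estimated totals.
const estimatedBackend = 4279 + backendDelta;
const estimatedFrontendUnit = 2245 + frontendDelta;

console.log({ backendDelta, frontendDelta, waveTotal, estimatedBackend, estimatedFrontendUnit });
```

Under these assumptions the sums line up with the headline figures (200 backend, 209 frontend unit, 20 E2E), which is a useful sanity check to repeat after full-suite recertification.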
After batch-merging PRs #800, #805, #811, #813, #815, #819, #820, the following additional test categories are now on main:
34 tests (18 backend + 16 frontend) covering:
- Backend: ChatService LLM provider failure/fallback, worker crash/retry/cancellation/max-retries
- Frontend: store error states, SignalR reconnect polling fallback
30+ backend tests covering TOTP validation, OIDC provider isolation, email collision prevention, username deduplication, MFA policy enforcement, and recovery code lifecycle.
Running MFA/OIDC tests:
dotnet test backend/Taskdeck.sln -c Release --filter "FullyQualifiedName~Mfa"
dotnet test backend/Taskdeck.sln -c Release --filter "FullyQualifiedName~Oidc"
63 tests (38 backend + 25 frontend):
- Backend: opt-in enforcement, event property validation against allowlist, value truncation, TelemetryController endpoints
- Frontend: consent management, DNT/GPC detection, store event buffering/flush, analytics script injection
32 backend tests covering ICacheService implementations (InMemory sweep/cap, Redis reconnect/degradation, NoOp pass-through), board list cache-aside with TTL and write-through invalidation.
19+ integration tests covering DB-backed auth code store (valid exchange, expiry, replay prevention, concurrent atomicity, cleanup), JWT lifecycle (expiry, wrong key, garbage token, deactivated user), and SignalR query-string auth.
31 tests (11 domain + 20 integration) covering API key entity (tdsk_ prefix, SHA-256 hashing), ApiKeyMiddleware Bearer validation, HTTP user context mapping, REST key management, and rate limiting per API key.
The platform expansion wave (PRs #796–#805) delivered four new testing capabilities:
Playwright config expanded with 5 projects: chromium (all tests), firefox and webkit (@cross-browser only), and mobile-chrome (Pixel 7) and mobile-safari (iPhone 14) (@mobile only). Tests tagged @quarantine are excluded globally.
Run commands:
cd frontend/taskdeck-web
npx playwright test --project=chromium # PR gate (default)
npx playwright test --project=firefox # Firefox cross-browser
npx playwright test --grep @mobile # All mobile tests
npx playwright test # Full matrix (nightly)
Tagging convention: @smoke (quick CI), @cross-browser (multi-browser), @mobile (viewport), @quarantine (flaky, excluded). See docs/testing/FLAKY_TEST_POLICY.md.
CI: reusable-e2e-cross-browser.yml in nightly + extended (testing label/manual). PR gate stays Chromium-only.
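The project matrix and tagging convention described in this section can be read as a simple routing rule from a test's tags to the projects that run it. The sketch below models that rule as a pure function; `projectsFor` is a hypothetical helper written for illustration and is not part of the Taskdeck Playwright config, which may express the same rule through per-project filters instead.

```typescript
// Project names and tag rules mirror the matrix described in this section.
// projectsFor is a hypothetical helper, not code from the repository.
type Project = "chromium" | "firefox" | "webkit" | "mobile-chrome" | "mobile-safari";

function projectsFor(tags: string[]): Project[] {
  // @quarantine is excluded globally, so a quarantined test runs nowhere.
  if (tags.indexOf("@quarantine") !== -1) {
    return [];
  }

  // chromium runs all tests.
  const projects: Project[] = ["chromium"];

  // firefox and webkit only pick up tests tagged @cross-browser.
  if (tags.indexOf("@cross-browser") !== -1) {
    projects.push("firefox", "webkit");
  }

  // The mobile projects only pick up tests tagged @mobile.
  if (tags.indexOf("@mobile") !== -1) {
    projects.push("mobile-chrome", "mobile-safari");
  }

  return projects;
}

console.log(projectsFor(["@smoke"]).join(", "));
console.log(projectsFor(["@smoke", "@cross-browser"]).join(", "));
console.log(projectsFor(["@mobile", "@quarantine"]).join(", "));
```

Reading the matrix this way makes it easier to predict the cost of adding a tag: an untagged test only ever runs in the Chromium PR gate, while each extra tag widens the nightly matrix.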
Playwright toHaveScreenshot() with dedicated config: 1280x720 viewport, animations disabled, 0.5% pixel tolerance, light color scheme.
Run commands:
cd frontend/taskdeck-web
npx playwright test --config playwright.visual.config.ts # Run visual tests
npx playwright test --config playwright.visual.config.ts --update-snapshots # Update baselines
7 visual tests: board (empty + populated), command palette (open + search), archive, inbox, home. Policy at docs/testing/VISUAL_REGRESSION_POLICY.md.
CI: reusable-visual-regression.yml in extended CI (testing/visual label). Uploads diff artifacts on failure.
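To get a feel for what a 0.5% tolerance means at the 1280x720 viewport, the arithmetic can be worked through directly. This sketch assumes the tolerance is a ratio of differing pixels to total pixels (in the spirit of Playwright's `maxDiffPixelRatio` option) and that screenshots are viewport-sized; neither detail is confirmed by this guide, and the function names are illustrative.

```typescript
// Assumption: tolerance is the share of pixels allowed to differ,
// and the screenshot covers exactly the configured viewport.
const viewportWidth = 1280;
const viewportHeight = 720;
const tolerancePercent = 0.5;

function maxDifferingPixels(width: number, height: number, percent: number): number {
  // Multiply before dividing so the intermediate values stay exact here.
  return Math.floor((width * height * percent) / 100);
}

function withinTolerance(differingPixels: number): boolean {
  return differingPixels <= maxDifferingPixels(viewportWidth, viewportHeight, tolerancePercent);
}

const totalPixels = viewportWidth * viewportHeight;
const allowed = maxDifferingPixels(viewportWidth, viewportHeight, tolerancePercent);

console.log(`total pixels: ${totalPixels}, allowed to differ: ${allowed}`);
console.log(`4000 differing pixels passes: ${withinTolerance(4000)}`);
console.log(`5000 differing pixels passes: ${withinTolerance(5000)}`);
```

Under these assumptions roughly 4,600 pixels may differ, about the area of a 68x68 square, so a small icon change could pass unnoticed while a moved panel would not.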
Backend (Stryker.NET): targets Taskdeck.Domain with Taskdeck.Domain.Tests. Thresholds: break=60, high=80.
Frontend (Stryker JS): targets captureStore, boardStore, and board/*.ts submodules with vitest runner.
Run commands:
# Backend
cd backend && dotnet tool install dotnet-stryker && dotnet stryker
# Frontend
cd frontend/taskdeck-web && npm run mutation:test
CI: mutation-testing.yml runs weekly (Sunday 04:00 UTC) + manual dispatch. Non-blocking, reports uploaded as artifacts. Policy at docs/testing/MUTATION_TESTING_POLICY.md.
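The backend thresholds quoted above (break=60, high=80) can be illustrated with a small classifier. This is a simplified sketch: it treats the mutation score as killed mutants over total mutants, whereas Stryker's real score also accounts for timeouts and ignored mutants, and the verdict labels are invented for this example.

```typescript
// Simplified model: score = killed / total, as a percentage.
// Verdict labels are illustrative and are not Stryker output.
type Verdict = "below-break" | "between-break-and-high" | "at-or-above-high";

function mutationScore(killed: number, total: number): number {
  if (total <= 0) {
    throw new Error("total mutants must be positive");
  }
  return (killed * 100) / total;
}

function classify(score: number, breakAt: number = 60, high: number = 80): Verdict {
  if (score < breakAt) {
    return "below-break"; // a score under the break threshold fails the Stryker run
  }
  return score >= high ? "at-or-above-high" : "between-break-and-high";
}

const score = mutationScore(45, 75);
console.log(`score ${score} -> ${classify(score)}`);
console.log(`score 59.9 -> ${classify(59.9)}`);
console.log(`score 80 -> ${classify(80)}`);
```

Because the workflow is described as non-blocking, a below-break result is best read as a signal in the uploaded report rather than as a merge gate.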
New Taskdeck.Integration.Tests project using Testcontainers.PostgreSql for ephemeral database isolation. Each test method gets a fresh PostgreSQL database. Requires Docker.
Run commands:
# Run all (skips gracefully without Docker)
dotnet test backend/tests/Taskdeck.Integration.Tests -c Release
# Run alongside main suite (integration tests auto-skip without Docker)
dotnet test backend/Taskdeck.sln -c Release -m:1
20 integration tests: Board CRUD, Card operations, Proposal lifecycle, cross-class isolation, parallel execution. Guide at docs/testing/TESTCONTAINERS_GUIDE.md.
CI: reusable-container-integration.yml in extended CI (testing label).
Testing priorities have shifted from "does the harness exist?" toward "does the product remain understandable under change?"
Near-horizon priorities:
- protect the current golden path: capture -> triage -> review -> execute -> board
- keep the deterministic first-run Playwright guardrail aligned to the shipped `Home -> capture -> review -> execute -> board` loop (#328, delivered)
- add explicit coverage for action-oriented empty states and board-centered context travel as those surfaces land
- keep stakeholder/demo recording opt-in; it supports product evidence, but it is not the primary product smoke
High-signal additions and delivered guardrails:
- `Home` view state coverage
- `Today` view state coverage
- workspace mode navigation rendering
- proposal summary card coverage
- board action rail coverage
- first-run golden-path Playwright smoke coverage, now delivered as the required regression guardrail in #328
Telemetry and release-gate follow-through from the expanded blueprint:
- product telemetry/event taxonomy documented in #341/#741; see `docs/product/TELEMETRY_TAXONOMY.md` (taxonomy spec, not shipped instrumentation); reuses #77 as baseline; #328 provides the delivered first-run guardrail
- keep event names privacy-safe and product-shaped using the canonical `noun.verb` format from `docs/product/TELEMETRY_TAXONOMY.md` (for example `capture.modal_opened`, `capture.submitted`, `proposal.approved`, `proposal.rejected`, `card.created`, `board.loaded`, `auth_session.started`, `agent_run.completed`, `agent_run.failed`)
- treat launch framing as evidence gates, not marketing labels:
  - `R1` novice-first beta -> coherent `Home -> capture -> review -> execute -> board` path
  - `R2` agent foundation alpha -> inspectable runs, policies, and bounded templates
  - `R3` knowledge/integrations alpha -> durable searchable context plus supervised connector flows
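A naming rule like `noun.verb` is easy to guard with a small validator in a unit test. The sketch below infers a grammar from the example names in this section (lowercase snake_case noun, one dot, lowercase snake_case verb); the authoritative rules live in `docs/product/TELEMETRY_TAXONOMY.md`, so the regular expression and the function name are assumptions.

```typescript
// Assumed grammar inferred from the documented examples:
// lowercase snake_case noun, a single dot, lowercase snake_case verb.
const EVENT_NAME_PATTERN = /^[a-z]+(?:_[a-z]+)*\.[a-z]+(?:_[a-z]+)*$/;

function isCanonicalEventName(name: string): boolean {
  return EVENT_NAME_PATTERN.test(name);
}

// Example names quoted in this guide.
const documentedNames: string[] = [
  "capture.modal_opened",
  "capture.submitted",
  "proposal.approved",
  "proposal.rejected",
  "card.created",
  "board.loaded",
  "auth_session.started",
  "agent_run.completed",
  "agent_run.failed",
];

// Hypothetical counter-examples: wrong casing, missing verb, extra segment,
// and a name that would leak an email address.
const rejectedNames: string[] = [
  "CaptureSubmitted",
  "capture",
  "capture.submitted.extra",
  "user@example.com.logged_in",
];

documentedNames.forEach((name) => console.log(`${name}: ${isCanonicalEventName(name)}`));
rejectedNames.forEach((name) => console.log(`${name}: ${isCanonicalEventName(name)}`));
```

A shape check like this only covers the format; keeping names privacy-safe still depends on reviewing what each event and its properties actually carry.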
A dedicated test-coverage wave designed for token-efficient agents (Codex, lightweight LLM runners). Each task is self-contained with pattern files, source paths, and verify commands in docs/codex-tasks/.
Tracked issues: #415 to #429. PRs: #436 to #448. All delivered and merged 2026-03-28 after adversarial review pass with fixes for tautological assertions, missing guard branches, and edge-case gaps.
| Tier | Tasks | Scope | Issues |
|---|---|---|---|
| 1 — Frontend API | labelsApi, columnsApi, usersApi | Mock HTTP, verify URL/payload | #415-#417 |
| 2 — Frontend Composables | useErrorMapper, useEscapeToClose, useShortcutContext | Pure function + lifecycle tests | #418-#420 |
| 3 — Frontend Stores | auditStore, queueStore (real coverage, not demo) | Pinia store with mocked API | #421-#422 |
| 4 — Backend Domain | CardComment, Notification, AutomationProposal, LlmUsageRecord | Entity construction + invariants | #423-#426 |
| 5 — Backend Services | OutboundWebhookSignature (expand), WorkerHeartbeatRegistry, CompositeBoardRealtimeNotifier | Service tests with mocking | #427-#429 |
Remaining coverage gaps (post-wave, now tracked in TST-32 to TST-57 wave #721):
- Frontend: 1 API module untested (captureApi), remaining composables/stores have baseline coverage → tracked in #711, #716
- Backend: Infrastructure repositories partially covered (7 classes, 77 tests in #699/#730; remaining repos untested); remaining domain entities untested → tracked in #701; 1 of 5 workers untested → tracked in #700
Tracking issue: #649 (Phase 1 of #647)
New test coverage:
- `ToolCallingChatOrchestratorTests`: multi-turn loop, timeout, max-round enforcement
- `ReadToolSchemasTests`: schema generation for all 5 read tools
- `MockLlmProviderToolCallingTests` / `MockToolCallDispatcherTests` / `MockToolResultsTests`: mock provider tool-calling dispatch and result formatting
- `OpenAiToolCallingParseTests` / `GeminiToolCallingParseTests`: provider-specific tool-call response parsing
Manual validation recommended: send "What cards are in my Backlog?" via chat with Mock provider and verify dynamic tool-calling response.
Tracking issue: #652 (Phase 1 of #648)
New test coverage:
- `McpBoardResourcesTests`: `taskdeck://boards` resource listing, phantom-user fallback, multi-user board scoping
Manual validation recommended: configure mcp.example.json in Claude Code / Cursor and ask "What boards do I have?" to verify resource delivery.
Tracking issue: #83
New test coverage:
- `DataExportServiceTests` (10 tests): user-scoped data export completeness, versioned payload shape, cross-user isolation
- `AccountDeletionServiceTests` (15 tests): password re-auth, confirmation phrase enforcement, PII anonymization, audit ref cleanup, deactivated-user login rejection
Tracking issue: #77
New test coverage:
- `BoardMetricsServiceTests` (12 backend tests): board-scoped metric aggregation, date range filtering, label grouping
- `metricsApi.spec.ts` (4 frontend tests): API client mock verification
Tracking issue: #539
New test coverage:
- `authApi.spec.ts` (3 tests): `getProviders` and `exchangeOAuthCode` API calls
- `sessionStore.spec.ts` (2 tests): OAuth code exchange store action
Tracker issue: #721. Seeded from a systematic codebase audit across backend, frontend, and cross-cutting integration boundaries.
Security finding during audit: #722 (SEC-20) — ChangePassword endpoint does not verify caller identity. RESOLVED in #732 (2026-04-04).
22 issues spanning integration tests, edge cases, adversarial inputs, failure modes, and cross-user data isolation. Focus is on integration seams (where services interact) rather than adding more isolated unit tests.
| Priority | Issues | Theme | Status |
|---|---|---|---|
| I | #703 | Capture → triage → proposal → review → board end-to-end golden path | Delivered (#735) |
| II | #699, #700, #702, #704, #705, #707, #723, #725 | Infrastructure repos, worker, controller gaps, data isolation, concurrency, auth, OAuth, frontend HTTP interceptor | 8 of 8 delivered |
| III | #701, #706, #708, #709, #710, #711, #712, #713, #714, #715, #716, #718, #719, #720, #726 | Domain state machines, SignalR, proposal lifecycle, LLM tool-calling, webhooks, frontend stores/views, export/import, error contracts, archive, metrics, notifications, resilience | 15 of 15 delivered |
| IV | #717 | Property-based and adversarial input tests (extends #89) | Delivered (#789) |
Wave progress: 25 of 25 issues delivered (plus SEC-20 fix). ~1350+ new tests across six delivery waves. Wave complete. Final deliveries: concurrency stress tests (#705/#793 — 13 tests), property-based adversarial tests (#717/#789 — 211 tests).
- Infrastructure repositories: 7 classes now have 77 integration tests (#699/#730); remaining repositories still untested
- Background worker: RESOLVED. 24 integration tests delivered (`LlmQueueToProposalWorker`, #700/#734) covering happy path, error/retry, cancellation, fair-batch, and capture triage paths
- Cross-user data isolation: RESOLVED. 38 integration tests delivered (#704/#733) covering all major API boundaries; 3 false-positive tests caught and fixed in adversarial review
- Frontend HTTP interceptor and router auth guard: RESOLVED. 33 tests delivered (#725/#765): 19 HTTP interceptor tests + 14 router integration tests
- Golden path: RESOLVED. 7 integration tests delivered (#703/#735) proving full capture → triage → proposal → review → board pipeline
- Domain entity state machines: RESOLVED. 174 exhaustive tests delivered (#701/#740) covering CommandRun, ArchiveItem, ChatSession, UserPreference, NotificationPreference, CardLabel, CardCommentMention
- SignalR hub integration: RESOLVED. 19 integration tests delivered (#706/#751) covering auth, presence, multi-user, authorization, and edge cases
- LLM tool-calling edge cases: RESOLVED. 101 tests delivered (#709/#747) for orchestrator, provider abstraction, intent classifier, and tool executor registry
- Export/import integrity: RESOLVED. 64 round-trip tests delivered (#713/#752) covering JSON, CSV, GDPR, database, and cross-format validation
- API error contract regression: RESOLVED. 57 tests delivered (#714/#753) verifying GP-03 error contract across 7 endpoint families
- Archive lifecycle: RESOLVED. 74 tests delivered (#715/#755): 45 domain state machine + 29 API integration covering cross-user isolation, conflict detection, audit trail
- Board metrics accuracy: RESOLVED. 61 tests delivered (#718/#749): 51 service + 10 controller covering throughput, cycle time, WIP, blocked cards, done-column heuristic
- Notification delivery: RESOLVED. 36 tests delivered (#719/#746) covering all 5 types, deduplication, preference filtering, cross-user isolation, batch operations
- Webhook HMAC signature verification: RESOLVED. 11 tests delivered (#726/#750) covering header format, HMAC round-trip, wrong-key rejection, secret rotation, timing-safe comparison
- Webhook delivery reliability and SSRF: RESOLVED. 78 webhook tests across 9 files delivered (#710/#756) covering retry/backoff, dead-letter, SSRF boundary conditions (private IPv4/IPv6 ranges via `OutboundWebhookEndpointGuardTests`)
Mutation testing is available as a non-blocking quality signal for detecting weak assertions and test gaps.
- Backend: Stryker.NET targeting `Taskdeck.Domain` (entity state machines, validation, business rules)
- Frontend: Stryker JS targeting `captureStore.ts`, `boardStore.ts`, and `board/*.ts` submodules (core data flow stores)
# Backend (requires dotnet-stryker global tool)
cd backend
dotnet stryker --config-file stryker-config.json
# Frontend
cd frontend/taskdeck-web
npm run mutation:test
Weekly workflow (Sunday 04:00 UTC) + manual dispatch via .github/workflows/mutation-testing.yml. Reports uploaded as artifacts.
See docs/testing/MUTATION_TESTING_POLICY.md for threshold strategy, report interpretation, and follow-up process.
- Extends #254 (testing harness improvement wave, delivered)
- Extends #89 (property/fuzz pilot, delivered)
- Complements #90 (mutation testing pilot)
- Complements #91 (Testcontainers for isolation)
- Feeds into #135 (integrated multi-component verification program)
First delivery from the rigorous test expansion wave. 77 integration tests across 7 repository classes running against real SQLite (not mocks or in-memory substitutes).
Pattern:
- Each test class creates a fresh SQLite database via `DbContextOptionsBuilder<TaskdeckDbContext>` with a unique filename
- Tests exercise actual EF Core queries, GUID formatting, ordering, pagination, and filtering against real SQLite behavior
- Database is cleaned up after each test run
Key findings:
- Found and fixed a real `LlmQueueRepository` ordering bug where queue items were not returned in the expected FIFO order
- Confirmed correct behavior for raw SQL queries, in-memory pagination edge cases, and GUID string formatting across repositories
Coverage:
- 7 repository classes tested (including `LlmQueueRepository`, `BoardRepository`, `CardRepository`, and others)
- Tests validate query correctness, cross-user isolation, empty-result handling, and ordering guarantees
This establishes the pattern for testing remaining infrastructure repositories tracked in the wave (#721).
Security fix: ChangePassword endpoint now derives userId exclusively from JWT claims instead of accepting client-supplied UserId. 5 new integration tests (unauthenticated 401, own-account success, wrong password, cross-user body-UserId ignored, invalid token).
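The rule behind this fix is that the target account comes only from the authenticated token, never from the request body. The backend is .NET, so the TypeScript below is just a language-neutral sketch of that rule; the claim name `sub`, the body shape, and `resolveTargetUserId` are all hypothetical.

```typescript
// Hypothetical shapes for illustration; the real endpoint is a .NET controller.
interface JwtClaims {
  sub: string; // assumed claim carrying the authenticated user's id
}

interface ChangePasswordBody {
  userId?: string; // client-supplied and therefore untrusted
  currentPassword: string;
  newPassword: string;
}

function resolveTargetUserId(claims: JwtClaims | null, body: ChangePasswordBody): string {
  if (claims === null || claims.sub.length === 0) {
    throw new Error("401: caller is not authenticated");
  }
  // body.userId is deliberately ignored: the caller can only change
  // the password of the account named in their own token.
  void body;
  return claims.sub;
}

const attackerClaims: JwtClaims = { sub: "user-a" };
const forgedBody: ChangePasswordBody = {
  userId: "user-b",
  currentPassword: "old-password",
  newPassword: "new-password",
};

console.log(resolveTargetUserId(attackerClaims, forgedBody));
```

The five integration tests listed above map naturally onto this shape: the unauthenticated and invalid-token cases hit the rejection branch, and the cross-user case checks that a forged body `userId` has no effect.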
7 integration tests exercising the full capture → triage → proposal → review → board pipeline against real SQLite with Mock LLM provider:
- Happy path: single capture → proposal → approve → card on board with correct title and column placement
- Multi-operation: 3 checklist items → proposal with 3 operations → 3 cards created atomically
- Rejection: proposal rejected → board remains empty
- Cross-user isolation: User B cannot read/approve/execute User A's proposal
- Audit trail: card creation via proposal recorded in board audit log
- Provenance integrity: full backward-traceable chain (capture → proposal → card) at DB level
- Triage failure: capture without board fails deterministically
38 integration tests proving cross-user isolation across all major API boundaries:
- Boards, columns, cards, captures, proposals, notifications, audit trails, chat sessions, knowledge docs, webhooks, board exports, labels, board access controls
- 3 shared-board tests (grant, scope limitation, revocation)
- Adversarial review caught 3 false-positive tests (LlmQueue never seeded, notifications never created, mark-notification used fabricated GUID) and missing precondition assertions
24 tests for the central background worker (previously zero coverage):
- Happy path, empty queue, transient error retry, max-retry boundary, permanent failure
- Unhandled exceptions, already-claimed items, capture triage paths, disabled processing
- Graceful cancellation, `BuildFairBatchItems` logic, retry backoff, multi-item batch
- Adversarial review fixed: fake repository ignoring status transitions, misleading race-condition test, weak interleaving assertions, premature ServiceProvider disposal
67 tests covering 6 previously-untested controllers + 17 new authz regression matrix entries:
- DataPortabilityApiTests (8), AbuseContainmentApiTests (12), MetricsApiTests (7), SearchApiTests (6), AgentProfilesApiTests (10), AgentRunsApiTests (7)
- Discovered 2 pre-existing bugs: `GET /api/agents` and `GET /api/agents/{id}/runs` return 500
- Adversarial review fixed: weak `NotBe(OK)` assertions, resource leak, leaked file from another branch
74 tests across domain (42), application (25), and api (7) layers:
- Expiry timing boundaries, double-apply/fail prevention, comprehensive state machine violations
- Batch expiry, worker-vs-manual-approval race, dismissal edge cases, operation mutation guards
- Adversarial review fixed: clock-resolution flakiness (`AddMilliseconds` → `AddSeconds`), string-based Theory refactoring risk, aggressive cancellation timeout; added 5 new edge case tests
44 tests across service (31) and controller (13) layers:
- Login edge cases (blank creds, inactive user, wrong password, concurrent JWT uniqueness)
- Registration edge cases (duplicate email, invalid lengths)
- Token validation (malformed, wrong key, expired, future nbf, wrong issuer/audience, missing sub, deleted/inactive user)
- OAuth code exchange (empty, invalid, replay, expired), open redirect prevention
- Production bug found and fixed: `ExternalLoginAsync` `Substring(0, 50)` overflow for short usernames
42 MCP-specific tests for the full inventory:
- 9 resources under `taskdeck://` URI scheme
- 11 tools (2 read + 6 write + 3 proposal management)
- GP-06 compliance verified: all write tools produce proposals, `approve_proposal` excluded
- User-scoping gap found and fixed in adversarial review: proposal resources/tools were not checking `RequestedByUserId`
174 tests across 7 entity test classes:
- CommandRun (68 tests): all 6 states × 5 transitions (valid + invalid), constructor validation, `SetOutputPreview` boundary (1000 chars), `SetTruncated` idempotency, `AddLog`, Touch verification
- ArchiveItem (41 tests): all 4 states × 4 transitions, constructor validation (entityType, name length 200, Guid.Empty, empty snapshot), round-trip flows
- ChatSession (22 tests): Active/Archived lifecycle, AddMessage blocked on archived, UpdateTitle validation
- UserPreference (18 tests): DismissOnboarding/ReplayOnboarding, RecordOnboardingCompletion once-only guard, UpdateWorkspaceMode
- NotificationPreference (7 tests): constructor validation, Update permutations
- CardLabel (4 tests): join entity construction
- CardCommentMention (6 tests): constructor validation, username length boundary (50 chars)
- Two rounds of adversarial review fixed misleading test name and leftover unused variable
19 integration tests using WebApplicationFactory with SignalR test client:
- Authentication (3): unauthenticated rejection, valid/invalid token
- Presence lifecycle (5): join broadcast, set/clear editing, leave cleanup, abrupt disconnect
- Multi-user (2): multiple users see all members, same-user two-connection aggregation
- Authorization (3): join/leave/editing without board access rejected
- Edge cases (6): board switching, two-tab disconnect, non-existent board, Guid.Empty, timestamps, cross-board isolation
- Adversarial review fixed false-positive auth tests (bare Exception → HttpRequestException+401), silent timeout, resource leak, missing status assertions
101 tests across 4 test classes:
- ToolCallingChatOrchestratorEdgeCaseTests (18): per-round timeout, empty tool calls, concurrent calls, cancellation, metadata, token accumulation, loop detection (added in review)
- LlmProviderAbstractionEdgeCaseTests (24): default CompleteWithToolsAsync throws, MockLlmProvider edge cases, provider selection, kill switch
- LlmIntentClassifierEdgeCaseTests (49): negation filtering, other-tool questions, positive intent, non-actionable, prompt injection, disambiguation, plurals, alternate verbs
- ToolExecutorRegistryEdgeCaseTests (10): empty registry, case-insensitive lookup, duplicate/null registration (added in review)
- Adversarial review fixed false-positive prompt injection test, replaced 30-second slow test, added loop detection and registry edge cases
64 tests across 5 test files:
- BoardJsonExportImport (23): full round-trip, special characters, 100-card scale, empty boards, WIP limits, cross-user isolation, corrupt JSON, duplicate labels
- CsvImport (23): RFC 4180 edge cases, BOM, CRLF, deduplication, 1000-row scale, missing fields, invalid dates
- GdprDataExport (9): valid parseable JSON, empty user, field preservation, cross-user isolation, version/timestamp
- DatabaseExportImport (21): byte-level round-trip, corrupted/truncated rejection, SQLite signature validation, oversized payload
- CrossFormatImport (11): format mismatch detection, binary garbage, wrong JSON shapes
- Adversarial review fixed weak DueDate assertion, brittle JSON substring checks, non-deterministic test branching
57 tests across 7 test files in ErrorContract/ namespace:
- Board (9), Card (10), Column (11), Capture (8), Proposal (7), Label (4), ContentType/Format (7)
- All error assertions through `ApiTestHarness.AssertErrorContractAsync` validating GP-03 `{errorCode, message}` shape
- Adversarial review fixed 12 weak 404 assertions missing errorCode, 2 false-positive GP-03 tests, non-deterministic unauthenticated test, misleading test name
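On the frontend side, the same GP-03 `{errorCode, message}` shape can be checked with a small type guard. This is a sketch rather than the project's helper: `isErrorContract` is a hypothetical name, the sample `errorCode` value is invented, and requiring a non-empty `errorCode` is an assumption motivated by the review note about 404 assertions that were missing it.

```typescript
// Shape taken from the GP-03 description: { errorCode, message }.
interface ErrorContract {
  errorCode: string;
  message: string;
}

// Hypothetical guard; the backend equivalent named in this guide is
// ApiTestHarness.AssertErrorContractAsync, whose internals are not shown here.
function isErrorContract(body: unknown): body is ErrorContract {
  if (typeof body !== "object" || body === null) {
    return false;
  }
  const candidate = body as { [key: string]: unknown };
  const errorCode = candidate["errorCode"];
  const message = candidate["message"];
  // Assumption: an empty errorCode is as useless as a missing one.
  return typeof errorCode === "string" && errorCode.length > 0 && typeof message === "string";
}

const wellFormed: unknown = JSON.parse('{"errorCode":"NotFound","message":"Board not found"}');
const missingCode: unknown = JSON.parse('{"message":"Board not found"}');
const plainText: unknown = "Not Found";

console.log(isErrorContract(wellFormed));
console.log(isErrorContract(missingCode));
console.log(isErrorContract(plainText));
```

Asserting on the shape as well as the status code is what separates a contract test from the weak 404 assertions the adversarial review replaced.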
74 tests across domain (45) and API integration (29):
- Domain (45): all valid/invalid ArchiveItem transitions, full lifecycle sequences, Touch timestamp updates, constructor validation boundaries
- API (29): board/card/column archive-restore cycles, cross-user isolation (3 tests), double-archive/restore handling (409), conflict detection (Rename/Fail strategies), snapshot integrity, audit trail, restore to non-existent/archived boards, filter by type/status/board, auth enforcement
- Adversarial review fixed 2 false-positive tests missing key assertions, 1 missing position check, 2 weak assertions pinned to specific status codes
61 tests across service (51) and controller (10):
- Done column detection (14): named patterns, case-insensitivity, positional fallback, multiple done-like columns
- Throughput (6): card counting, bounce, same-day grouping, non-done exclusion
- Cycle time (8): exact calculation, multi-column paths, averages, in-progress exclusion, zero cycle time
- WIP (4): per-column counts, position ordering, WIP limits
- Blocked cards (5): sort by duration, reasons, unblocked exclusion
- Controller (10): from-after-to validation, label filter, response structure, date range handling
- Adversarial review fixed misleading test name, vacuous sort assertion, silent reflection failure, naming convention
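The done-column detection cases listed above (named patterns, case-insensitivity, positional fallback, multiple done-like columns) can be sketched as follows. This is an illustration under stated assumptions, not the service's actual logic: the name list in `DONE_NAMES`, the choice of the right-most match when several columns look done-like, and the `Column` type are all hypothetical.

```typescript
// Hypothetical column shape for this sketch.
interface Column {
  name: string;
  position: number;
}

// Assumed name patterns; the real service's list is not given in this guide.
const DONE_NAMES: string[] = ["done", "complete", "completed", "finished"];

function detectDoneColumn(columns: Column[]): Column | undefined {
  if (columns.length === 0) {
    return undefined;
  }
  // Sort a copy so the caller's array is not mutated.
  const byPosition = columns.slice().sort((a, b) => a.position - b.position);

  // Named patterns, compared case-insensitively.
  const named = byPosition.filter(
    (c) => DONE_NAMES.indexOf(c.name.trim().toLowerCase()) !== -1,
  );
  if (named.length > 0) {
    // Assumption: with several done-like columns, take the right-most one.
    return named[named.length - 1];
  }

  // Positional fallback: treat the last column as the done column.
  return byPosition[byPosition.length - 1];
}
```

For example, a board with columns `Todo`, `DONE`, `Archive` resolves to `DONE` by name, while `Backlog`, `Shipping` falls back to `Shipping` by position.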
36 integration tests:
- Delivery (5): all 5 notification types (Mention, Assignment, ProposalOutcome, BoardChange, System)
- Deduplication (4): same-key rejection, different-key allowance, no-key duplicates
- Preference filtering (6): type-level enable/disable, in-app channel kill switch, digest-only, BoardChange always-on
- Cross-user isolation (2): notifications scoped to owner, mark-all-read scoped
- Mark as read (4): basic, idempotent, 404, cross-user forbidden
- Batch (3): count returned, board-scoped, zero unread
- Pagination (4): limit enforcement, unread/board filters, invalid limit
- Auth (5): all endpoints reject unauthenticated
- Adversarial review fixed PascalCase typo, 4 weak assertions tightened, overly generous performance threshold
- Production observation noted: `NotificationRepository.GetByUserIdAsync` materializes all rows before in-memory pagination (tracked separately)
Run full backend verification (recommended):
dotnet test backend/Taskdeck.sln -c Release -m:1
Run project-split backend verification:
dotnet test backend/tests/Taskdeck.Domain.Tests/Taskdeck.Domain.Tests.csproj -c Release
dotnet test backend/tests/Taskdeck.Application.Tests/Taskdeck.Application.Tests.csproj -c Release
dotnet test backend/tests/Taskdeck.Api.Tests/Taskdeck.Api.Tests.csproj -c Release
dotnet test backend/tests/Taskdeck.Cli.Tests/Taskdeck.Cli.Tests.csproj -c Release
dotnet test backend/tests/Taskdeck.Architecture.Tests/Taskdeck.Architecture.Tests.csproj -c Release
Note:
- If `Debug` runs fail with file-lock errors, stop running `Taskdeck.Api` processes or use `-c Release`.
- If backend tests unexpectedly bind to a live LLM provider in local Development, force deterministic mock mode before running the suite:
- PowerShell:
$env:Llm__EnableLiveProviders='false'; $env:Llm__AllowLiveProvidersInDevelopment='false'; $env:Llm__Provider='Mock'; dotnet test backend/Taskdeck.sln -c Release -m:1
Run container-backed integration tests against ephemeral PostgreSQL (requires Docker):
dotnet test backend/tests/Taskdeck.Integration.Tests/Taskdeck.Integration.Tests.csproj -c Release
Run a specific test class:
dotnet test backend/tests/Taskdeck.Integration.Tests/Taskdeck.Integration.Tests.csproj -c Release --filter "FullyQualifiedName~BoardCrudIntegrationTests"
Note:
- Docker must be running. Verify with `docker info`.
- First run downloads the `postgres:16-alpine` image (~80MB); subsequent runs use the cached image.
- Tests are parallel-safe: each test class gets its own isolated database within a shared PostgreSQL container.
- See `docs/testing/TESTCONTAINERS_GUIDE.md` for the full setup and authoring guide.
cd frontend/taskdeck-web
npm run lint
npm run test:coverage
npm run typecheck
npm run build
Frontend lint suppression guidance:
- Prefer fixing lint violations over suppressing them.
- Keep suppressions as narrow as possible (`eslint-disable-next-line` with reason).
- Avoid file-wide disables unless absolutely required and documented with a follow-up issue.
Frontend coverage threshold policy:
- Coverage thresholds are enforced via `frontend/taskdeck-web/vitest.config.ts` and are part of the required CI gate.
- Global thresholds protect against broad regressions; per-surface thresholds protect high-signal areas (`src/api`, `src/store`, `src/composables`, `src/utils`, `src/components/board`).
- Ratchet rule: thresholds may stay flat or increase, but must not decrease.
- Threshold breach behavior can be validated locally with an override command, for example:
cd frontend/taskdeck-web && npx vitest run --coverage --coverage.thresholds.lines=99 --coverage.thresholds.statements=99 --coverage.thresholds.functions=99 --coverage.thresholds.branches=99
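The ratchet rule above (thresholds may stay flat or increase, never decrease) can be expressed as a small comparison. The sketch below is illustrative only: `Thresholds` and `findRatchetViolations` are hypothetical names, and nothing here reads `vitest.config.ts`; it simply compares two threshold objects supplied by the caller.

```typescript
// The four coverage metrics used by the override command above.
type Thresholds = {
  lines: number;
  statements: number;
  functions: number;
  branches: number;
};

// Returns one human-readable entry per metric whose proposed threshold
// is lower than the previous one. An empty array means the ratchet holds.
function findRatchetViolations(
  previous: Thresholds,
  proposed: Thresholds,
): string[] {
  const metrics: (keyof Thresholds)[] = [
    "lines",
    "statements",
    "functions",
    "branches",
  ];
  return metrics
    .filter((metric) => proposed[metric] < previous[metric])
    .map((metric) => `${metric}: ${previous[metric]} -> ${proposed[metric]}`);
}
```

A check like this could run in review tooling against the old and new config values; whether the repository does so is not stated in this guide.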
Frontend local dev server (manual workflows):
cd frontend/taskdeck-web
npm run dev
Notes:
- `npm run dev` now auto-resolves the frontend port with fallback order `5173` -> `4173` -> `5001` when a port is restricted or unavailable.
- launcher now selects a bindable port first; occupied candidate ports (including existing Taskdeck listeners) are skipped for new Vite processes.
- launcher now applies strict-port startup semantics by default to avoid Vite auto-increment drift.
- explicit overrides remain supported (for example `npm run dev -- --host localhost --port 5001` or `TASKDECK_DEV_PORT=5001 npm run dev`).
- backend Development CORS defaults include localhost fallback ports (`4173`, `5001`) so login/API calls stay aligned when fallback startup is used.
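The fallback order described in these notes can be sketched as a pure function. This is a minimal illustration, not the launcher's code: `resolveDevPort` and the injected `isBindable` probe are hypothetical, and treating an unbindable explicit port as a hard failure is an assumption based on the strict-port note.

```typescript
// Fallback order taken from the notes above.
const FALLBACK_PORTS: number[] = [5173, 4173, 5001];

// isBindable is injected so the sketch stays free of real socket calls;
// a real launcher would probe the port by attempting to listen on it.
function resolveDevPort(
  isBindable: (port: number) => boolean,
  explicitPort?: number,
): number {
  if (explicitPort !== undefined) {
    // Assumption: an explicit override is used as-is or fails (strict port).
    if (!isBindable(explicitPort)) {
      throw new Error(`explicit port ${explicitPort} is not bindable`);
    }
    return explicitPort;
  }
  for (const port of FALLBACK_PORTS) {
    if (isBindable(port)) {
      return port; // first bindable candidate wins
    }
  }
  throw new Error(`no bindable port in ${FALLBACK_PORTS.join(", ")}`);
}
```

Resolving the port before starting Vite, instead of letting Vite auto-increment, keeps the chosen port inside the set the backend's Development CORS defaults already allow.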
Install browser once:
cd frontend/taskdeck-web
npx playwright install chromium
Run E2E suite:
cd frontend/taskdeck-web
npx playwright test --reporter=line
Fallback (force an alternate frontend port):
PowerShell:
cd frontend/taskdeck-web
$env:TASKDECK_E2E_FRONTEND_PORT='5001'
$env:TASKDECK_E2E_API_CORS_ORIGINS='http://localhost:5001'
npx playwright test --reporter=line
Bash:
cd frontend/taskdeck-web
TASKDECK_E2E_FRONTEND_PORT=5001 TASKDECK_E2E_API_CORS_ORIGINS='http://localhost:5001' npx playwright test --reporter=line
Optional E2E env overrides (Playwright config):
- `TASKDECK_E2E_FRONTEND_HOST` (default `localhost`)
- `TASKDECK_E2E_FRONTEND_PORT` (when unset, config auto-probes `5173`, then `4173`, then `5001`)
- `TASKDECK_E2E_FRONTEND_BASE_URL` (default `http://{host}:{port}`; must be `http://` with explicit port and no path/query/hash)
- `TASKDECK_E2E_API_BASE_URL` (default `http://localhost:5000/api`; must be `http://` with explicit port and API path)
- `TASKDECK_E2E_API_CORS_ORIGINS` (comma-separated additional origins merged with defaults: frontend origin plus `http://localhost:5174`; each value is passed to the backend process as `Cors__DevelopmentAllowedOrigins__{index}`)
- `TASKDECK_E2E_REUSE_EXISTING_SERVER` (defaults to `true` locally and `false` in CI; full demo runs that inject live-provider backend overrides also switch reuse off by default so the intended backend process is actually launched; set `0` to force fresh backend/frontend startup or `1` to force reuse intentionally)
Override behavior notes:
- backend Playwright `webServer` readiness URL is derived from `TASKDECK_E2E_API_BASE_URL` as `{apiBaseUrl}/boards`
- backend Playwright process startup binds to the same API origin via `ASPNETCORE_URLS`
- backend Playwright startup now forces deterministic mock-provider mode by default; live-provider env is only injected for explicit demo runs (`TASKDECK_RUN_DEMO=1` / director path) when LLM steps are enabled
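Two of the override rules above lend themselves to a short sketch: the frontend base URL constraint (`http://`, explicit port, no path/query/hash) and the expansion of CORS origins into indexed `Cors__DevelopmentAllowedOrigins__{index}` variables. The functions below are hypothetical illustrations, not the Playwright config's code. Rejecting a trailing slash, de-duplicating origins, and the merge order are assumptions, and the simple pattern does not handle bracketed IPv6 hosts.

```typescript
// Accepts only http://host:port with nothing after the port.
function isValidFrontendBaseUrl(value: string): boolean {
  const match = /^http:\/\/([^\/:?#\s]+):(\d{1,5})$/.exec(value);
  if (match === null) {
    return false;
  }
  const port = Number(match[2]);
  return port >= 1 && port <= 65535;
}

// Merges the defaults (frontend origin plus http://localhost:5174) with
// the comma-separated extras and emits one indexed env entry per origin.
function buildCorsEnv(
  frontendOrigin: string,
  extraOrigins: string,
): Record<string, string> {
  const candidates = [frontendOrigin, "http://localhost:5174"].concat(
    extraOrigins.split(","),
  );
  const merged: string[] = [];
  for (const raw of candidates) {
    const origin = raw.trim();
    // Assumption: blanks are dropped and duplicates collapse to one entry.
    if (origin.length > 0 && merged.indexOf(origin) === -1) {
      merged.push(origin);
    }
  }
  const env: Record<string, string> = {};
  merged.forEach((origin, index) => {
    env[`Cors__DevelopmentAllowedOrigins__${index}`] = origin;
  });
  return env;
}
```

The double-underscore key format follows the .NET configuration convention for binding environment variables to array sections, which is why each origin needs its own index.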
Troubleshooting note (Windows local environments):
- if Playwright startup fails with `listen EACCES` for the frontend port, keep `TASKDECK_E2E_FRONTEND_PORT` unset so auto-fallback can select the next bindable port.
- when auto-fallback is used, Playwright keeps runner/worker aligned by storing the first resolved fallback port in-process (`TASKDECK_E2E_RESOLVED_FRONTEND_PORT`) so worker-side config evaluation does not drift to a different fallback port after the frontend webServer starts.
- local reuse mode prefers identity-verified listeners; CI mode prefers bindable ports for first resolution.
- if you explicitly set `TASKDECK_E2E_FRONTEND_PORT`, use `TASKDECK_E2E_API_CORS_ORIGINS` when needed so API preflight requests stay aligned with the chosen frontend origin.
- investigation details and reproduction commands are documented in `docs/analysis/2026-02-25_frontend-gate-port-bind-and-cors-blockers.md`.
Run concurrency harness spec only:
cd frontend/taskdeck-web
npm run test:e2e:concurrency
Opt-in live-provider check (headed-friendly):
PowerShell:
cd frontend/taskdeck-web
$env:TASKDECK_RUN_LIVE_LLM_TESTS='1'
npx playwright test tests/e2e/live-llm.spec.ts --headed --reporter=line
Headed manual-audit pack:
cd frontend/taskdeck-web
npm run test:e2e:audit:headed
The Playwright config defines five projects:
| Project | Device Descriptor | When It Runs |
|---|---|---|
| `chromium` | Desktop Chrome | Every PR (ci-required), nightly, manual |
| `firefox` | Desktop Firefox | Nightly, manual dispatch, `testing` label |
| `webkit` | Desktop Safari | Nightly, manual dispatch, `testing` label |
| `mobile-chrome` | Pixel 7 | Nightly, manual dispatch, `testing` label |
| `mobile-safari` | iPhone 14 | Nightly, manual dispatch, `testing` label |
Tests use tag annotations in their title strings to control which projects run them:
- (no tag) or `@smoke`: runs on chromium only (PR gate default)
- `@cross-browser`: runs on chromium, firefox, and webkit
- `@mobile`: runs on mobile-chrome and mobile-safari only
- `@quarantine`: excluded from all CI (see `docs/testing/FLAKY_TEST_POLICY.md`)
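The tag rules above amount to a mapping from a test title to the projects that should run it. The sketch below models that mapping as a pure function; it is not the Playwright config itself (which would typically express this with per-project `grep` / `grepInvert` patterns). `projectsForTitle` is a hypothetical name, and taking the union when a title carries both `@cross-browser` and `@mobile` is an assumption, since the guide does not cover that case.

```typescript
type Project =
  | "chromium"
  | "firefox"
  | "webkit"
  | "mobile-chrome"
  | "mobile-safari";

// Returns the projects that should run a test with the given title.
function projectsForTitle(title: string): Project[] {
  // Quarantined tests are excluded everywhere, regardless of other tags.
  if (title.indexOf("@quarantine") !== -1) {
    return [];
  }
  const isCrossBrowser = title.indexOf("@cross-browser") !== -1;
  const isMobile = title.indexOf("@mobile") !== -1;

  const projects: Project[] = [];
  if (isCrossBrowser) {
    projects.push("chromium", "firefox", "webkit");
  }
  if (isMobile) {
    projects.push("mobile-chrome", "mobile-safari");
  }
  if (!isCrossBrowser && !isMobile) {
    // Untagged and @smoke tests stay on the PR-gate default.
    projects.push("chromium");
  }
  return projects;
}
```

For instance, `projectsForTitle("board loads @cross-browser")` yields the three desktop projects, while an untagged title yields only `chromium`.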
Install all browsers (one-time):
cd frontend/taskdeck-web
npx playwright install --with-deps
Run a specific project:
npx playwright test --project=firefox --reporter=line
npx playwright test --project=mobile-safari --reporter=line
Run all projects:
npx playwright test --reporter=line
Run only cross-browser tagged tests across all desktop browsers:
npx playwright test --grep="@cross-browser" --reporter=line
Run only mobile tests:
npx playwright test --grep="@mobile" --reporter=line
- PR gate (`ci-required.yml`): calls `reusable-e2e-smoke.yml`, which installs and runs chromium only. This keeps PR feedback fast (~12 min timeout).
- Nightly (`ci-nightly.yml`): calls `reusable-e2e-cross-browser.yml`, which runs all 5 projects in a matrix with `fail-fast: false`.
- Extended/manual (`ci-extended.yml`): calls `reusable-e2e-cross-browser.yml` on `testing` label or manual dispatch.
- Default tests (no tag): run on chromium in PR gate. Use for most new tests.
- Critical journeys that must work cross-browser: add the `@cross-browser` tag. These will also run on chromium in PR gate.
- Mobile-specific behavior (viewport responsiveness, touch targets, overflow): add the `@mobile` tag. These only run on mobile projects.
- Flaky or unstable tests: add the `@quarantine` tag and file an issue. See `docs/testing/FLAKY_TEST_POLICY.md`.
See docs/testing/FLAKY_TEST_POLICY.md for the full quarantine/remediation process, SLA timelines, and prevention guidelines.
Visual regression tests capture baseline screenshots of key UI surfaces and compare them against future renders to catch unintended layout changes.
Policy document: docs/testing/VISUAL_REGRESSION_POLICY.md (thresholds, false-positive mitigation, baseline management)
Test location: frontend/taskdeck-web/tests/visual/
Config: frontend/taskdeck-web/playwright.visual.config.ts
Covered surfaces: board view (empty + populated), command palette (open + search), archive view, inbox/capture view, home view
Run visual tests:
cd frontend/taskdeck-web
npm run test:visual
Update baselines after intentional UI changes:
cd frontend/taskdeck-web
npm run test:visual:update
Key settings: fixed viewport 1280x720, animations disabled, 0.5% pixel tolerance, platform-specific baselines (CI canonical platform: ubuntu-latest).
CI integration: runs in CI Extended pipeline with testing or visual PR labels. Diff artifacts uploaded on failure for review.
Default CI posture:
- Required Playwright regression lanes explicitly set `TASKDECK_RUN_DEMO=0`; the stakeholder recorder is never part of required CI.
- Load/concurrency Playwright coverage also keeps demo recording off by default so those lanes stay focused on product/runtime regressions.
- The deterministic demo regression command is `npm run demo:director:smoke`.
- Demo tooling remains supporting evidence for seeded workflows; it does not replace the required product smoke path.
Run the smoke path locally:
cd frontend/taskdeck-web
npm run demo:director:smoke
Policy notes:
- `demo:director:smoke` runs `engineering-sprint` with `--skip-llm`, zero autopilot turns, a fixed RNG seed, a stable artifact directory (`demo-artifacts/ci-smoke`), an isolated smoke DB (`taskdeck.demo.ci.db`), and fresh backend/frontend startup.
- when fresh-server mode cannot bind `http://localhost:5000/api`, the director automatically selects a free local API port; if explicit overrides still conflict, it prints a remediation hint for `TASKDECK_E2E_API_BASE_URL` / `TASKDECK_E2E_FRONTEND_PORT`.
- `ci-extended.yml` exposes a matching `demo-director-smoke` lane for explicit validation through `workflow_dispatch` or a PR labeled `automation` when the PR touches `.github/workflows/**`, `backend/**`, `frontend/**`, `deploy/**`, or `scripts/**`.
- `npm run demo:seed` is expected to be rerun-safe on the canonical demo account: seeded captures, queue examples, chat evidence, comments, and Ops logs should be reused when present instead of multiplying on every local/manual regression run.
- `demo:director` validates its own options before Playwright passthrough; keep director flags before `--` and pass raw Playwright arguments only after `--`.
- Full stakeholder walkthrough recording remains manual/headed via `TASKDECK_RUN_DEMO=1`.
- opt-in live-provider chat verification is now separate from demo mode: use `TASKDECK_RUN_LIVE_LLM_TESTS=1` when you want a real-provider probe without running the full stakeholder demo flow.
Canonical operator contract:
docs/product/SAUL_DEMO_REHEARSAL_CONTRACT.md
Deterministic bootstrap for the Saul-facing story:
cd frontend/taskdeck-web
npm run demo:seed
npm run demo:run -- --clean --skip-llm client-onboarding
Deterministic artifact rehearsal bundle:
cd frontend/taskdeck-web
npm run demo:director -- --output-dir ./demo-artifacts/saul-rehearsal --e2e-db ./taskdeck.demo.saul.db --reset-e2e-db --fresh-servers --scenario client-onboarding --skip-llm --turns 0 --rng-seed saul-rehearsal
Acceptance focus for this rehearsal:
- prove `Home -> Inbox/Capture -> Review -> Board`
- prove review-first trust language is visible without narration
- prove ACME onboarding capture becomes clean board work after explicit approval
Run local k6 board-heavy profile (backend API must be reachable at K6_BASE_URL):
docker run --rm --network host \
-e K6_BASE_URL=http://127.0.0.1:5000/api \
-e K6_VUS=20 \
-e K6_DURATION=90s \
-e K6_USER_POOL=6 \
-v "$PWD:/work" \
-w /work \
grafana/k6:0.49.0 \
run tests/load/k6/board-heavy-load.js \
--summary-export frontend/taskdeck-web/test-results/load/k6-summary.json
Notes:
- tune `K6_VUS`, `K6_DURATION`, and `K6_USER_POOL` per machine capacity.
- script thresholds fail on sustained latency/error budget breaches and emit actionable status/body diagnostics.
TASKDECK_JWT_SECRET=local-test-secret docker compose -f deploy/docker-compose.yml --profile baseline config
docker build -f deploy/docker/backend.Dockerfile -t taskdeck-api:local .
docker build --build-arg VITE_API_BASE_URL=/api -f deploy/docker/frontend.Dockerfile -t taskdeck-web:local .
Deployment script smoke path (PowerShell):
powershell -File ./scripts/deploy/Start-TaskdeckStack.ps1
powershell -File ./scripts/deploy/Smoke-TestTaskdeckStack.ps1 -Port 8080 # if TASKDECK_PROXY_PORT differs, set -Port to match
powershell -File ./scripts/deploy/Stop-TaskdeckStack.ps1
Deployment hardening matrix automation (PowerShell):
powershell -File ./scripts/deploy/Verify-TaskdeckDeploymentHardening.ps1 -Port 8080
Hardening matrix pass/fail criteria:
- `docs/ops/DEPLOYMENT_HARDENING_MATRIX.md`
Repeatable failure-injection scenarios for deployment and MCP workflows:
bash scripts/drills/run-all-drills.sh # local run
bash scripts/drills/run-all-drills.sh --ci  # CI-compatible with machine-readable output
Scenarios covered:
- Missing SQLite database at startup
- Locked SQLite database at startup
- Readiness-check timeout behavior
- MCP configuration validation / unknown-server handling
- Reverse-proxy misconfiguration regression
Drill documentation and recovery paths: docs/ops/FAILURE_INJECTION_DRILLS.md
Static validation (no cloud apply required):
terraform fmt -check -recursive deploy/terraform/aws
powershell -File ./scripts/deploy/Test-TaskdeckTerraformBaseline.ps1
Real-environment drift check (requires environment-specific `terraform.tfvars`, backend config, and AWS credentials):
powershell -File ./scripts/deploy/Invoke-TaskdeckTerraformDriftCheck.ps1 `
-Environment staging `
-VarFile deploy/terraform/aws/environments/staging/terraform.tfvars `
-BackendConfigFile deploy/terraform/aws/environments/staging/backend.hcl `
-RefreshOnly
Notes:
- `Test-TaskdeckTerraformBaseline.ps1` runs `terraform init -backend=false` and `terraform validate` for `dev`, `staging`, and `prod`.
- `Invoke-TaskdeckTerraformDriftCheck.ps1` relies on `terraform plan -detailed-exitcode`; `0` means no changes, `2` means drift for `-RefreshOnly` or planned changes for a non-refresh-only run, and any other exit is a failure.
- The Terraform baseline intentionally provisions the current single-node Docker deployment model; the JWT signing secret comes from a pre-created SecureString SSM parameter, and the SQLite path lives on a dedicated persistent EBS data volume so routine host replacement does not discard `/var/lib/taskdeck/taskdeck.db`.
- `staging` and `prod` default `protect_data_volume` to `true`; intentional destroys or migrations that must remove the data volume require a reviewed switch to the unprotected path plus a reviewed module-source change to relax/remove `prevent_destroy` before the destructive apply.
- Changing an existing environment from `protect_data_volume = false` to `true` also replaces the underlying EBS volume with a new protected one; treat that as a destructive migration and capture a backup or snapshot first.
- Staged rollout policy, managed DB, and full secret-rotation posture remain tracked in `#101`, `#84`, and `#110`.
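The exit-code rule from the notes above (`0` no changes, `2` drift or planned changes, anything else a failure) can be captured in a few lines. The sketch is illustrative only: `interpretPlanExitCode` and the `DriftOutcome` labels are hypothetical and are not taken from `Invoke-TaskdeckTerraformDriftCheck.ps1`.

```typescript
type DriftOutcome = "no-changes" | "drift" | "planned-changes" | "failure";

// Maps the exit code of `terraform plan -detailed-exitcode` to an outcome.
// refreshOnly mirrors the script's -RefreshOnly switch.
function interpretPlanExitCode(
  exitCode: number,
  refreshOnly: boolean,
): DriftOutcome {
  switch (exitCode) {
    case 0:
      return "no-changes";
    case 2:
      // Same exit code, different meaning depending on the run mode.
      return refreshOnly ? "drift" : "planned-changes";
    default:
      // Exit code 1 and anything unexpected are treated as errors.
      return "failure";
  }
}
```

Treating every code other than `0` and `2` as a failure avoids reporting a crashed plan as "no drift".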
docker mcp server ls
powershell -File ./scripts/mcp/Test-DockerMcpProfile.ps1
Optional servers (`postman`, `dockerhub`) warning mode:
powershell -File ./scripts/mcp/Test-DockerMcpProfile.ps1 -IncludeOptional
Optional servers strict mode (fail-fast on missing prereqs/runtime failures):
powershell -File ./scripts/mcp/Test-DockerMcpProfile.ps1 -IncludeOptional -FailOnOptionalErrors
CI-friendly variants:
powershell -File ./scripts/mcp/Test-DockerMcpProfile.ps1 -CiMode
powershell -File ./scripts/mcp/Test-DockerMcpProfile.ps1 -IncludeOptional -SkipOptionalWhenMissingPrereqs -CiMode
powershell -File ./scripts/mcp/Test-DockerMcpProfile.ps1 -IncludeOptional -FailOnOptionalErrors -CiMode
Required workflow: `.github/workflows/ci-required.yml`
- `docs-governance`
  - Enforces required active docs and docs index invariants
- `backend-architecture`
  - Enforces architecture boundaries in CI
- `backend-unit`
  - Domain + Application + CLI contract tests
  - Ubuntu and Windows matrix
- `api-integration`
  - API integration tests
  - Ubuntu and Windows matrix
- `frontend-unit`
  - Lint + coverage-threshold Vitest + typecheck + build
  - Ubuntu and Windows matrix
  - Uploads JUnit + coverage artifacts (`test-results/`, `coverage/`) for triage
- `container-images`
  - Validates compose rendering
  - Builds backend/frontend container images
  - Exports compressed image artifacts plus SHA256 checksums
- `e2e-smoke`
  - Playwright smoke + automation/ops + fixture bootstrap flow
  - Ubuntu only
  - Depends on all prior gates
Extended workflow: .github/workflows/ci-extended.yml
- `workflow-lint`
  - Actionlint validation for `.github/workflows/**` drift
- `dependency-review`
  - PR dependency change risk signal (`actions/dependency-review-action`)
- `backend-solution` + `e2e-smoke` + `load-concurrency-harness`
  - opt-in on PRs labeled `testing` or manual `workflow_dispatch` (runs Playwright smoke suite via `reusable-e2e-smoke.yml`)
  - load harness lane runs k6 board-heavy profile plus Playwright multi-session concurrency spec via `reusable-load-concurrency-harness.yml`
- `demo-director-smoke`
  - opt-in on PRs labeled `automation` or manual `workflow_dispatch`; PR-triggered runs still require watched-path changes because `ci-extended.yml` does not include `docs/**`
  - runs the deterministic `demo:director:smoke` path via `reusable-demo-director-smoke.yml`
Nightly workflow: .github/workflows/ci-nightly.yml
- scheduled/manual backend solution regression (`dotnet test backend/Taskdeck.sln -c Release -m:1`)
- scheduled/manual E2E smoke suite (`reusable-e2e-smoke.yml`)
- scheduled/manual load-concurrency harness (`reusable-load-concurrency-harness.yml`)
- scheduled/manual container image regression
- `developer-portal`: builds API, fetches `/swagger/v1/swagger.json`, runs `@redocly/cli build-docs`, uploads `artifacts/developer-portal/` including docs from `docs/api/` (PR #658)
Nightly quality workflow: .github/workflows/nightly-quality.yml
- scheduled/manual reporting lane for quality telemetry (non-blocking for required PR CI checks)
- backend coverage artifacts:
  - Domain coverage (`Taskdeck.Domain.Tests` with XPlat Code Coverage)
  - Application coverage (`Taskdeck.Application.Tests` with XPlat Code Coverage)
- frontend coverage artifacts:
  - `npm run test:coverage` output (`coverage/` + `test-results/`)
- dependency/security signal artifacts:
  - `dotnet list package --vulnerable --include-transitive` output + exit code
  - `npm audit --audit-level=high --json` output + exit code
  - normalized dependency-security summary (`summary.md`, `summary.json`) linked to `docs/security/SECURITY_DEPENDENCY_VULNERABILITY_POLICY.md`
Triage usage:
- check workflow step summary first for signal exit codes
- inspect uploaded artifacts to differentiate command failures from dependency findings
- treat this lane as reporting-first; promote to stricter gating only through a dedicated follow-up issue/decision
Release/security workflow: .github/workflows/release-security.yml
- release/tag/manual dependency inventory artifact generation
- backend/frontend vulnerability signal capture
- manual strict-enforcement option that fails on unresolved high/critical findings, non-zero dependency scan exits, or unparseable scan outputs
- reusable container artifact/checksum lane for release-ready outputs
CI extended dependency-security lane:
- `.github/workflows/ci-extended.yml` now exposes an opt-in `Dependency Security Signals` job through manual dispatch or PRs labeled `security`
- this lane is reporting-first and uses the same normalized summary format as nightly/release flows
Tracking issues:
- wave tracker: `#254`
- delivered execution: `#255` to `#260`
Already-covered pack scenarios (no duplicate implementation issue required):
- WIP limit enforcement already covered across application/API/E2E.
- sandbox-gated database import/export rejection outside Development already covered.
- starter-pack idempotency/conflict safety already covered.
Knowledge transfer applied to existing seeds:
- `#89`: targeted property/fuzz pilot surfaces (manifest/query/import-export boundaries)
- `#90`: non-blocking scheduled mutation-lane posture
- `#106`: dependency/security signal command baseline (`dotnet list package --vulnerable`, `npm audit`)
- `#168`: CI topology routing for OpenAPI/nightly-quality lanes
Delivered outcomes:
- `#255` removed residual wall-clock flake vectors and centralized reusable E2E polling helpers
- `#256` locked drag/drop persistence after full reload into Playwright smoke coverage
- `#257` centralized representative `400/401/403/404/409` API error-contract assertions
- `#258` added OpenAPI generation + parse-validation artifacts in CI
- `#259` codified `docs/GOLDEN_PRINCIPLES.md` with lightweight mechanical enforcement
- `#260` added the non-blocking nightly-quality workflow for coverage and dependency/security signal artifacts
Useful local checks for this wave:
rg -n "Thread\\.Sleep|new Promise\\(.*setTimeout" backend/tests frontend/taskdeck-web/tests/e2e
dotnet test backend/tests/Taskdeck.Api.Tests/Taskdeck.Api.Tests.csproj -c Release --filter "FullyQualifiedName~ApiErrorContractApiTests"
(cd frontend/taskdeck-web && npx playwright test tests/e2e/smoke.spec.ts tests/e2e/automation-ops.spec.ts tests/e2e/capture-loop.spec.ts --reporter=line)
node scripts/check-golden-principles.mjs
node scripts/check-docs-governance.mjs
OpenAPI guardrail local checks (`#258`):
./scripts/ci/generate-openapi-artifact.ps1 -OutputPath "artifacts/openapi/taskdeck-api.json"
./scripts/ci/validate-openapi.ps1 -SpecPath "artifacts/openapi/taskdeck-api.json"
Malformed-output simulation (deterministic parse failure check):
"not-json" | Set-Content -Path artifacts/openapi/invalid-openapi.json
./scripts/ci/validate-openapi.ps1 -SpecPath "artifacts/openapi/invalid-openapi.json"
Follow-up intentionally deferred from this issue:
- snapshot/diff enforcement against a checked-in OpenAPI baseline remains a future enhancement
- current guardrail scope is generation + parse/shape validation + CI artifact publication
Tracking issues:
- wave tracker: `#262`
- deferred execution: `#263` to `#268`
Reuse links (no duplicate implementation issue):
- `#75` delivered import-adapter foundation for outreach CSV mapping/dedupe profile
- `#77` analytics model/dashboards for future outreach scoreboard metrics
- `#175` first-party starter-pack catalog expansion for outreach blueprint inclusion
Planned quality expectations when implementation starts:
- YAML front-matter parser round-trip stability tests (contact fields + timeline preservation)
- cadence scheduling determinism + throughput-control guardrail tests
- API/UX regression for contact logging and dashboard action loops
- E2E coverage for outreach loop: import/apply -> contact update -> cadence proposal -> dashboard action flow
- Domain invariants: `backend/tests/Taskdeck.Domain.Tests`
- Application services: `backend/tests/Taskdeck.Application.Tests`
  - Includes board/card/column/label/auth/authorization/board-access/export-import/history/queue plus automation/archive/chat/ops/log services
  - Includes database export/import guardrail coverage (sandbox gating, payload validation, file replacement)
  - Includes external import-adapter parsing and board upsert orchestration coverage (CSV/outreach profile, dedupe policy, rollback safety path)
  - Includes starter-pack manifest parsing/validation, first-party catalog validity, and apply-planning coverage
  - Includes LLM tool-calling orchestrator coverage (multi-turn loop, timeout, round limits) and read tool schema generation
  - Includes GDPR data export service (user-scoped completeness, versioned payload) and account deletion service (re-auth, confirmation phrase, PII anonymization)
  - Includes board metrics service coverage (aggregation, date range, label grouping)
  - Includes MCP board resource coverage (listing, phantom-user fallback, multi-user scoping)
- HTTP contracts and behavior mappings: `backend/tests/Taskdeck.Api.Tests`
  - Includes core + automation/archive/chat/ops/log/health controllers
  - Includes rate-limit policy coverage (`RateLimitingApiTests`) for burst throttling, retry metadata contract, reset-window recovery, and cross-user boundary behavior
  - Includes security-header baseline coverage (`SecurityHeadersApiTests`) for success/auth-failure paths and HTTPS HSTS posture assertions
  - Includes board-scoped external import endpoint coverage (authz, malformed input, duplicate handling, apply/update flow, rollback safety)
  - Includes outbound webhook API and worker coverage (`OutboundWebhooksApiTests`, `OutboundWebhookDeliveryWorkerTests`) for claim/reload handling, cancellation requeue, and non-success HTTP retry/dead-letter branches
  - Includes `ResultExtensions` mapping tests for standardized API error/status behavior
- CLI contracts: `backend/tests/Taskdeck.Cli.Tests`
- Architecture boundaries: `backend/tests/Taskdeck.Architecture.Tests`
  - Enforces project-reference boundaries between Domain/Application/Infrastructure/API projects
  - Enforces source-layer purity via forbidden namespace imports in Domain and Application source trees
  - Enforces API controller boundary invariants:
    - only `AuthController` and `HealthController` may inherit `ControllerBase` directly
    - protected controllers must declare `[Authorize]`
  - Failure remediation:
    - move forbidden dependencies to the correct layer abstraction/interface
    - route protected HTTP surface through `AuthenticatedControllerBase`
    - add/restore `[Authorize]` on protected controller classes
- Frontend unit behavior: `frontend/taskdeck-web/src/tests`
  - Components, stores, API modules, composables, utilities
  - Includes shared utility tests for `queryBuilder` and `errorMessage`
  - Includes GitHub OAuth API client and session store coverage (`authApi`, `sessionStore`)
  - Includes board metrics API client and store coverage (`metricsApi`, `metricsStore`)
- End-to-end journeys: `frontend/taskdeck-web/tests/e2e`
  - Includes deterministic starter-pack fixture bootstrap coverage for `small`, `medium`, and `edge` manifest scenarios
  - Includes unauthenticated SignalR negotiate rejection coverage aligned with the runtime client handshake path
  - Includes dedicated multi-session concurrency regression coverage (`tests/e2e/concurrency.spec.ts`)
- Load and concurrency API profile: `tests/load/k6/board-heavy-load.js`
  - Includes seeded-user board-heavy read/write load mix and threshold-based regression diagnostics
Use docs/MANUAL_TEST_CHECKLIST.md for action-by-action manual validation.
Use docs/ops/OBSERVABILITY_BASELINE.md for telemetry dashboard/alert baseline and observability smoke validation.
Detailed step-indexed validation checklists:
- Slice A (workspace shell, board lifecycle, keyboard UX): `docs/testing/manual-validation-a-workspace-board-ux.md`
- Slice B (authz policy, cross-user isolation, API error contracts): `docs/testing/manual-validation-b-authz-contracts.md`
This section defines validation expectations for the capture-first direction.
Current state:
- capture MVP loop is shipped end-to-end (`#200` to `#211`)
- capture loop assertions below are required baseline checks for regression safety
Required assertions:
- capture action is fast and deterministic (target under 10 seconds to persisted artifact in normal local conditions)
- triage path stays proposal-first (no direct board mutation from model output)
- provenance links are visible from proposal/card surfaces back to capture source
- error and auth contracts remain stable (`ApiErrorResponse`, `401/403/404` policy)
Recommended execution pairing:
- automated: API + frontend unit + E2E capture loop (`#210` delivered, retained as active regression path)
- manual: capture friction/trust checks in `docs/MANUAL_TEST_CHECKLIST.md`
Manual incident rehearsals complement automated tests by validating diagnosis and recovery workflows against realistic failure conditions. Rehearsals are scheduled monthly (lightweight, ~30 min) and quarterly (deep drill, ~2 hours).
Key resources:
- `docs/ops/INCIDENT_REHEARSAL_CADENCE.md` -- schedule, rotation, and process
- `docs/ops/rehearsal-scenarios/` -- scenario templates (health degradation, telemetry gaps, deployment failures)
- `docs/ops/EVIDENCE_TEMPLATE.md` -- evidence package format
- `docs/ops/REHEARSAL_BACKOFF_RULES.md` -- how rehearsal findings become tracked issues
- `docs/ops/rehearsals/` -- completed rehearsal evidence packages
Rehearsals are distinct from the automated failure-injection drill suite (docs/ops/FAILURE_INJECTION_DRILLS.md). Drills are scripted and CI-runnable; rehearsals are human-driven and focus on diagnosis speed, tooling gaps, and recovery muscle memory.
For local development only, authorization bypass can be enabled via:
- `backend/src/Taskdeck.Api/appsettings.Development.json`
- `DevelopmentSandbox.Enabled = true`
Safety boundary:
- Sandbox bypass is forced off outside Development environment.
- Validation and data integrity rules still apply.
Tracking issue: #726
New test coverage:
- `OutboundWebhookHmacDeliveryTests` (11 tests): header format verification (`sha256=<64-hex>`), HMAC round-trip receiver recompute and match, wrong-key rejection, secret rotation produces different signature, body/content-type matching, large payload (100 kB), timing-safe comparison via `CryptographicOperations.FixedTimeEquals`, determinism, key-differ properties
Key adversarial review findings fixed: secret rotation test was testing different subscriptions (not actual rotation on same subscription); BCL-testing assertions replaced with real domain property tests.
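For receivers written in Node, the round-trip property these tests cover (recompute the HMAC, compare in constant time, reject a wrong key) might look like the sketch below. It is an illustration under assumptions, not Taskdeck's signing code: it assumes the signature covers the raw request body only, whereas the real scheme may also bind the timestamp header. `signPayload` and `verifySignature` are hypothetical names; `createHmac` and `timingSafeEqual` are the standard `node:crypto` functions.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Produces a header value in the sha256=<64-hex> format.
// Assumption: only the raw body is signed.
function signPayload(secret: string, body: string): string {
  const digest = createHmac("sha256", secret)
    .update(body, "utf8")
    .digest("hex");
  return `sha256=${digest}`;
}

// Recomputes the signature and compares it in constant time.
function verifySignature(
  secret: string,
  body: string,
  header: string,
): boolean {
  // Reject malformed headers before any comparison.
  if (!/^sha256=[0-9a-f]{64}$/.test(header)) {
    return false;
  }
  const expected = Buffer.from(signPayload(secret, body), "utf8");
  const received = Buffer.from(header, "utf8");
  // timingSafeEqual throws on length mismatch, so check length first.
  return (
    expected.length === received.length && timingSafeEqual(expected, received)
  );
}
```

`timingSafeEqual` plays the same role here that `CryptographicOperations.FixedTimeEquals` plays in the .NET tests: the comparison time does not depend on where the first differing byte is.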
Tracking issue: #710
New test coverage across webhook test suite (78 tests total across 9 files):
- `OutboundWebhookEndpointGuardTests` (Application.Tests): SSRF guard cases covering private IPv4 ranges and endpoint validation
- `OutboundWebhookServiceTests` (Application.Tests, 19 tests): service-level webhook subscription and delivery orchestration
- `OutboundWebhookSignatureTests` (Application.Tests, 8 tests): HMAC signature computation and verification
- `OutboundWebhookDeliveryWorkerTests` (Api.Tests, 8 tests): worker-level delivery scheduling and retry logic
- `OutboundWebhookHmacDeliveryTests` (Api.Tests, 11 tests): end-to-end HMAC delivery including header format, round-trip, wrong-key rejection
- `OutboundWebhooksApiTests` (Api.Tests, 10 tests): API endpoint contract for webhook subscription management
- `OutboundWebhookDeliveryRepositoryTests` (Api.Tests, 3 tests): repository-level delivery persistence
- `OutboundWebhookDeliveryTests` (Domain.Tests, 8 tests): domain entity state and transitions
- `OutboundWebhookSubscriptionTests` (Domain.Tests, 7 tests): subscription domain entity
Key adversarial review fix: HttpClient resource leaks across 9 test methods.
Manual validation recommended: configure a webhook endpoint with a known secret and verify that (a) the X-Taskdeck-Webhook-Signature header (alongside X-Taskdeck-Webhook-Timestamp) is present and verifiable with HMAC-SHA256, and (b) a webhook targeting http://localhost/ or http://10.0.0.1/ is rejected at the SSRF guard.
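For the SSRF half of that manual check, the kind of rule being validated can be sketched as follows. The helper name `isBlockedWebhookTarget` and the exact set of blocked ranges are assumptions for illustration; the actual guard lives in the backend and may cover more cases (for example DNS resolution or further IPv6 ranges).

```typescript
// Hypothetical sketch of an SSRF target check; not the Taskdeck guard itself.
function isBlockedWebhookTarget(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return true; // unparseable targets are rejected
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return true;

  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost") || host === "[::1]") {
    return true;
  }

  // Only IPv4 literals are range-checked in this sketch.
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||                          // "this network"
    a === 10 ||                         // 10.0.0.0/8 (private)
    a === 127 ||                        // loopback
    (a === 169 && b === 254) ||         // link-local
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 (private)
    (a === 192 && b === 168)            // 192.168.0.0/16 (private)
  );
}
```

Under these assumptions, both `http://localhost/` and `http://10.0.0.1/` from the manual check are rejected, while a public hostname passes through to later validation.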
Tracking issues: #683, #680, #685, #686, #687, #688
New test files:
- `boardStore.wipLimit.spec.ts` (7 tests): WIP-limit toast deduplication regression for `createCard` and `moveCard`; guards against future double-toast introduction
- `sessionStore.authToast.spec.ts` (20 tests): auth-flow toast lifecycle (login/register/OAuth failure and success toasts, cross-flow isolation, auto-removal independence); uses real `toastStore` backed by fresh Pinia
- `router/authGuard.spec.ts` (new): auth guard decision table covering unauthenticated redirect, expired-token cleanup, authenticated pass-through, deflection from /login when authenticated, demo mode, 12-route exhaustive table
- `router/workspaceRouteStability.spec.ts` (new): workspace mode persistence across simulated reloads, hydration drift prevention, `resetForLogout` cleanup
- `InboxView.spec.ts` (+21 tests): single-item triage action states (per status variant), bulk action bar visibility and count, batchBusy disabled state, select-all behavior; all assertions on DOM state
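The decision-table style used for the auth guard can be illustrated with a small pure function and a table of cases. Everything here is hypothetical: the type names, the `/` and `/boards` routes, and the assumption that demo mode allows every route are stand-ins, and the real guard in `router/authGuard.spec.ts` may behave differently.

```typescript
// Hypothetical guard inputs and outputs, for illustration only.
type GuardInput = {
  path: string;
  tokenExpiresAt: number | null; // epoch ms, or null when no token is stored
  demoMode: boolean;
  now: number;
};

type GuardDecision =
  | { action: "allow" }
  | { action: "redirect"; to: string; clearToken: boolean };

function decideNavigation(input: GuardInput): GuardDecision {
  // Assumption: demo mode lets every route through without a token.
  if (input.demoMode) return { action: "allow" };

  const expired = input.tokenExpiresAt !== null && input.tokenExpiresAt <= input.now;
  const authenticated = input.tokenExpiresAt !== null && !expired;

  if (input.path === "/login") {
    // Authenticated users are deflected away from the login page.
    return authenticated
      ? { action: "redirect", to: "/", clearToken: false }
      : { action: "allow" };
  }
  if (authenticated) return { action: "allow" };
  // Unauthenticated or expired: go to login, clearing a stale token if present.
  return { action: "redirect", to: "/login", clearToken: expired };
}

// Decision table: one row per case, all checked in a single loop.
const table: Array<[string, GuardInput, GuardDecision]> = [
  ["unauthenticated", { path: "/boards", tokenExpiresAt: null, demoMode: false, now: 1000 },
    { action: "redirect", to: "/login", clearToken: false }],
  ["expired token", { path: "/boards", tokenExpiresAt: 500, demoMode: false, now: 1000 },
    { action: "redirect", to: "/login", clearToken: true }],
  ["authenticated", { path: "/boards", tokenExpiresAt: 2000, demoMode: false, now: 1000 },
    { action: "allow" }],
  ["authenticated on /login", { path: "/login", tokenExpiresAt: 2000, demoMode: false, now: 1000 },
    { action: "redirect", to: "/", clearToken: false }],
  ["demo mode", { path: "/boards", tokenExpiresAt: null, demoMode: true, now: 1000 },
    { action: "allow" }],
];

for (const [name, input, expected] of table) {
  const actual = decideNavigation(input);
  if (JSON.stringify(actual) !== JSON.stringify(expected)) {
    throw new Error(`guard decision mismatch: ${name}`);
  }
}
```

Keeping the decision logic pure makes it cheap to add a row per route, which is the idea behind an exhaustive route table.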
Frontend suite total after this wave: 1592 passing (up from 1496 pre-wave).
Tracking issues: #78, #79, #249, #576, #654, #705, #717
New test coverage (~390+ new tests total):
- `MetricsExportServiceTests.cs` (21 unit tests + 5 adversarial-review injection tests): CSV structure validation, all 5 sections, CSV injection prevention vectors including embedded newlines
- `MetricsExportApiTests.cs` (8 integration tests): auth, cross-user isolation, empty board, date range, Content-Disposition headers
- `ForecastingServiceTests.cs` (32 tests): validation, authorization, edge cases (zero throughput, no done column, single data point, large card counts, bounce deduplication, history-window-vs-span)
- `ApiKey` domain tests (11 tests): entity construction, SHA-256 hashing, `tdsk_` prefix, revocation, expiration
- API key integration tests (20 tests): auth, key lifecycle (create/list/revoke), cross-user isolation, MCP endpoint access
- `ClarificationDetectorTests.cs` (22 tests + 6 false-positive regression): pattern detection, skip phrases, round counting, prompt building, strong/weak signal split
- `ChatServiceClarificationTests.cs` (7 tests): service-level clarification flow, round enforcement, skip behavior
- `ConcurrencyRaceConditionStressTests.cs` (13 tests): queue claim races, card conflicts, proposal approval races, rate limiting, multi-user stress
- `EntityAdversarialInputTests.cs` (77 FsCheck tests): Board, Card, Column, Label, AutomationProposal with adversarial strings, boundary lengths, GUID validation
- `JsonSerializationRoundTripFuzzTests.cs` (29 tests): serialize/deserialize identity, GUID format variations, DateTime boundaries, malformed JSON
- `AdversarialInputApiTests.cs` (80 tests): no 500s from adversarial input across all major endpoints, malformed JSON, wrong content types, concurrent adversarial
- `InboxView.spec.ts` (+7 tests): primitive-driven loading/error/empty state assertions, skeleton detection, retry button
- `inputSanitization.spec.ts` (16 fast-check tests): card titles, search queries, board names, chat messages, URL encoding, JSON round-trip, Unicode edge cases
- `storeResilience.spec.ts` (9 fast-check tests): random action sequences on board store, API error handling, adversarial content
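The CSV injection vectors covered by the metrics export tests, including embedded newlines, correspond to a cell-escaping rule along the following lines. This is a sketch of a common OWASP-style mitigation written in TypeScript for illustration, not the backend's actual export code; `escapeCsvCell`, `toCsvRow`, and the trigger list are assumptions.

```typescript
// Assumed policy: characters that spreadsheet apps may treat as a formula start.
const FORMULA_TRIGGERS = ["=", "+", "-", "@", "\t", "\r"];

// Hypothetical helper: neutralise formula triggers, then apply CSV quoting.
function escapeCsvCell(value: string): string {
  const startsWithTrigger = FORMULA_TRIGGERS.some((t) => value.startsWith(t));
  // A leading apostrophe makes the cell render as text. Trade-off: this also
  // prefixes legitimate values such as negative numbers.
  const guarded = startsWithTrigger ? `'${value}` : value;

  // Quote cells containing quotes, commas, or line breaks so an embedded
  // newline stays inside the cell instead of starting a new row.
  if (/[",\r\n]/.test(guarded)) {
    return `"${guarded.replace(/"/g, '""')}"`;
  }
  return guarded;
}

function toCsvRow(cells: string[]): string {
  return cells.map(escapeCsvCell).join(",");
}
```

The embedded-newline case matters because an unquoted line break would let the text after it begin a fresh row, where a leading `=` would no longer be caught by a start-of-cell check.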
New dependencies:
- Backend: `FsCheck` and `FsCheck.Xunit` (for property-based testing, extending existing pattern)
- Frontend: `fast-check` (dev dependency, for property-based testing)
Adversarial review findings fixed:
- HIGH: CSV injection via embedded newlines in export (#787), throughput double-counting in forecasting (#790), false-positive clarification heuristic (#791)
- MEDIUM: key-existence oracle + modulo bias in API key generation (#792), capture DTO round-trip test (#789), history window denominator (#790), CancellationToken forwarding (#787)
- Fixed test quality issues: misleading doc comments, weak assertions, non-thread-safe variables, redundant ARIA roles, missing screen reader announcements
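The modulo-bias finding in API key generation can be made concrete with a short sketch. The alphabet, key length, and function names below are hypothetical; only the `tdsk_` prefix and the use of SHA-256 hashing come from the test descriptions above, and the backend's real generator may differ.

```typescript
import { createHash, randomInt } from "node:crypto";

// Hypothetical 62-character alphabet. Because 256 % 62 !== 0, mapping a random
// byte with `byte % 62` would make some characters more likely than others.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

// Hypothetical generator: the key length of 32 is an illustrative choice.
function generateApiKey(length: number = 32): string {
  let body = "";
  for (let i = 0; i < length; i++) {
    // crypto.randomInt(max) draws uniformly from [0, max), avoiding modulo bias.
    body += ALPHABET.charAt(randomInt(ALPHABET.length));
  }
  return `tdsk_${body}`;
}

// Only the SHA-256 digest would be stored; the plaintext key is shown once.
function hashApiKey(key: string): string {
  return createHash("sha256").update(key, "utf8").digest("hex");
}
```

The key-existence oracle from the same finding is a separate concern (how lookups respond to unknown keys) and is not addressed by this sketch.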
Backend suite total after this wave: ~3,460+ passing. Frontend suite total: ~1,891 passing. Combined: ~5,370+.
This wave delivered the final 2 issues from the rigorous test expansion wave (#721):
- #705: Concurrency and race condition stress tests (13 tests)
- #717: Property-based and adversarial input tests (211 tests)
All 25 of 25 issues in the test expansion wave are now delivered. Total new tests from the wave: ~1,350+.