test(e2e): add Convert and Stream view end-to-end tests by staging-devin-ai-integration[bot] · Pull Request #55 · streamer45/streamkit

staging-devin-ai-integration · 2026-02-13T17:26:58Z

Summary

Adds Playwright e2e tests covering the Convert view (oneshot pipeline) and Stream view (dynamic pipeline + MoQ connection). Also adds the infrastructure plumbing needed to support them.

New test files:

e2e/tests/convert.spec.ts — 3 tests: API-level mixing pipeline call (using browser-side fetch with early abort to avoid streaming timeout), UI file-upload conversion, UI existing-asset conversion. Uses the Audio Mixing (Upload + Music Track) sample pipeline with sample.ogg.
e2e/tests/stream.spec.ts — 2 tests: session lifecycle (create → verify badge → destroy), MoQ connection (create session → handle auto-connect → verify "Relay: connected" + "Watch: live" → disconnect → destroy). The MoQ test gracefully skips if the gateway is not configured or if WebTransport cannot be established (e.g. self-signed cert issues).
e2e/tests/test-helpers.ts — Shared utilities: createConsoleErrorCollector() for reusable console error tracking with configurable benign pattern filtering, verifyAudioPlayback() to assert the <audio> element loaded media, and installAudioContextTracker() / verifyAudioContextActive() to verify Hang is receiving/decoding/playing audio via the Web Audio API.

Infrastructure changes:

ui/src/views/ConvertView.tsx, ui/src/views/StreamView.tsx — add data-testid attributes needed by ensureLoggedIn().
e2e/playwright.config.ts — configure Chromium with --use-fake-device-for-media-stream, --use-file-for-fake-audio-capture=speech_10m.wav, and microphone permissions.
e2e/src/harness/run.ts — pass SK_SERVER__MOQ_GATEWAY_URL to the test server so the MoQ gateway is available.
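
For illustration, the Chromium settings listed above could be expressed roughly as follows. This is only a sketch: `use.permissions` and `use.launchOptions.args` are standard Playwright options, but the constant name `chromiumMediaSettings` is hypothetical and the WAV path is copied as written in this description, so the real location may differ.

```typescript
// Sketch of the media-related settings described above (not the PR's actual file).
// In a real playwright.config.ts this object would be merged into the `use` section.
const chromiumMediaSettings = {
  // Grant microphone access so the page does not block on a permission prompt.
  permissions: ['microphone'],
  launchOptions: {
    args: [
      // Use Chromium's fake capture device instead of real hardware.
      '--use-fake-device-for-media-stream',
      // Feed the fake microphone from a file (path as given in the description).
      '--use-file-for-fake-audio-capture=speech_10m.wav',
    ],
  },
};
```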

Local test results: 9 passed, 3 skipped (MoQ connection skips because WebTransport with self-signed certs does not establish in headless Chromium). Consistent across multiple runs.

Updates since initial implementation

First revision:

API test refactored: Replaced Playwright apiContext.post() with browser-side fetch + AbortController. The mixing pipeline streams audio in real-time, causing the previous approach to time out (>90s even with generous timeouts). The new approach reads only the first response chunk and aborts, completing in ~2s.
Stream tests hardened: Added isBenignConsoleError() helper with a pattern list to filter expected MoQ/WebTransport/TLS errors that occur when the session auto-connect fires against a self-signed cert.
MoQ test handles auto-connect: Session creation triggers an automatic connection attempt. The test now waits for auto-connect to settle, retries with a manual "Connect & Stream" click if it failed, and only then skips if WebTransport still can't connect.
Template selectors fixed: Added { exact: true } to getByText() calls for template names to avoid strict-mode violations (the name also appeared in the YAML editor).
Removed unused request import from convert.spec.ts.
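
The early-abort approach from the API test refactor can be sketched as below. This is an illustration under assumptions, not the PR's code: the helper name `readFirstChunk` and the injectable `fetchImpl` parameter are hypothetical, and in the actual test the equivalent logic would run in the browser (for example inside `page.evaluate()`).

```typescript
// Hypothetical helper: resolve as soon as the first body chunk arrives, then abort.
// `fetchImpl` is injectable only so the sketch can be exercised without a server.
async function readFirstChunk(
  url: string,
  init: RequestInit,
  fetchImpl: typeof fetch = fetch,
): Promise<{ status: number; firstChunkBytes: number }> {
  const controller = new AbortController();
  const response = await fetchImpl(url, { ...init, signal: controller.signal });
  let firstChunkBytes = 0;
  try {
    const reader = response.body ? response.body.getReader() : null;
    if (reader) {
      // Wait for exactly one chunk instead of draining the real-time stream.
      const { value } = await reader.read();
      firstChunkBytes = value ? value.byteLength : 0;
    }
  } finally {
    // Cancel the rest of the response so the request does not run to completion.
    controller.abort();
  }
  return { status: response.status, firstChunkBytes };
}
```

A test built this way would assert on the status and on a non-empty first chunk, which shows that the pipeline started producing output without waiting for the stream to end.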

Second revision (PR review feedback):

Extracted shared console error helper: Created test-helpers.ts with createConsoleErrorCollector() to eliminate duplicated console listener setup across test files. Both convert and stream specs now import and use this shared helper.
Audio playback verification in convert tests: Added verifyAudioPlayback() helper that uses page.evaluate() to find the <audio> element, wait for metadata to load, attempt to play, and assert duration > 0. Both UI convert tests now verify the audio player actually loaded media after "Converted Audio" appears.
More robust MoQ error filtering: Replaced broad BENIGN_PATTERNS (which included generic ERR_QUIC_PROTOCOL_ERROR) with specific MOQ_BENIGN_PATTERNS that only match QUIC_TLS_CERTIFICATE_UNKNOWN, Timed out connecting to MoQ gateway, and Failed to construct 'WebTransport'. Genuine QUIC protocol errors unrelated to certificates will now correctly fail tests.
Hang audio verification in MoQ test: Added installAudioContextTracker() which monkey-patches AudioContext via page.addInitScript() to track all instances created by Hang. After "Watch: live" appears, the test waits 2s then calls verifyAudioContextActive() to assert at least one AudioContext is in 'running' state with currentTime > 0, confirming Hang is receiving, decoding, and playing audio.
Fixed sessionId dead code: The sessionId variable is now extracted from the page text (Session ID: ...) and assigned to the outer variable so the afterEach cleanup hook can actually delete sessions if a test fails mid-way.
Prettier formatting: Fixed formatting to match CI config (../ui/.prettierrc.json instead of local bunx prettier).
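
A minimal sketch of the narrower error filtering described in this revision. The three patterns come from the description above. Representing them as regular expressions, the helper name `filterUnexpectedErrors`, and the sample messages are assumptions made for illustration.

```typescript
// Patterns named in the PR description; storing them as RegExp is an assumption.
const MOQ_BENIGN_PATTERNS: readonly RegExp[] = [
  /QUIC_TLS_CERTIFICATE_UNKNOWN/,
  /Timed out connecting to MoQ gateway/,
  /Failed to construct 'WebTransport'/,
];

// Hypothetical helper: keep only messages that match none of the benign patterns.
function filterUnexpectedErrors(
  messages: readonly string[],
  benign: readonly RegExp[] = MOQ_BENIGN_PATTERNS,
): string[] {
  return messages.filter((msg) => !benign.some((pattern) => pattern.test(msg)));
}

// Illustrative messages: the certificate error is filtered out, while a generic
// QUIC protocol error is kept and would therefore fail the test.
const remaining = filterUnexpectedErrors([
  'net::ERR_QUIC_PROTOCOL_ERROR.QUIC_TLS_CERTIFICATE_UNKNOWN',
  'net::ERR_QUIC_PROTOCOL_ERROR (unrelated to certificates)',
]);
```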

Third revision (teardown error handling + documentation):

Fixed "WebTransportError: The session is closed" during teardown: Added a stop() method to the ConsoleErrorCollector interface to detach the console listener. Both stream tests now call collector.stop() before disconnect/destroy operations to avoid capturing expected teardown noise (e.g. WebTransport session closure errors). Console error assertions are now performed before teardown begins.
Added documentation comments: Added JSDoc comments to all helper functions in test-helpers.ts explaining their purpose and usage patterns. Added inline comments to tricky logic in both spec files:
- convert.spec.ts: Documented the browser-side fetch with AbortController pattern and why it's needed to avoid streaming timeout
- stream.spec.ts: Documented auto-connect handling logic, sessionId extraction for cleanup, AudioContext verification pattern, and the collector.stop() pattern for avoiding false positives from teardown errors
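
The collector-with-stop() pattern can be sketched as follows. This is not the PR's implementation: `PageLike` and `ConsoleMessageLike` are hypothetical structural stand-ins for the parts of Playwright's `Page` and `ConsoleMessage` that the sketch touches, and the function signature is assumed.

```typescript
// Stand-ins for the Playwright types, so the sketch runs without a browser.
interface ConsoleMessageLike {
  type(): string;
  text(): string;
}
type ConsoleListener = (msg: ConsoleMessageLike) => void;
interface PageLike {
  on(event: 'console', listener: ConsoleListener): unknown;
  off(event: 'console', listener: ConsoleListener): unknown;
}

interface ConsoleErrorCollector {
  errors: string[];
  stop(): void;
}

function createConsoleErrorCollector(
  page: PageLike,
  benignPatterns: readonly RegExp[] = [],
): ConsoleErrorCollector {
  const errors: string[] = [];
  const listener: ConsoleListener = (msg) => {
    if (msg.type() !== 'error') return; // ignore logs and warnings
    const text = msg.text();
    if (!benignPatterns.some((pattern) => pattern.test(text))) errors.push(text);
  };
  page.on('console', listener);
  return {
    errors,
    // Detach before teardown so expected shutdown noise is not recorded.
    stop: () => {
      page.off('console', listener);
    },
  };
}
```

A test following this pattern would assert that `errors` is empty, call `stop()`, and only then disconnect and destroy the session.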

Review & Testing Checklist for Human

MoQ connection test always skips in headless Chromium — WebTransport with self-signed certificates fails the TLS handshake even though fingerprints are served at /certificate.sha256. The test gracefully skips via test.skip(), but this means the full MoQ subscribe/broadcast path is never actually exercised in CI. Decide whether this is acceptable or if additional Chromium flags (e.g. --webtransport-developer-mode) or environment setup is needed.
speech_10m.wav does not exist on disk — only speech_10m.opus and speech_10m.wav.license are present in samples/audio/system/. The --use-file-for-fake-audio-capture flag points to a missing file, causing Chromium to fall back to its built-in sine-wave generator. Decide whether to generate/commit the WAV file, convert from opus, or adjust the path.
Console error collection stops before teardown — The collector.stop() pattern prevents capturing expected teardown errors (like WebTransportError: The session is closed), but this means any real errors during disconnect/destroy would also be silently ignored. Verify this tradeoff is acceptable.
Template name selectors use { exact: true } — the tests select templates by exact text match ('Audio Mixing (Upload + Music Track)', 'MoQ Peer Transcoder (Gateway)'). If these names change in the sample pipelines, the tests will break. Consider whether this is acceptable or if a more stable selector (e.g. by pipeline ID) should be used.
AudioContext monkey-patching is fragile — installAudioContextTracker() replaces window.AudioContext with a subclass to track instances. This works for standard new AudioContext() calls but could break if the app uses non-standard instantiation patterns or performs instanceof checks in unexpected ways. Verify this approach is robust enough for the codebase.
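
To make the fragility concern concrete, here is a sketch of the subclass-based tracking technique. All names (`WindowLike`, `trackAudioContexts`, `hasActiveAudioContext`, `__trackedAudioContexts`) are hypothetical and the PR's helpers may differ. In a real test the body would have to be inlined into the function passed to `page.addInitScript()`, because that function is serialized and cannot close over Node-side helpers.

```typescript
// Structural stand-ins: only the two AudioContext members the check reads.
interface AudioContextLike {
  state: string;
  currentTime: number;
}
type AudioContextCtor = new (...args: any[]) => AudioContextLike;
interface WindowLike {
  AudioContext?: AudioContextCtor;
  __trackedAudioContexts?: AudioContextLike[]; // hypothetical property name
}

// Replace AudioContext with a subclass that records every instance it creates.
function trackAudioContexts(win: WindowLike): void {
  const Original = win.AudioContext;
  if (!Original) return; // nothing to patch in this environment
  const tracked: AudioContextLike[] = [];
  win.__trackedAudioContexts = tracked;
  win.AudioContext = class extends Original {
    constructor(...args: any[]) {
      super(...args);
      tracked.push(this);
    }
  };
}

// True when at least one tracked context is running and its clock has advanced.
function hasActiveAudioContext(win: WindowLike): boolean {
  const tracked = win.__trackedAudioContexts || [];
  return tracked.some((ctx) => ctx.state === 'running' && ctx.currentTime > 0);
}
```

Because the replacement is a subclass, instances still pass `instanceof` checks against the original constructor. Code that captured a reference to `AudioContext` before the patch ran, or that uses a different constructor, would not be tracked.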

Test Plan

Run cd e2e && bun run test to verify all tests pass locally (9 passed, 3 skipped expected)
Check CI logs to confirm the same pass/skip pattern (note: E2E check may fail due to GitHub Actions infrastructure issues with Microsoft package repos, but all required checks should pass)
Manually test the Convert view with the Audio Mixing template to verify the UI flow matches the test expectations
Manually test the Stream view session lifecycle to verify the UI flow matches the test expectations
If possible, test MoQ connection in a non-headless browser or with additional Chromium flags to verify the full subscribe/broadcast path

Notes

Link to Devin run: https://staging.itsdev.in/sessions/8c345a597da34a5c9953d7c8fc2ecb11
Requested by: @streamer45

Add Playwright e2e tests covering:
- Convert view with Audio Mixing pipeline (API + UI tests)
- Stream view with MoQ session lifecycle and connection tests

Infrastructure changes:
- Add data-testid to ConvertView and StreamView for test selectors
- Configure fake audio device and microphone permissions in Playwright
- Pass MoQ gateway URL to e2e server harness

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <devin@streamkit.dev>

staging-devin-ai-integration · 2026-02-13T17:27:02Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

- API test: use browser-side fetch with early abort to avoid timeout from streaming response body (mixing pipeline streams in real-time)
- Session lifecycle test: filter QUIC/TLS/cert errors from auto-connect
- MoQ connection test: handle auto-connect behavior, gracefully skip when WebTransport connection cannot be established
- Remove unused 'request' import from convert.spec.ts

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <devin@streamkit.dev>

…lation

When MoQ is connected, two Disconnect buttons are rendered (one in SessionPanel, one in Connection & Controls). Use .first() to resolve the strict mode ambiguity.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <devin@streamkit.dev>

staging-devin-ai-integration

Devin Review found 1 potential issue.

View 3 additional findings in Devin Review.

staging-devin-ai-integration · 2026-02-13T18:48:00Z

e2e/tests/stream.spec.ts

+
+test.describe('Stream View - Dynamic Pipeline', () => {
+  const consoleErrors: string[] = [];
+  let sessionId: string | null = null;


🟡 sessionId is never assigned, so the afterEach cleanup hook is dead code

The sessionId variable is declared at line 26 and initialized to null, but it is never assigned a value anywhere in either test body. Both tests create sessions through the UI (clicking "Create Session"), but neither extracts and stores the resulting session ID into sessionId.

Impact: orphaned sessions on test failure

The afterEach hook at e2e/tests/stream.spec.ts:149-162 is designed as a safety net to clean up sessions via the API if a test fails mid-way (after creating a session but before the in-test destroy logic completes):

test.afterEach(async ({ baseURL }) => {
  if (sessionId) { // always null → cleanup never runs
    // ...
    await apiContext.delete(`/api/v1/sessions/${sessionId}`);
    // ...
  }
});

Since sessionId is always null, the if (sessionId) check is always false, making the entire cleanup block dead code. If either test fails after creating a session but before destroying it (e.g., a timeout on the "Session Active" badge), the server-side session will leak and persist across subsequent tests, potentially causing cascading test failures due to stale state.

Prompt for agents

In e2e/tests/stream.spec.ts, the `sessionId` variable declared on line 26 is never assigned a value, making the afterEach cleanup hook (lines 149-163) ineffective. Both tests (starting at lines 43 and 76) create sessions via UI interaction but never capture the session ID. To fix this, after session creation succeeds in each test (e.g., after the 'Session Active' badge becomes visible around lines 58 and 99), extract the session ID from the page (for example, by reading the text content of the element matching `/Session ID:/`) and assign it to the outer `sessionId` variable. This ensures that if a test fails after session creation but before in-test cleanup, the afterEach hook will properly destroy the orphaned session via the API.
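
The extraction step suggested above might look like the following sketch. The helper name `extractSessionId` is hypothetical, and because the ID format is not shown in this thread, the pattern simply accepts any run of non-whitespace characters after the label.

```typescript
// Hypothetical helper: pull the ID out of text such as "Session ID: abc-123".
// Returns null when the label is absent so the afterEach guard stays falsy.
function extractSessionId(pageText: string): string | null {
  const match = /Session ID:\s*(\S+)/.exec(pageText);
  return match?.[1] ?? null;
}
```

In a test, the result would be assigned to the outer `sessionId` variable right after the "Session Active" badge appears, and reset to `null` once the in-test destroy succeeds.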

streamer45

Not bad! Left some comments for some potential improvements.

e2e/tests/convert.spec.ts

e2e/tests/stream.spec.ts

…on, robust MoQ error filtering

- Extract console error collection into shared test-helpers.ts with createConsoleErrorCollector() and configurable benign pattern filtering
- Replace broad BENIGN_PATTERNS with specific MOQ_BENIGN_PATTERNS (QUIC_TLS_CERTIFICATE_UNKNOWN instead of ERR_QUIC_PROTOCOL_ERROR)
- Add verifyAudioPlayback() helper to check audio element has loaded and started playback in convert tests
- Add installAudioContextTracker() / verifyAudioContextActive() to verify Hang is receiving/decoding/playing audio via Web Audio API
- Fix sessionId extraction from UI so afterEach cleanup actually works
- Clear sessionId after successful session destruction

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <devin@streamkit.dev>

Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <devin@streamkit.dev>

WebTransport teardown emits benign 'The session is closed' errors during disconnect. Assert and stop the collector before the disconnect/destroy phase so shutdown noise does not cause false failures.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <devin@streamkit.dev>

Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <devin@streamkit.dev>

…nitor view re-renders (#143)

* perf: batch WebSocket state updates and session prefetch to reduce monitor view re-renders

  - Buffer nodestatechanged/nodestatsupdated events in Maps, flush via queueMicrotask for a single Zustand set() per session per microtask
  - Add batchUpdateNodeStates, batchUpdateNodeStats, batchSetPipelines methods to sessionStore for atomic bulk mutations
  - Switch useSessionsPrefetch from N individual setPipeline() calls to one batchSetPipelines() call
  - Wrap MonitorView with React.Profiler (dev-only) for perf measurement
  - Add Layer 2 e2e perf test for monitor session load render budget

  Signed-off-by: StreamKit Devin <devin@streamkit.dev>
  Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf: switch batch flush from queueMicrotask to requestAnimationFrame

  queueMicrotask drains after each macrotask, so it cannot coalesce separate WebSocket onmessage callbacks. requestAnimationFrame defers the flush until the next paint, batching all WS events that arrive within a single animation frame (~16ms at 60fps) into ONE Zustand set() call via the new batchUpdateSessionData() method.

  Also:
  - Add batchUpdateSessionData to sessionStore for combined state+stats flush in a single set() call
  - Clear pending batch buffers on WebSocket close() to prevent stale RAF callbacks after teardown
  - Assert MonitorView profiler data exists in e2e perf test

  Signed-off-by: StreamKit Devin <devin@streamkit.dev>
  Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix: address review feedback for RAF batching

  - Fix stale comment: 'microtask-level' → 'frame-level' to match RAF impl
  - Clear pendingNodeStates/pendingNodeStats on session destroy
  - Update websocket tests to manually flush RAF batch before assertions

  Signed-off-by: StreamKit Devin <devin@streamkit.dev>
  Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf: decouple nodeStates from MonitorViewContent render cycle

  Replace reactive useSession(nodeStates) subscription with a direct Zustand store subscription that patches ReactFlow nodes and edges from the callback. This completely bypasses React's render cycle for the ~3600-line MonitorViewContent component during high-frequency node-state transitions.

  Before: each node state change (e.g. Initializing → Running) caused a full MonitorViewContent re-render (~25-40ms), multiplied by ~10 nodes during session load.

  After: the store subscription fires an O(1) reference check per store change and only patches the affected ReactFlow nodes/edges via startTransition, with zero MonitorViewContent re-renders.

  Signed-off-by: StreamKit Devin <devin@streamkit.dev>
  Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf: throttle nodeStates patches to coalesce burst transitions

  During session load, ~8 nodes transition state on separate animation frames, each triggering a ~20ms MonitorViewContent re-render via setNodes/setEdges. Add a leading-edge + trailing-edge throttle (100ms window) so that the first change applies immediately and subsequent rapid changes are coalesced into 2-3 patches instead of 8.

  Expected reduction: ~160ms of MonitorViewContent re-renders collapsed to ~40-60ms during session load, while steady-state changes still apply within one throttle window.

  Signed-off-by: StreamKit Devin <devin@streamkit.dev>
  Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(ui): batch ReactFlow dimension changes and merge transitions

  ReactFlow fires individual onNodesChange callbacks with type='dimensions' for each newly-mounted node as it measures them. Each dimension change triggers a setNodes update, causing a full MonitorViewContent re-render (~20ms each). During session load with ~8 nodes, this creates ~8 consecutive re-renders totaling ~160ms of wasted render time.

  Two optimizations:
  1. Intercept onNodesChange and collect all dimension-type changes, then flush them in a single RAF callback wrapped in startTransition. This collapses ~8 separate renders into 1.
  2. Merge the two separate startTransition blocks in applyPatch (one for setNodes, one for setEdges) into a single startTransition so React batches both state updates into one render pass instead of two.

  Signed-off-by: StreamKit Devin <devin@streamkit.dev>
  Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(ui): use startTransition for dimension changes instead of RAF batch

  The previous RAF-based dimension batching collected all dimension changes and flushed them in a single commit. Profiling showed this created a 190ms jank (commit #55), worse than the original 8 × ~20ms spread across multiple frames.

  Replace with a simpler approach: wrap dimension changes in React.startTransition so React schedules them at lower priority. This avoids concentrating all dimension work into one frame while still keeping interactive changes (select, drag, remove) immediate.

  Also keeps the merged startTransition for setNodes+setEdges in applyPatch (from the previous commit) which avoids double renders from the subscription path.

  Signed-off-by: StreamKit Devin <devin@streamkit.dev>
  Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(ui): store RAF ID and cancel explicitly on close()

  Address review feedback: store the requestAnimationFrame ID so it can be cancelled via cancelAnimationFrame in close(), rather than relying on cleared maps to make the stale callback a no-op.

  Signed-off-by: StreamKit Devin <devin@streamkit.dev>
  Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(ui): remove MonitorView Profiler to fix compositor-perf cascade

  The permanent <React.Profiler id='MonitorView'> wrapper caused a cascade regression in compositor-perf.spec.ts. Since MonitorView wraps the entire ReactFlow tree (including CompositorNode), every slider-drag commit fired the MonitorView onRender callback, making it appear as a cascade when it was just the outer profiler boundary counting all child commits.

  Remove the permanent Profiler from MonitorView. Update the monitor-session-load-perf test to assert on CompositorNode profiler data (which has its own Profiler) instead of MonitorView.

  Signed-off-by: StreamKit Devin <devin@streamkit.dev>
  Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(ui): reset topoEffectRanRef on session switch

  After switching sessions, topoEffectRanRef could still be true from the previous session, allowing the store subscription to apply patches using the old pipeline data before the topology effect runs for the new session. Reset it to false when the subscription re-creates.

  Signed-off-by: StreamKit Devin <devin@streamkit.dev>
  Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(ui): revert topoEffectRanRef reset that blocked nodeState patches

  The previous commit reset topoEffectRanRef.current = false in the subscription effect setup. Since React runs effects in declaration order, the topology effect sets it to true first, then the subscription effect immediately resets it to false, permanently blocking all subsequent nodeState patches (the subscription callback checks topoEffectRanRef.current before applying patches).

  The stale-data case is already handled by the isInitialMountRef guard and the pipelineRef.current null check inside applyPatch, so the reset is unnecessary.

  Signed-off-by: StreamKit Devin <devin@streamkit.dev>
  Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

---------

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>

staging-devin-ai-integration bot assigned streamer45 Feb 13, 2026

staging-devin-ai-integration bot requested a review from streamer45 February 13, 2026 17:27

streamkit-devin added 2 commits February 13, 2026 18:21

staging-devin-ai-integration bot commented Feb 13, 2026

streamer45 reviewed Feb 14, 2026

e2e/tests/convert.spec.ts (resolved)

e2e/tests/convert.spec.ts (outdated, resolved)

e2e/tests/stream.spec.ts (outdated, resolved)

e2e/tests/stream.spec.ts (resolved)

streamkit-devin added 4 commits February 14, 2026 08:21

style(e2e): fix prettier formatting to match CI config

ae6a061

Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <devin@streamkit.dev>

docs(e2e): add documentation comments to helpers and tricky logic

0c71497

Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <devin@streamkit.dev>

streamer45 merged commit 4bd0d2f into main Feb 14, 2026
14 checks passed

streamer45 deleted the devin/1771003395-e2e-convert-stream-tests branch February 14, 2026 10:03

staging-devin-ai-integration bot mentioned this pull request Mar 14, 2026

perf: batch WebSocket state updates and session prefetch to reduce monitor view re-renders #143

Merged

7 tasks

streamer45 merged 7 commits into main from devin/1771003395-e2e-convert-stream-tests

2 participants