Skip to content

test(e2e): add Convert and Stream view end-to-end tests#55

Merged
streamer45 merged 7 commits intomainfrom
devin/1771003395-e2e-convert-stream-tests
Feb 14, 2026
Merged

test(e2e): add Convert and Stream view end-to-end tests#55
streamer45 merged 7 commits intomainfrom
devin/1771003395-e2e-convert-stream-tests

Conversation

@staging-devin-ai-integration
Copy link
Copy Markdown
Contributor

@staging-devin-ai-integration staging-devin-ai-integration bot commented Feb 13, 2026

Summary

Adds Playwright e2e tests covering the Convert view (oneshot pipeline) and Stream view (dynamic pipeline + MoQ connection). Also adds the infrastructure plumbing needed to support them.

New test files:

  • e2e/tests/convert.spec.ts — 3 tests: API-level mixing pipeline call (using browser-side fetch with early abort to avoid streaming timeout), UI file-upload conversion, UI existing-asset conversion. Uses the Audio Mixing (Upload + Music Track) sample pipeline with sample.ogg.
  • e2e/tests/stream.spec.ts — 2 tests: session lifecycle (create → verify badge → destroy), MoQ connection (create session → handle auto-connect → verify "Relay: connected" + "Watch: live" → disconnect → destroy). The MoQ test gracefully skips if the gateway is not configured or if WebTransport cannot be established (e.g. self-signed cert issues).
  • e2e/tests/test-helpers.ts — Shared utilities: createConsoleErrorCollector() for reusable console error tracking with configurable benign pattern filtering, verifyAudioPlayback() to assert the <audio> element loaded media, and installAudioContextTracker() / verifyAudioContextActive() to verify Hang is receiving/decoding/playing audio via the Web Audio API.

Infrastructure changes:

  • ui/src/views/ConvertView.tsx, ui/src/views/StreamView.tsx — add data-testid attributes needed by ensureLoggedIn().
  • e2e/playwright.config.ts — configure Chromium with --use-fake-device-for-media-stream, --use-file-for-fake-audio-capture=speech_10m.wav, and microphone permissions.
  • e2e/src/harness/run.ts — pass SK_SERVER__MOQ_GATEWAY_URL to the test server so the MoQ gateway is available.

Local test results: 9 passed, 3 skipped (MoQ connection skips because WebTransport with self-signed certs does not establish in headless Chromium). Consistent across multiple runs.

Updates since initial implementation

First revision:

  • API test refactored: Replaced Playwright apiContext.post() with browser-side fetch + AbortController. The mixing pipeline streams audio in real-time, causing the previous approach to time out (>90s even with generous timeouts). The new approach reads only the first response chunk and aborts, completing in ~2s.
  • Stream tests hardened: Added isBenignConsoleError() helper with a pattern list to filter expected MoQ/WebTransport/TLS errors that occur when the session auto-connect fires against a self-signed cert.
  • MoQ test handles auto-connect: Session creation triggers an automatic connection attempt. The test now waits for auto-connect to settle, retries with a manual "Connect & Stream" click if it failed, and only then skips if WebTransport still can't connect.
  • Template selectors fixed: Added { exact: true } to getByText() calls for template names to avoid strict-mode violations (the name also appeared in the YAML editor).
  • Removed unused request import from convert.spec.ts.

Second revision (PR review feedback):

  • Extracted shared console error helper: Created test-helpers.ts with createConsoleErrorCollector() to eliminate duplicated console listener setup across test files. Both convert and stream specs now import and use this shared helper.
  • Audio playback verification in convert tests: Added verifyAudioPlayback() helper that uses page.evaluate() to find the <audio> element, wait for metadata to load, attempt to play, and assert duration > 0. Both UI convert tests now verify the audio player actually loaded media after "Converted Audio" appears.
  • More robust MoQ error filtering: Replaced broad BENIGN_PATTERNS (which included generic ERR_QUIC_PROTOCOL_ERROR) with specific MOQ_BENIGN_PATTERNS that only match QUIC_TLS_CERTIFICATE_UNKNOWN, Timed out connecting to MoQ gateway, and Failed to construct 'WebTransport'. Genuine QUIC protocol errors unrelated to certificates will now correctly fail tests.
  • Hang audio verification in MoQ test: Added installAudioContextTracker() which monkey-patches AudioContext via page.addInitScript() to track all instances created by Hang. After "Watch: live" appears, the test waits 2s then calls verifyAudioContextActive() to assert at least one AudioContext is in 'running' state with currentTime > 0, confirming Hang is receiving, decoding, and playing audio.
  • Fixed sessionId dead code: The sessionId variable is now extracted from the page text (Session ID: ...) and assigned to the outer variable so the afterEach cleanup hook can actually delete sessions if a test fails mid-way.
  • Prettier formatting: Fixed formatting to match CI config (../ui/.prettierrc.json instead of local bunx prettier).

Third revision (teardown error handling + documentation):

  • Fixed WebTransportError: The session is closed during teardown: Added stop() method to ConsoleErrorCollector interface to detach the console listener. Both stream tests now call collector.stop() before disconnect/destroy operations to avoid capturing expected teardown noise (e.g. WebTransport session closure errors). Console error assertions are now performed before teardown begins.
  • Added documentation comments: Added JSDoc comments to all helper functions in test-helpers.ts explaining their purpose and usage patterns. Added inline comments to tricky logic in both spec files:
    • convert.spec.ts: Documented the browser-side fetch with AbortController pattern and why it's needed to avoid streaming timeout
    • stream.spec.ts: Documented auto-connect handling logic, sessionId extraction for cleanup, AudioContext verification pattern, and the collector.stop() pattern for avoiding false positives from teardown errors

Review & Testing Checklist for Human

  • MoQ connection test always skips in headless Chromium — WebTransport with self-signed certificates fails the TLS handshake even though fingerprints are served at /certificate.sha256. The test gracefully skips via test.skip(), but this means the full MoQ subscribe/broadcast path is never actually exercised in CI. Decide whether this is acceptable or if additional Chromium flags (e.g. --webtransport-developer-mode) or environment setup is needed.
  • speech_10m.wav does not exist on disk — only speech_10m.opus and speech_10m.wav.license are present in samples/audio/system/. The --use-file-for-fake-audio-capture flag points to a missing file, causing Chromium to fall back to its built-in sine-wave generator. Decide whether to generate/commit the WAV file, convert from opus, or adjust the path.
  • Console error collection stops before teardown — The collector.stop() pattern prevents capturing expected teardown errors (like WebTransportError: The session is closed), but this means any real errors during disconnect/destroy would also be silently ignored. Verify this tradeoff is acceptable.
  • Template name selectors use { exact: true } — the tests select templates by exact text match ('Audio Mixing (Upload + Music Track)', 'MoQ Peer Transcoder (Gateway)'). If these names change in the sample pipelines, the tests will break. Consider whether this is acceptable or if a more stable selector (e.g. by pipeline ID) should be used.
  • AudioContext monkey-patching is fragileinstallAudioContextTracker() replaces window.AudioContext with a subclass to track instances. This works for standard new AudioContext() calls but could break if the app uses non-standard instantiation patterns or performs instanceof checks in unexpected ways. Verify this approach is robust enough for the codebase.

Test Plan

  1. Run cd e2e && bun run test to verify all tests pass locally (9 passed, 3 skipped expected)
  2. Check CI logs to confirm the same pass/skip pattern (note: E2E check may fail due to GitHub Actions infrastructure issues with Microsoft package repos, but all required checks should pass)
  3. Manually test the Convert view with the Audio Mixing template to verify the UI flow matches the test expectations
  4. Manually test the Stream view session lifecycle to verify the UI flow matches the test expectations
  5. If possible, test MoQ connection in a non-headless browser or with additional Chromium flags to verify the full subscribe/broadcast path

Notes


Staging: Open in Devin

Add Playwright e2e tests covering:
- Convert view with Audio Mixing pipeline (API + UI tests)
- Stream view with MoQ session lifecycle and connection tests

Infrastructure changes:
- Add data-testid to ConvertView and StreamView for test selectors
- Configure fake audio device and microphone permissions in Playwright
- Pass MoQ gateway URL to e2e server harness

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <devin@streamkit.dev>
@staging-devin-ai-integration
Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

- API test: use browser-side fetch with early abort to avoid timeout
  from streaming response body (mixing pipeline streams in real-time)
- Session lifecycle test: filter QUIC/TLS/cert errors from auto-connect
- MoQ connection test: handle auto-connect behavior, gracefully skip
  when WebTransport connection cannot be established
- Remove unused 'request' import from convert.spec.ts

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <devin@streamkit.dev>
…lation

When MoQ is connected, two Disconnect buttons are rendered (one in
SessionPanel, one in Connection & Controls). Use .first() to resolve
the strict mode ambiguity.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <devin@streamkit.dev>
Copy link
Copy Markdown
Contributor Author

@staging-devin-ai-integration staging-devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 3 additional findings in Devin Review.

Staging: Open in Devin
Debug

Playground


test.describe('Stream View - Dynamic Pipeline', () => {
const consoleErrors: string[] = [];
let sessionId: string | null = null;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 sessionId is never assigned, so the afterEach cleanup hook is dead code

The sessionId variable is declared at line 26 and initialized to null, but it is never assigned a value anywhere in either test body. Both tests create sessions through the UI (clicking "Create Session"), but neither extracts and stores the resulting session ID into sessionId.

Impact: orphaned sessions on test failure

The afterEach hook at e2e/tests/stream.spec.ts:149-162 is designed as a safety net to clean up sessions via the API if a test fails mid-way (after creating a session but before the in-test destroy logic completes):

test.afterEach(async ({ baseURL }) => {
    if (sessionId) {  // always null → cleanup never runs
      // ...
      await apiContext.delete(`/api/v1/sessions/${sessionId}`);
      // ...
    }
});

Since sessionId is always null, the if (sessionId) check is always false, making the entire cleanup block dead code. If either test fails after creating a session but before destroying it (e.g., a timeout on the "Session Active" badge), the server-side session will leak and persist across subsequent tests, potentially causing cascading test failures due to stale state.

Prompt for agents
In e2e/tests/stream.spec.ts, the `sessionId` variable declared on line 26 is never assigned a value, making the afterEach cleanup hook (lines 149-163) ineffective. Both tests (starting at lines 43 and 76) create sessions via UI interaction but never capture the session ID. To fix this, after session creation succeeds in each test (e.g., after the 'Session Active' badge becomes visible around lines 58 and 99), extract the session ID from the page (for example, by reading the text content of the element matching `/Session ID:/`) and assign it to the outer `sessionId` variable. This ensures that if a test fails after session creation but before in-test cleanup, the afterEach hook will properly destroy the orphaned session via the API.
Staging: Open in Devin

Was this helpful? React with 👍 or 👎 to provide feedback.

Debug

Playground

Copy link
Copy Markdown
Owner

@streamer45 streamer45 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not bad! Left some comments for some potential improvements.

…on, robust MoQ error filtering

- Extract console error collection into shared test-helpers.ts with
  createConsoleErrorCollector() and configurable benign pattern filtering
- Replace broad BENIGN_PATTERNS with specific MOQ_BENIGN_PATTERNS
  (QUIC_TLS_CERTIFICATE_UNKNOWN instead of ERR_QUIC_PROTOCOL_ERROR)
- Add verifyAudioPlayback() helper to check audio element has loaded
  and started playback in convert tests
- Add installAudioContextTracker() / verifyAudioContextActive() to
  verify Hang is receiving/decoding/playing audio via Web Audio API
- Fix sessionId extraction from UI so afterEach cleanup actually works
- Clear sessionId after successful session destruction

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <devin@streamkit.dev>
WebTransport teardown emits benign 'The session is closed' errors
during disconnect. Assert and stop the collector before the
disconnect/destroy phase so shutdown noise does not cause false
failures.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <devin@streamkit.dev>
@streamer45 streamer45 merged commit 4bd0d2f into main Feb 14, 2026
14 checks passed
@streamer45 streamer45 deleted the devin/1771003395-e2e-convert-stream-tests branch February 14, 2026 10:03
staging-devin-ai-integration bot pushed a commit that referenced this pull request Mar 14, 2026
The previous RAF-based dimension batching collected all dimension changes
and flushed them in a single commit.  Profiling showed this created a
190ms jank (commit #55) — worse than the original 8 × ~20ms spread across
multiple frames.

Replace with a simpler approach: wrap dimension changes in
React.startTransition so React schedules them at lower priority.  This
avoids concentrating all dimension work into one frame while still keeping
interactive changes (select, drag, remove) immediate.

Also keeps the merged startTransition for setNodes+setEdges in
applyPatch (from the previous commit) which avoids double renders
from the subscription path.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
streamer45 added a commit that referenced this pull request Mar 14, 2026
…nitor view re-renders (#143)

* perf: batch WebSocket state updates and session prefetch to reduce monitor view re-renders

- Buffer nodestatechanged/nodestatsupdated events in Maps, flush via
  queueMicrotask for a single Zustand set() per session per microtask
- Add batchUpdateNodeStates, batchUpdateNodeStats, batchSetPipelines
  methods to sessionStore for atomic bulk mutations
- Switch useSessionsPrefetch from N individual setPipeline() calls to
  one batchSetPipelines() call
- Wrap MonitorView with React.Profiler (dev-only) for perf measurement
- Add Layer 2 e2e perf test for monitor session load render budget

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf: switch batch flush from queueMicrotask to requestAnimationFrame

queueMicrotask drains after each macrotask, so it cannot coalesce
separate WebSocket onmessage callbacks.  requestAnimationFrame defers
the flush until the next paint, batching all WS events that arrive
within a single animation frame (~16ms at 60fps) into ONE Zustand
set() call via the new batchUpdateSessionData() method.

Also:
- Add batchUpdateSessionData to sessionStore for combined state+stats
  flush in a single set() call
- Clear pending batch buffers on WebSocket close() to prevent stale
  RAF callbacks after teardown
- Assert MonitorView profiler data exists in e2e perf test

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix: address review feedback for RAF batching

- Fix stale comment: 'microtask-level' → 'frame-level' to match RAF impl
- Clear pendingNodeStates/pendingNodeStats on session destroy
- Update websocket tests to manually flush RAF batch before assertions

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf: decouple nodeStates from MonitorViewContent render cycle

Replace reactive useSession(nodeStates) subscription with a direct
Zustand store subscription that patches ReactFlow nodes and edges
from the callback.  This completely bypasses React's render cycle
for the ~3600-line MonitorViewContent component during high-frequency
node-state transitions.

Before: each node state change (e.g. Initializing → Running) caused
a full MonitorViewContent re-render (~25-40ms), multiplied by ~10
nodes during session load.

After: the store subscription fires an O(1) reference check per store
change and only patches the affected ReactFlow nodes/edges via
startTransition, with zero MonitorViewContent re-renders.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf: throttle nodeStates patches to coalesce burst transitions

During session load, ~8 nodes transition state on separate animation
frames, each triggering a ~20ms MonitorViewContent re-render via
setNodes/setEdges.  Add a leading-edge + trailing-edge throttle
(100ms window) so that the first change applies immediately and
subsequent rapid changes are coalesced into 2-3 patches instead
of 8.

Expected reduction: ~160ms of MonitorViewContent re-renders collapsed
to ~40-60ms during session load, while steady-state changes still
apply within one throttle window.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(ui): batch ReactFlow dimension changes and merge transitions

ReactFlow fires individual onNodesChange callbacks with type='dimensions'
for each newly-mounted node as it measures them. Each dimension change
triggers a setNodes update, causing a full MonitorViewContent re-render
(~20ms each). During session load with ~8 nodes, this creates ~8
consecutive re-renders totaling ~160ms of wasted render time.

Two optimizations:
1. Intercept onNodesChange and collect all dimension-type changes, then
   flush them in a single RAF callback wrapped in startTransition. This
   collapses ~8 separate renders into 1.
2. Merge the two separate startTransition blocks in applyPatch (one for
   setNodes, one for setEdges) into a single startTransition so React
   batches both state updates into one render pass instead of two.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(ui): use startTransition for dimension changes instead of RAF batch

The previous RAF-based dimension batching collected all dimension changes
and flushed them in a single commit.  Profiling showed this created a
190ms jank (commit #55) — worse than the original 8 × ~20ms spread across
multiple frames.

Replace with a simpler approach: wrap dimension changes in
React.startTransition so React schedules them at lower priority.  This
avoids concentrating all dimension work into one frame while still keeping
interactive changes (select, drag, remove) immediate.

Also keeps the merged startTransition for setNodes+setEdges in
applyPatch (from the previous commit) which avoids double renders
from the subscription path.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(ui): store RAF ID and cancel explicitly on close()

Address review feedback: store the requestAnimationFrame ID so it can
be cancelled via cancelAnimationFrame in close(), rather than relying
on cleared maps to make the stale callback a no-op.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(ui): remove MonitorView Profiler to fix compositor-perf cascade

The permanent <React.Profiler id='MonitorView'> wrapper caused a
cascade regression in compositor-perf.spec.ts.  Since MonitorView wraps
the entire ReactFlow tree (including CompositorNode), every slider-drag
commit fired the MonitorView onRender callback, making it appear as a
cascade when it was just the outer profiler boundary counting all
child commits.

Remove the permanent Profiler from MonitorView.  Update the
monitor-session-load-perf test to assert on CompositorNode profiler
data (which has its own Profiler) instead of MonitorView.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(ui): reset topoEffectRanRef on session switch

After switching sessions, topoEffectRanRef could still be true from the
previous session, allowing the store subscription to apply patches using
the old pipeline data before the topology effect runs for the new
session.  Reset it to false when the subscription re-creates.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(ui): revert topoEffectRanRef reset that blocked nodeState patches

The previous commit reset topoEffectRanRef.current = false in the
subscription effect setup.  Since React runs effects in declaration
order, the topology effect sets it to true first, then the subscription
effect immediately resets it to false — permanently blocking all
subsequent nodeState patches (the subscription callback checks
topoEffectRanRef.current before applying patches).

The stale-data case is already handled by the isInitialMountRef guard
and the pipelineRef.current null check inside applyPatch, so the
reset is unnecessary.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

---------

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants