feat(compositor): factor output_format into GPU heuristic by staging-devin-ai-integration[bot] · Pull Request #218 · streamer45/streamkit

staging-devin-ai-integration · 2026-03-29T09:47:24Z

Summary

Commit 1 — feat(compositor): factor output_format into GPU heuristic

When the compositor's output_format is NV12 or I420, the GPU path eliminates the expensive CPU RGBA→YUV conversion entirely via the rgba_to_yuv.wgsl compute shader. Previously, should_use_gpu() didn't consider this — a single 720p layer with NV12 output would stay on CPU (1 item, < 1080p pixels, no effects), paying the full ~14.4% CPU cost that the GPU path handles for free.

This adds output_format: Option<PixelFormat> to should_use_gpu() and should_use_gpu_with_state(). When the output needs YUV conversion, the heuristic now prefers GPU compositing even for simple scenes.

Profile context (30s CPU profile, CPU path, 1080p VP9 encoding pipeline):

rgba8_to_nv12_buf = 9.12% flat — fix(tools): install script #1 CPU consumer
parallel_rows (serving NV12 path) = 5.28% flat
Combined = 14.4% of CPU time eliminated when GPU path is active

Commit 2 — ci: cancel superseded workflow runs on same PR

Adds a concurrency group to ci.yml keyed on PR number / branch ref with cancel-in-progress: true. This prevents the single self-hosted GPU runner from being blocked by stale jobs when new commits are pushed.

Review & Testing Checklist for Human

Verify the heuristic change makes sense for your deployment: with this change, any pipeline with NV12/I420 output will always prefer GPU when available (even single-layer, small canvas). Confirm this is desired behavior for your typical pipeline configurations.
Run a profiled pipeline (e.g. just skit-profiling serve with a VP9 encoding session) with --features gpu on a GPU-equipped machine and compare CPU profile before/after — the rgba8_to_nv12_buf and parallel_rows entries should disappear from the hot path.
Verify the concurrency setting correctly cancels stale CI runs — push two quick commits to a PR and confirm the first run is cancelled.

Notes

The existing should_use_gpu tests are updated to pass None (preserving prior behavior for scenes without explicit output format).
Three new test cases cover: NV12 output → GPU, I420 output → GPU, RGBA8 output → CPU (for single small layer baseline).
No changes to the GPU compute shader or readback pipeline — this only affects when the GPU path is chosen, not how it operates.
Two pre-existing flaky tests on main (test_oneshot_processes_faster_than_realtime and test_compositor_output_format_runtime_change) fail on the GPU runner due to timing sensitivity under load — not caused by this PR (both use 4×4 canvases with output_format: None, so the new heuristic path is never activated).

Link to Devin session: https://staging.itsdev.in/sessions/4eafeb9c24d342bba7d1b41238fcb3e4
Requested by: @streamer45

When the compositor's output_format is NV12 or I420, the GPU path eliminates the expensive CPU RGBA→YUV conversion entirely (~14% of CPU time in profiled pipelines). The should_use_gpu() heuristic now considers this, preferring GPU compositing whenever YUV output is requested — even for simple scenes that would otherwise stay on CPU. This addresses the #1 CPU hotspot identified in production profiling: rgba8_to_nv12_buf at 9.12% + parallel_rows at 5.28% = 14.4% combined. Signed-off-by: Devin AI <devin@streamkit.dev> Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

staging-devin-ai-integration · 2026-03-29T09:47:27Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

Disable automatic comment and CI monitoring

staging-devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 1 additional finding.

Debug

Playground

Adds a concurrency group keyed on PR number / branch ref with cancel-in-progress: true. This prevents the single self-hosted GPU runner from being blocked by stale jobs when new commits are pushed. Signed-off-by: Devin AI <devin@streamkit.dev> Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

Two tests flaked on the self-hosted GPU runner where many tests run concurrently and compete for CPU: 1. test_oneshot_processes_faster_than_realtime: reduced from 30@30fps (budget 500ms vs 1000ms real-time = 10% margin) to 10@5fps (budget 1500ms vs 2000ms real-time = 25% margin). The previous budget was nearly indistinguishable from per-frame scheduling overhead (~30ms) under CI load. 2. test_compositor_output_format_runtime_change: increased inter-step sleeps from 100/50/100ms to 300/200/300ms. The compositor thread can be starved for CPU when GPU tests run in parallel, so the original windows were not enough for even one tick to fire. Signed-off-by: Devin AI <devin@streamkit.dev> Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

The test consistently got only 1 output frame instead of ≥ 2 on the self-hosted GPU runner. Root cause: when compiled with --features gpu and gpu_mode Auto (the default), the compositor OS thread blocks on GpuContext::try_init() before processing any compositing work. On the GPU runner with many tests competing for the device, init can exceed the total sleep budget (800ms). By the time it finishes, both input frames have been drained to just the latest (Dynamic mode behaviour), producing a single output. Fix: set gpu_mode: "cpu" explicitly. This test validates runtime output_format switching via UpdateParams, not GPU compositing — GPU init is unnecessary overhead that creates the race. Also reduces sleep durations to 200/100/200ms (from 300/200/300ms) since without GPU init the compositor thread starts processing immediately. Signed-off-by: Devin AI <devin@streamkit.dev> Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

staging-devin-ai-integration bot assigned streamer45 Mar 29, 2026

staging-devin-ai-integration bot requested a review from streamer45 March 29, 2026 09:47

staging-devin-ai-integration bot commented Mar 29, 2026

View reviewed changes

streamkit-devin and others added 2 commits March 29, 2026 10:22

streamer45 enabled auto-merge (squash) March 29, 2026 10:32

streamer45 merged commit 42dc9fc into main Mar 29, 2026
16 checks passed

streamer45 deleted the devin/1774777096-gpu-heuristic-output-format branch March 29, 2026 10:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(compositor): factor output_format into GPU heuristic#218

feat(compositor): factor output_format into GPU heuristic#218
streamer45 merged 4 commits intomainfrom
devin/1774777096-gpu-heuristic-output-format

staging-devin-ai-integration bot commented Mar 29, 2026 •

edited

Loading

Uh oh!

staging-devin-ai-integration bot commented Mar 29, 2026

Uh oh!

staging-devin-ai-integration bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

staging-devin-ai-integration bot commented Mar 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Review & Testing Checklist for Human

Notes

Uh oh!

staging-devin-ai-integration bot commented Mar 29, 2026

🤖 Devin AI Engineer

Uh oh!

staging-devin-ai-integration bot left a comment

Choose a reason for hiding this comment

✅ Devin Review: No Issues Found

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

staging-devin-ai-integration bot commented Mar 29, 2026 •

edited

Loading