
fix(tools): install script#1

Merged
streamer45 merged 1 commit into main from fix-install-script
Dec 27, 2025

Conversation

@streamer45
Owner

Summary

Fixes the install script link and adds an uninstall subcommand.

@streamer45 streamer45 self-assigned this Dec 27, 2025
@streamer45 streamer45 merged commit b30f83f into main Dec 27, 2025
13 checks passed
@streamer45 streamer45 deleted the fix-install-script branch December 27, 2025 15:41
staging-devin-ai-integration bot pushed a commit that referenced this pull request Feb 24, 2026
chunks_exact(4).enumerate() added MORE overhead than Range::next:
- ChunksExact::next -> split_at_checked -> split_at_unchecked -> from_raw_parts
  chain consumed ~33% CPU vs original ~14% from Range::next.
- Enumerate::next alone was 15.33% of total CPU.

Revert to simple 'for col in 0..w' with pre-computed row bases.
The buffer pooling (optimization #1) is confirmed working well
via DHAT: ~1GB alloc churn eliminated.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
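The revert described above can be sketched as follows. This is a minimal stand-in (buffer layout and per-pixel work are hypothetical, not the project's actual kernel): a plain `for col in 0..w` loop over a row base computed once per row avoids the `ChunksExact`/`Enumerate` adapter overhead the profile attributes to the iterator version.

```rust
// Hypothetical sketch of the reverted inner loop: index-based iteration
// with a per-row base offset, instead of chunks_exact(4).enumerate().
fn fill_row_bases(buf: &mut [u8], w: usize, h: usize) {
    for row in 0..h {
        let base = row * w * 4; // row base computed once per row
        for col in 0..w {
            let i = base + col * 4;
            // ...per-pixel work would go here; as a stand-in, mark alpha opaque
            buf[i + 3] = 255;
        }
    }
}

fn main() {
    let (w, h) = (4, 2);
    let mut buf = vec![0u8; w * h * 4];
    fill_row_bases(&mut buf, w, h);
    // Every 4th byte (the alpha channel) was touched exactly once.
    assert!(buf.iter().skip(3).step_by(4).all(|&a| a == 255));
}
```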
staging-devin-ai-integration bot pushed a commit that referenced this pull request Feb 24, 2026
… recv_from_any_slot

Introduces SlotRecvResult enum with Frame/ChannelClosed/NonVideo/Empty variants.
The main loop now removes closed slots and skips non-video packets instead of
treating any single channel close as all-inputs-closed.

Also adds a comment about dropped in-flight results on shutdown (Fix #6).

Optimizes overlay cloning by using Arc<[Arc<DecodedOverlay>]> instead of
Vec<Arc<DecodedOverlay>> so cloning into the work item each frame is a single
ref-count bump instead of a full Vec clone (Fix #8).

Fixes: #1, #6, #8
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
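The shape of the two fixes above can be sketched with stand-in types (the real `Frame` payload and slot bookkeeping differ): a per-slot receive result lets the loop prune only the closed slot, and an `Arc<[Arc<T>]>` clones with a single ref-count bump where a `Vec<Arc<T>>` clone would bump every element and allocate.

```rust
use std::sync::Arc;

// Variants from the commit message; the Frame payload is a stand-in.
enum SlotRecvResult {
    Frame(u64),
    ChannelClosed,
    NonVideo,
    Empty,
}

// Remove only the slots whose channel closed; the rest stay alive,
// so one closed input no longer looks like "all inputs closed".
fn prune_closed(slots: &mut Vec<(usize, SlotRecvResult)>) {
    slots.retain(|(_, r)| !matches!(r, SlotRecvResult::ChannelClosed));
}

fn main() {
    let mut slots = vec![
        (0, SlotRecvResult::Frame(1)),
        (1, SlotRecvResult::ChannelClosed),
        (2, SlotRecvResult::Empty),
    ];
    prune_closed(&mut slots);
    assert_eq!(slots.len(), 2);

    // Arc<[Arc<T>]> clone: one ref-count bump, no per-element work.
    let overlays: Arc<[Arc<u32>]> = vec![Arc::new(1u32), Arc::new(2)].into();
    let cloned = overlays.clone();
    assert!(Arc::ptr_eq(&overlays[0], &cloned[0]));
}
```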
staging-devin-ai-integration bot pushed a commit that referenced this pull request Mar 1, 2026
- Fix #1 (High): skip-clear now validates source pixel alpha (all pixels
  must have alpha==255) before skipping canvas clear. Prevents blending
  against stale pooled buffer data when RGBA source has transparency.

- Fix #2 (Medium): conversion cache slot indices now use position in the
  full layers slice (with None holes) via two-pass resolution, so cache
  keys stay stable when slots gain/lose frames.

- Fix #3 (Medium): benchmark now calls real composite_frame() kernel
  instead of reimplementing compositing inline. Exercises all kernel
  optimizations (cache, clear-skip, identity fast-path, x-map).

- Fix Devin Review: revert video pool preallocation (was allocating
  ~121MB across all bucket sizes at startup). Restored lazy allocation.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
streamer45 added a commit that referenced this pull request Mar 1, 2026
…ark (#68)

* perf(compositor): add compositor-only microbenchmark

Adds a standalone benchmark that measures composite_frame() in isolation
(no VP9 encode, no mux, no async runtime overhead).

Scenarios:
- 1/2/4 layers RGBA
- Mixed I420+RGBA and NV12+RGBA (measures conversion overhead)
- Rotation (measures rotated blit path)
- Static layers (same Arc each frame, for future cache-hit measurement)

Runs at 640x480, 1280x720, 1920x1080 by default.

Baseline results on this VM (8 logical CPUs):
  1920x1080 1-layer-rgba:     ~728 fps (1.37 ms/frame)
  1920x1080 2-layer-rgba-pip: ~601 fps (1.66 ms/frame)
  1920x1080 2-layer-i420+rgba: ~427 fps (2.34 ms/frame)
  1920x1080 2-layer-nv12+rgba: ~478 fps (2.09 ms/frame)
  1920x1080 2-layer-rgba-rotated: ~470 fps (2.13 ms/frame)

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: apply rustfmt to compositor_only benchmark

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(compositor): cache YUV→RGBA conversions + skip canvas clear

Optimization 1: Add ConversionCache that tracks Arc pointer identity
per layer slot. When the source Arc<PooledVideoData> hasn't changed
between frames, the cached RGBA data is reused (zero conversion cost).
Replaces the old i420_scratch buffer approach.

Optimization 2: Skip buf.fill(0) canvas clear when the first visible
layer is opaque, unrotated, and fully covers the canvas dimensions.
Saves one full-canvas memset per frame in the common case.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
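The Arc-identity caching idea in Optimization 1 can be sketched as below, assuming a stand-in source type in place of `Arc<PooledVideoData>`: the cache compares `Arc` pointers, and only re-runs the conversion when the source frame actually changed.

```rust
use std::cell::Cell;
use std::sync::Arc;

// Minimal sketch of one cache slot; the real ConversionCache keeps one
// entry per layer slot and stores converted RGBA data.
struct CacheSlot {
    src: Option<Arc<Vec<u8>>>, // last-seen source frame (stand-in type)
    rgba: Vec<u8>,             // cached conversion result
}

impl CacheSlot {
    fn get_or_convert(
        &mut self,
        src: &Arc<Vec<u8>>,
        convert: impl Fn(&[u8]) -> Vec<u8>,
    ) -> &[u8] {
        let hit = self.src.as_ref().map_or(false, |s| Arc::ptr_eq(s, src));
        if !hit {
            self.rgba = convert(src); // source changed: re-run conversion
            self.src = Some(Arc::clone(src));
        }
        &self.rgba
    }
}

fn main() {
    let calls = Cell::new(0);
    let convert = |s: &[u8]| {
        calls.set(calls.get() + 1);
        s.to_vec()
    };
    let mut slot = CacheSlot { src: None, rgba: Vec::new() };
    let frame = Arc::new(vec![1u8, 2, 3]);
    slot.get_or_convert(&frame, &convert);
    slot.get_or_convert(&frame, &convert); // same Arc: cache hit, no call
    assert_eq!(calls.get(), 1);
}
```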

* perf(compositor): precompute x-map to eliminate per-pixel division

Optimization 3: Replace per-pixel `(dx + src_col_skip) * sw / rw`
integer division in blit_row_opaque/blit_row_alpha with a single
precomputed lookup table (x_map) built once per scale_blit_rgba call.

Each destination column now does a table lookup instead of a division,
removing O(width * height) divisions per layer per frame.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
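The x-map idea can be sketched directly from the formula in the commit message (function name is hypothetical): the `(dx + src_col_skip) * sw / rw` division runs once per destination column when the table is built, instead of once per pixel in the blit rows.

```rust
// Precompute the source column for each destination column, replacing
// a per-pixel integer division with a table lookup in the row loops.
fn build_x_map(rw: usize, sw: usize, src_col_skip: usize) -> Vec<usize> {
    (0..rw).map(|dx| (dx + src_col_skip) * sw / rw).collect()
}

fn main() {
    // Map 8 destination columns onto 4 source columns (2:1 downscale).
    let x_map = build_x_map(8, 4, 0);
    assert_eq!(x_map, vec![0, 0, 1, 1, 2, 2, 3, 3]);
    // Inside the blit row loop, each column is then just: let sx = x_map[dx];
}
```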

* perf(compositor): add identity-scale fast path for 1:1 opaque blits

Optimization 4: When source dimensions match the destination rect,
opacity is 1.0, and there's no clipping offset, bypass the x-map
lookup entirely. For fully-opaque source rows, use bulk memcpy
(copy_from_slice). For rows with semi-transparent pixels, use a
simplified per-pixel blend without the scaling indirection.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
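A minimal sketch of the identity-scale row path, assuming straight (non-premultiplied) alpha and a stand-in function name: fully-opaque rows collapse to one `copy_from_slice`, and only rows with partial transparency fall back to a per-pixel blend.

```rust
// Identity-scale fast path for one RGBA row: src and dst are the same
// width, so no x-map lookup or scaling indirection is needed.
fn blit_row_identity(dst: &mut [u8], src: &[u8]) {
    let fully_opaque = src.chunks_exact(4).all(|p| p[3] == 255);
    if fully_opaque {
        dst.copy_from_slice(src); // bulk memcpy, no per-pixel work
    } else {
        for (d, s) in dst.chunks_exact_mut(4).zip(src.chunks_exact(4)) {
            let a = s[3] as u32;
            for c in 0..3 {
                // Simple source-over blend on the color channels.
                d[c] = ((s[c] as u32 * a + d[c] as u32 * (255 - a)) / 255) as u8;
            }
        }
    }
}

fn main() {
    // Opaque row: copied wholesale.
    let src = [10, 20, 30, 255, 40, 50, 60, 255];
    let mut dst = [0u8; 8];
    blit_row_identity(&mut dst, &src);
    assert_eq!(dst, src);

    // Fully transparent row: destination unchanged.
    let src2 = [9, 9, 9, 0, 7, 7, 7, 0];
    let mut dst2 = [0u8; 8];
    blit_row_identity(&mut dst2, &src2);
    assert_eq!(dst2, [0u8; 8]);
}
```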

* perf(compositor): pre-scale image overlays at decode time

Optimization 5: When a decoded image overlay's native dimensions differ
from its target rect, pre-scale it once using nearest-neighbor at
config/update time. This ensures the per-frame blit_overlay call hits
the identity-scale fast path (memcpy) instead of re-scaling every frame.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(compositor): cache layer configs and skip per-frame sort

Optimization 6: Extract per-slot layer config resolution and z-order
sorting into a rebuild_layer_cache() function that runs only when
config or pin set changes (UpdateParams, pin add/remove, channel close).

Per-frame layer building now uses the cached resolved configs and
pre-sorted draw order instead of doing HashMap lookups and sort_by
on every frame.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(frame_pool): preallocate video pool buckets at startup

Optimization 7: Change video_default() from with_buckets (lazy, no
preallocation) to preallocated_with_max with 2 buffers per bucket.
This avoids cold-start allocation misses for the first few frames,
matching the existing audio_default() pattern.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style(compositor): fix clippy warnings from optimization changes

- Use map_or instead of match/if-let-else in ConversionCache and
  first_layer_covers_canvas
- Allow expect_used with safety comment in get_or_convert
- Allow dead_code on LayerSnapshot::z_index (sorting moved upstream)
- Allow needless_range_loop in blit_row_opaque/blit_row_alpha (dx used
  for both x_map index and dst offset)
- Allow cast_possible_truncation on idx as i32 in rebuild_layer_cache

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): address correctness + bench issues from review

- Fix #1 (High): skip-clear now validates source pixel alpha (all pixels
  must have alpha==255) before skipping canvas clear. Prevents blending
  against stale pooled buffer data when RGBA source has transparency.

- Fix #2 (Medium): conversion cache slot indices now use position in the
  full layers slice (with None holes) via two-pass resolution, so cache
  keys stay stable when slots gain/lose frames.

- Fix #3 (Medium): benchmark now calls real composite_frame() kernel
  instead of reimplementing compositing inline. Exercises all kernel
  optimizations (cache, clear-skip, identity fast-path, x-map).

- Fix Devin Review: revert video pool preallocation (was allocating
  ~121MB across all bucket sizes at startup). Restored lazy allocation.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: apply rustfmt to fix formatting

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(compositor): SSE2 blend, alpha-scan cache, bench pool, lazy prealloc

Fix 4 remaining performance findings:

1. High: Add SSE2 SIMD fast path for RGBA blend loops (blit_row_opaque,
   blit_row_alpha). Processes 4 pixels at a time with fast-paths for
   fully-opaque (direct copy) and fully-transparent (skip) source pixels.

2. Medium: Optimize alpha scan in clear-skip check — skip scan entirely
   for I420/NV12 layers (always alpha=255 after conversion), cache scan
   result by Arc pointer identity for RGBA layers.

3. Medium: Pass VideoFramePool to bench_composite instead of None, so
   benchmark exercises pool reuse like production.

4. Low-Medium: Lazy preallocate on first bucket use — when a bucket is
   first hit, allocate one extra buffer so the second get() is a hit.

Also: inline clear-skip logic to fix borrow checker conflict, remove
unused first_layer_covers_canvas function, add clippy suppression
rationale comments for needless_range_loop.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
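The alpha-scan caching in item 2 can be sketched like this (types simplified; the real cache lives alongside the clear-skip check): the full-frame alpha scan runs once per distinct source `Arc`, and repeat frames hit the cached answer via pointer identity.

```rust
use std::sync::Arc;

// Remember, per Arc identity, whether an RGBA frame is fully opaque so
// the clear-skip check does not rescan the same frame every tick.
struct AlphaScanCache {
    last: Option<(Arc<Vec<u8>>, bool)>,
}

impl AlphaScanCache {
    fn is_fully_opaque(&mut self, frame: &Arc<Vec<u8>>) -> bool {
        if let Some((seen, opaque)) = &self.last {
            if Arc::ptr_eq(seen, frame) {
                return *opaque; // same frame as last time: reuse the scan
            }
        }
        let opaque = frame.chunks_exact(4).all(|p| p[3] == 255);
        self.last = Some((Arc::clone(frame), opaque));
        opaque
    }
}

fn main() {
    let mut cache = AlphaScanCache { last: None };
    let frame = Arc::new(vec![1u8, 2, 3, 255]);
    assert!(cache.is_fully_opaque(&frame));
    assert!(cache.is_fully_opaque(&frame)); // second call: cache hit
    let translucent = Arc::new(vec![1u8, 2, 3, 128]);
    assert!(!cache.is_fully_opaque(&translucent));
}
```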

---------

Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>
staging-devin-ai-integration bot pushed a commit that referenced this pull request Mar 1, 2026
Issue #1: Click outside text layer commits inline edit
- Add document.activeElement.blur() in handlePaneClick before deselecting
- Add useEffect on TextOverlayLayer watching isSelected to commit on deselect

Issue #2: Preview panel resizable from all four edges
- Add ResizeEdgeRight and ResizeEdgeBottom styled components
- Extend handleResizeStart edge type to support right/bottom
- Update resizeRef type to match

Issue #3: Monitor view preview extracts MoQ peer settings from pipeline
- Find transport::moq::peer node in pipeline and extract gateway_path/output_broadcast
- Set correct serverUrl and outputBroadcast before connecting
- Import updateUrlPath utility

Issue #4: Deep-compare layer state to prevent position jumps on selection change
- Skip setLayers/setTextOverlays/setImageOverlays when merged state is structurally equal
- Prevents stale server-echoed values from causing visual glitches

Issue #5: Rotate mouse delta for rotated layer resize handles
- Transform (dx, dy) by -rotationDegrees in computeUpdatedLayer
- Makes resize handles behave naturally regardless of layer rotation

Issue #6: Visual separator between layer list and per-layer controls
- Add borderTop and paddingTop to LayerInfoRow for both video and text controls

Issue #7: Text layers support opacity and rotation sliders
- Add rotationDegrees field to TextOverlayState, parse/serialize rotation_degrees
- Add rotation transform to TextOverlayLayer canvas rendering
- Replace numeric opacity input with slider matching video layer controls
- Add rotation slider for text layers

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
streamer45 added a commit that referenced this pull request Mar 1, 2026
* fix(compositor-ui): address 7 UX issues in compositor node

Issue #1: Click outside text layer commits inline edit
- Add document.activeElement.blur() in handlePaneClick before deselecting
- Add useEffect on TextOverlayLayer watching isSelected to commit on deselect

Issue #2: Preview panel resizable from all four edges
- Add ResizeEdgeRight and ResizeEdgeBottom styled components
- Extend handleResizeStart edge type to support right/bottom
- Update resizeRef type to match

Issue #3: Monitor view preview extracts MoQ peer settings from pipeline
- Find transport::moq::peer node in pipeline and extract gateway_path/output_broadcast
- Set correct serverUrl and outputBroadcast before connecting
- Import updateUrlPath utility

Issue #4: Deep-compare layer state to prevent position jumps on selection change
- Skip setLayers/setTextOverlays/setImageOverlays when merged state is structurally equal
- Prevents stale server-echoed values from causing visual glitches

Issue #5: Rotate mouse delta for rotated layer resize handles
- Transform (dx, dy) by -rotationDegrees in computeUpdatedLayer
- Makes resize handles behave naturally regardless of layer rotation

Issue #6: Visual separator between layer list and per-layer controls
- Add borderTop and paddingTop to LayerInfoRow for both video and text controls

Issue #7: Text layers support opacity and rotation sliders
- Add rotationDegrees field to TextOverlayState, parse/serialize rotation_degrees
- Add rotation transform to TextOverlayLayer canvas rendering
- Replace numeric opacity input with slider matching video layer controls
- Add rotation slider for text layers

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor-ui): fix preview drag, text state flicker, overlay throttling, multiline text

- OutputPreviewPanel: make panel body draggable (not just header) with
  cursor: grab styling so preview behaves like other canvas nodes
- useCompositorLayers: add throttledOverlayCommit for text/image overlay
  updates (sliders, etc.) to prevent flooding the server on every tick;
  increase overlay commit guard from 1.5s to 3s to prevent stale params
  from overwriting local state; arm guard immediately in updateTextOverlay
  and updateImageOverlay
- CompositorCanvas: change InlineTextInput from <input> to <textarea> for
  multiline text editing; Enter inserts newline, Ctrl/Cmd+Enter commits;
  add white-space: pre-wrap and word-break to text content rendering;
  add ResizeHandles to TextOverlayLayer when selected
- CompositorNode: change OverlayTextInput to <textarea> with vertical
  resize support for multiline text in node controls panel

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

---------

Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>
staging-devin-ai-integration bot pushed a commit that referenced this pull request Mar 2, 2026
- Replace dimension-matching cache heuristic with index-based mapping
  using image_overlay_cfg_indices (finding #1)
- Only update x/y position on cache hit, not full rect clone (finding #2)
- Fix MIME sniffing comment wording to 'base64-encoded magic bytes',
  add BMP detection (finding #3)
- Switch from data-URI to URL.createObjectURL with cleanup for image
  overlay thumbnails (finding #4)
- Change SAFETY comment to Invariant in prescale_rgba (finding #7)

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
streamer45 added a commit that referenced this pull request Mar 3, 2026
…, and selectability (#78)

* fix(compositor): improve image overlay quality, caching, aspect ratio, and selectability

- Replace nearest-neighbor prescaling with bilinear (image crate Triangle
  filter) for much better rendering of images containing text or fine detail
- Cache decoded image overlays across UpdateParams calls — only re-decode
  when data_base64 or target rect dimensions change, reusing existing
  Arc<DecodedOverlay> otherwise
- Lock aspect ratio for image layers during resize (same as video layers)
- Show actual image thumbnail in compositor canvas UI for easier selection;
  switch border from dotted to solid, remove crosshatch pattern

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): guard against index mismatch in image overlay cache

Use old_imgs.get(i) instead of old_imgs[i] to avoid a panic when
a previous decode_image_overlay call failed, leaving old_imgs shorter
than old_cfgs.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): address review — proper index mapping for cache, broader MIME detection

- Build a HashMap<usize, &Arc<DecodedOverlay>> by walking old configs and
  decoded overlays in tandem, so cache lookups use config index rather than
  assuming positional alignment (which breaks when a previous decode failed)
- Add WebP and GIF magic-byte detection for image thumbnail data URIs

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style(compositor): apply cargo fmt formatting

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): fix HashMap type and double-deref in overlay cache

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): content-keyed overlay cache with dimension-based matching

Replace incorrect positional index mapping with a content-keyed cache
that matches decoded overlays to configs by comparing prescaled bitmap
dimensions against the config's target rect.  This correctly handles
the case where a mid-list decode failure makes the decoded slice shorter
than the config vec — failed configs are skipped (not consumed) because
their target dimensions won't match the next decoded overlay.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): default image overlay z-index to 200 so it renders above video layers

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style(compositor): add rationale comment for clippy::expect_used suppression

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: apply formatting fixes (cargo fmt + prettier)

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): address review findings #1-#4, #7

- Replace dimension-matching cache heuristic with index-based mapping
  using image_overlay_cfg_indices (finding #1)
- Only update x/y position on cache hit, not full rect clone (finding #2)
- Fix MIME sniffing comment wording to 'base64-encoded magic bytes',
  add BMP detection (finding #3)
- Switch from data-URI to URL.createObjectURL with cleanup for image
  overlay thumbnails (finding #4)
- Change SAFETY comment to Invariant in prescale_rgba (finding #7)

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): preserve image aspect ratio, add image layer controls, optimize base64 decode

- Backend: prescale images with aspect-ratio preservation (scale-to-fit
  instead of stretch-to-fill) and centre within the target rect.
- Backend: re-centre cached overlays on position update.
- Frontend: detect natural image dimensions on add and set initial rect
  to match source aspect ratio.
- Frontend: add opacity/rotation slider controls for selected image
  overlays (matching video and text layer controls).
- Frontend: fix findAnyLayer to pass through rotationDegrees and zIndex
  for image overlays instead of hardcoding 0.
- Frontend: replace O(n) atob + byte-by-byte loop with fetch(data-URI)
  for more efficient base64-to-blob conversion.
- Frontend: remove BMP MIME detection (inconsistent browser support).
- Frontend: add z-index band allocation comments (video 0-99, text
  100-199, image 200+).

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): apply rotation transform to image overlay layer in canvas preview

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): include rotationDegrees and zIndex in overlay sync change detection

Add rotationDegrees and zIndex to the image overlay change-detection
comparisons in the params sync effect so that YAML or backend changes
to these fields are reflected in the UI.  Also add the missing zIndex
check to the text overlay change detection for consistency.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

---------

Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>
staging-devin-ai-integration bot added a commit that referenced this pull request Mar 13, 2026
Implements all 13 actionable findings from the video feature review
(finding #11 skipped — would require core PixelFormat serde changes):

WebM muxer (webm.rs):
- Add shutdown/cancellation handling to the receive loop via
  tokio::select! on context.control_rx, matching the pattern used
  by the OGG muxer and colorbars node (fix #1, important)
- Remove dead chunk_size config field and DEFAULT_CHUNK_SIZE constant;
  update test that referenced it (fix #2, important)
- Make Seek on Live MuxBuffer return io::Error(Unsupported) instead of
  warn-and-clamp to fail fast on unexpected seek calls (fix #3, important)
- Add comment noting VP9 CodecPrivate constants must stay in sync with
  encoder config in video/mod.rs (fix #4, important)
- Make OpusHead pre_skip configurable via WebMMuxerConfig::opus_preskip_samples
  instead of always using the hardcoded constant (fix #6, minor)
- Group mux_frame loose parameters into MuxState struct (fix #12, nit)
- Fix BitReader::read() doc comment range 1..=16 → 1..=32 (fix #14, nit)

VP9 codec (vp9.rs):
- Add startup-time ABI assertion verifying vpx_codec_vp9_cx/dx return
  non-null VP9 interfaces (fix #5, minor)

Colorbars (colorbars.rs):
- Add draw_time_use_pts config option to stamp PTS instead of wall-clock
  time, more useful for A/V timing debugging (fix #7, minor)
- Document studio-range assumption in SMPTE bar YUV table comment with
  note explaining why white Y=180 (fix #13, nit)

OGG muxer (ogg.rs):
- Remove dead is_first_packet field and its no-op toggle (fix #10, minor)

Tests (tests.rs):
- Add File mode (WebMStreamingMode::File) test exercising the seekable
  temp-file code path (fix #8, minor)
- Add edge-case tests: non-keyframe first video packet and truncated/
  corrupt VP9 header — verify no panics (fix #9, minor)

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Signed-off-by: bot_apk <apk@cognition.ai>
Co-Authored-By: Staging-Devin AI <166158716+staging-devin-ai-integration[bot]@users.noreply.github.com>
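The fail-fast `Seek` behavior from fix #3 can be sketched as follows, with a stand-in type for the live mux buffer: returning `ErrorKind::Unsupported` surfaces an unexpected seek immediately instead of silently clamping.

```rust
use std::io::{self, Seek, SeekFrom};

// Stand-in for the Live-mode mux buffer, which is append-only.
struct LiveBuf;

impl Seek for LiveBuf {
    fn seek(&mut self, _pos: SeekFrom) -> io::Result<u64> {
        // Fail fast rather than warn-and-clamp on an unexpected seek.
        Err(io::Error::new(
            io::ErrorKind::Unsupported,
            "seek is not supported on a live mux buffer",
        ))
    }
}

fn main() {
    let mut buf = LiveBuf;
    let err = buf.seek(SeekFrom::Start(0)).unwrap_err();
    assert_eq!(err.kind(), io::ErrorKind::Unsupported);
}
```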
streamer45 pushed a commit that referenced this pull request Mar 13, 2026
Implements all 13 actionable findings from the video feature review
(finding #11 skipped — would require core PixelFormat serde changes):

WebM muxer (webm.rs):
- Add shutdown/cancellation handling to the receive loop via
  tokio::select! on context.control_rx, matching the pattern used
  by the OGG muxer and colorbars node (fix #1, important)
- Remove dead chunk_size config field and DEFAULT_CHUNK_SIZE constant;
  update test that referenced it (fix #2, important)
- Make Seek on Live MuxBuffer return io::Error(Unsupported) instead of
  warn-and-clamp to fail fast on unexpected seek calls (fix #3, important)
- Add comment noting VP9 CodecPrivate constants must stay in sync with
  encoder config in video/mod.rs (fix #4, important)
- Make OpusHead pre_skip configurable via WebMMuxerConfig::opus_preskip_samples
  instead of always using the hardcoded constant (fix #6, minor)
- Group mux_frame loose parameters into MuxState struct (fix #12, nit)
- Fix BitReader::read() doc comment range 1..=16 → 1..=32 (fix #14, nit)

VP9 codec (vp9.rs):
- Add startup-time ABI assertion verifying vpx_codec_vp9_cx/dx return
  non-null VP9 interfaces (fix #5, minor)

Colorbars (colorbars.rs):
- Add draw_time_use_pts config option to stamp PTS instead of wall-clock
  time, more useful for A/V timing debugging (fix #7, minor)
- Document studio-range assumption in SMPTE bar YUV table comment with
  note explaining why white Y=180 (fix #13, nit)

OGG muxer (ogg.rs):
- Remove dead is_first_packet field and its no-op toggle (fix #10, minor)

Tests (tests.rs):
- Add File mode (WebMStreamingMode::File) test exercising the seekable
  temp-file code path (fix #8, minor)
- Add edge-case tests: non-keyframe first video packet and truncated/
  corrupt VP9 header — verify no panics (fix #9, minor)

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Signed-off-by: bot_apk <apk@cognition.ai>
Co-authored-by: bot_apk <apk@cognition.ai>
Co-authored-by: Staging-Devin AI <166158716+staging-devin-ai-integration[bot]@users.noreply.github.com>
streamer45 added a commit that referenced this pull request Mar 15, 2026
* chore: update roadmap

* feat(video): update packet types, docs, and compatibility rules

* feat(video): make raw video layout explicit + enforce aligned buffers

* feat(webm): extend muxer with VP9 video track support (PR4)

- Add dual input pins: 'audio' (Opus) and 'video' (VP9), both optional
- Add video track via VideoCodecId::VP9 with configurable width/height
- Multiplex audio and video frames using tokio::select! in receive loop
- Track monotonic timestamps across tracks (clamp to last_written_ns)
- Convert timestamps from microseconds to nanoseconds for webm crate
- Dynamic content-type: video/webm;codecs="vp9,opus" | vp9 | opus
- Extract flush logic into flush_output() helper
- Add video_width/video_height to WebMMuxerConfig
- Add MuxTracks struct and webm_content_type() const helper
- Update node registration description
- Add test: VP9 video-only encode->mux produces parseable WebM
- Add test: no-inputs-connected returns error
- Update existing tests to use new 'audio' pin name

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* feat: end-to-end video pipeline support

- YAML compiler: add Needs::Map variant for named pin targeting
- Color Bars Generator: SMPTE I420 source node (video::colorbars)
- MoQ Peer: video input pin, catalog with VP9, track publishing
- Frontend: generalize MSEPlayer for audio/video, ConvertView video support
- Frontend: MoQ video playback via Hang Video.Renderer in StreamView
- Sample pipelines: oneshot (color bars -> VP9 -> WebM) and dynamic (MoQ stream)

Signed-off-by: Devin AI <devin@cognition.ai>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(ui): video-aware ConvertView for no-input pipelines

- Detect pipelines without http_input as no-input (hides upload UI)
- Add checkIfVideoPipeline helper for video pipeline detection
- Update output mode label: 'Play Video' for video pipelines
- Derive isVideoPipeline from pipeline YAML via useMemo

Signed-off-by: Devin AI <devin@cognition.ai>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(server): allow generator-only oneshot pipelines without http_input

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(engine): allow generator-only oneshot pipelines without file_reader

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(nodes): enable video feature (vp9 + colorbars) in default features

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix: generator pipeline start signals, video-only content-type, and media-generic UI messages

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix

* feat: add sweep bar animation to colorbars, skip publish for receive-only pipelines

- ColorBarsNode now draws a 4px bright-white vertical bar that sweeps
  across the frame at 4px/frame, making motion clearly visible.
- extractMoqPeerSettings returns hasInputBroadcast so the UI can infer
  whether a pipeline expects a publisher.
- handleTemplateSelect auto-sets enablePublish=false for receive-only
  pipelines (no input_broadcast), skipping microphone access.
- decideConnect respects enablePublish in session mode instead of
  always forcing shouldPublish=true.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(vp9): configurable encoder deadline (default realtime), avoid unnecessary metadata clones

- Add Vp9EncoderDeadline enum (realtime/good_quality/best_quality) to
  Vp9EncoderConfig, defaulting to Realtime instead of the previous
  hard-coded VPX_DL_BEST_QUALITY.
- Store deadline in Vp9Encoder struct and use it in encode_frame/flush.
- Encoder input task: use .take() instead of .clone() on frame metadata
  since the frame is moved into the channel anyway.
- Decoder decode_packet: peek ahead and only clone metadata when
  multiple frames are produced; move it on the last iteration.
- Encoder drain_packets: same peek-ahead pattern to avoid cloning
  metadata on the last (typically only) output packet.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: cargo fmt

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* test(e2e): add video pipeline tests for convert and MoQ stream views

- Add verifyVideoPlayback helper for MSEPlayer video element verification
- Add verifyCanvasRendering helper for canvas-based video frame verification
- Add convert view test: select video colorbars template, generate, verify video player
- Add stream view test: create MoQ video session, connect, verify canvas rendering

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix: correct webm_muxer pin name in mixing pipeline and convert button text in asset mode

- mixing.yml: use 'audio' input pin for webm_muxer instead of default 'in' pin
- ConvertView: show 'Convert File' button text when in asset mode (not 'Generate')
- test-helpers: fix prettier formatting

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* refactor(webm-muxer): generic input pins with runtime media type detection

Replace fixed 'audio'/'video' pin names with generic 'in'/'in_1' pins
that accept both EncodedAudio(Opus) and EncodedVideo(VP9). The actual
media type is detected at runtime by inspecting the first packet's
content_type field (video/* → video track, everything else → audio).

This makes the muxer future-proof for additional track types (subtitles,
data channels, etc.) without requiring pin-name changes.

Pin layout is config-driven:
- Default (no video dimensions): single 'in' pin — fully backward
  compatible with existing audio-only pipelines.
- With video_width/video_height > 0: two pins 'in' + 'in_1'.

Updated all affected sample pipelines and documentation.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: cargo fmt

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* refactor(webm-muxer): connection-time type detection via NodeContext.input_types

Replace packet probing with connection-time media type detection. The graph
builder now populates NodeContext.input_types with the upstream output's
PacketType for each connected pin, so the webm muxer can classify inputs
as audio or video without inspecting any packets.

Changes:
- Add input_types: HashMap<String, PacketType> to NodeContext
- Populate input_types in graph_builder (oneshot pipelines)
- Leave empty in dynamic_actor (connections happen after spawn)
- Refactor WebMMuxerNode::run() to use input_types instead of probing
- Remove first-packet buffering logic from receive loop
- Update all NodeContext constructions in test code
- Update docs to reflect connection-time detection

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
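The connection-time classification described above can be sketched with simplified types (the real `PacketType` and `NodeContext` carry more detail): the graph builder records each pin's upstream packet type, and the muxer maps pins to tracks without probing any packets.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the real packet/track types.
#[derive(Clone, Copy)]
enum PacketType {
    EncodedAudio,
    EncodedVideo,
}

#[derive(Debug, PartialEq)]
enum Track {
    Audio,
    Video,
}

// Classify a pin from the connection-time type map; None means the pin
// is not connected (no packet inspection required).
fn classify(input_types: &HashMap<String, PacketType>, pin: &str) -> Option<Track> {
    match input_types.get(pin)? {
        PacketType::EncodedVideo => Some(Track::Video),
        PacketType::EncodedAudio => Some(Track::Audio),
    }
}

fn main() {
    let mut input_types = HashMap::new();
    input_types.insert("in".to_string(), PacketType::EncodedAudio);
    input_types.insert("in_1".to_string(), PacketType::EncodedVideo);
    assert_eq!(classify(&input_types, "in"), Some(Track::Audio));
    assert_eq!(classify(&input_types, "in_1"), Some(Track::Video));
    assert_eq!(classify(&input_types, "in_2"), None);
}
```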

* feat(compositor): add video compositor node with dynamic inputs, overlays, and spawn_blocking

Implements the video::compositor node (PR3 from VIDEO_SUPPORT_PLAN.md):

- Dynamic input pins (PinCardinality::Dynamic) for attaching arbitrary
  raw video inputs at runtime
- RGBA8 output canvas with configurable dimensions (default 1280x720)
- Image overlays: decoded once at init via the `image` crate (PNG/JPEG)
- Text overlays: rasterized once per UpdateParams via `tiny-skia`
- Compositing runs in spawn_blocking to avoid blocking the async runtime
- Nearest-neighbor scaling for MVP (bilinear/GPU follow-up)
- Per-layer opacity and rect positioning
- NodeControlMessage::UpdateParams support for live parameter tuning
- Pool-based buffer allocation via VideoFramePool
- Metadata propagation (timestamp, duration, sequence) from first input

New dependencies:
- image 0.25.9 (MIT/Apache-2.0) — PNG/JPEG decoding, features: png, jpeg
- tiny-skia 0.12.0 (BSD-3-Clause) — 2D rendering, pure Rust
- base64 0.22 (MIT/Apache-2.0) — base64 decoding for image overlay data

14 tests covering compositing helpers, config validation, node integration,
metadata preservation, and pool usage.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: cargo fmt

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): address review findings and add sample pipeline

- Fix shutdown propagation: add should_stop flag so Shutdown in the
  non-blocking try_recv loop properly breaks the outer loop instead of
  falling through to an extra composite pass.
- Fix canvas resize: remove stale canvas_w/canvas_h locals captured once
  at init; read self.config.width/height directly so UpdateParams
  dimension changes take effect immediately.
- Fix image overlay re-decode: always re-decode image overlays on
  UpdateParams, not only when the count changes (content/rect/opacity
  changes were silently ignored).
- Add video_compositor_demo.yml oneshot sample pipeline: colorbars →
  compositor (with text overlay) → VP9 → WebM → HTTP output.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): use single needs variant in sample pipeline YAML

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): remove deeply nested params from sample YAML

serde_saphyr cannot deserialize YAML with 4+ nesting levels inside
params when the top-level type is an untagged enum (UserPipeline).
Text/image overlays with nested rect objects trigger this limitation.

Removed text_overlays from the static sample YAML. Overlays can still
be configured at runtime via UpdateParams (JSON, not serde_saphyr).

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): add num_inputs for static pin pre-creation in oneshot pipelines

Mirrors the AudioMixerNode pattern: when num_inputs is set in params,
pre-create input pins so the graph builder can wire connections at
startup. Single input uses pin name 'in' (matching YAML convention),
multiple inputs use 'in_0', 'in_1', etc.

The sample pipeline now sets num_inputs: 1 so the compositor declares
the 'in' pin that the graph builder expects.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* feat(compositor): accept I420 inputs and configurable output format

- Colorbars node: add pixel_format config (i420 default, rgba8 supported)
  with RGBA8 generation + sweep bar functions
- Compositor: accept both I420 and RGBA8 inputs (auto-converts I420 to
  RGBA8 internally for compositing via BT.601 conversion)
- Compositor: add output_pixel_format config (rgba8 default, i420 for
  VP9 encoder compatibility) with RGBA8→I420 output conversion
- Sample pipeline: uses I420 colorbars → compositor (output_pixel_format:
  i420) → VP9 encoder → WebM muxer → HTTP output
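
The BT.601 conversion referenced above boils down to per-pixel integer math; a scalar sketch using the limited-range coefficients cited later in this log (298, 409, 516), with names chosen for illustration:

```rust
/// BT.601 limited-range YUV -> RGB for one pixel (scalar sketch).
fn yuv_to_rgb_bt601(y: u8, u: u8, v: u8) -> (u8, u8, u8) {
    let c = y as i32 - 16;   // luma, offset from black level
    let d = u as i32 - 128;  // blue-difference chroma
    let e = v as i32 - 128;  // red-difference chroma
    let clamp = |x: i32| x.clamp(0, 255) as u8;
    (
        clamp((298 * c + 409 * e + 128) >> 8),
        clamp((298 * c - 100 * d - 208 * e + 128) >> 8),
        clamp((298 * c + 516 * d + 128) >> 8),
    )
}
```

The `+ 128` before the shift rounds instead of truncating; the clamp absorbs out-of-range inputs.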

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): process every frame instead of draining to latest

The non-blocking try_recv loop was draining all queued frames and keeping
only the latest per slot. When spawn_blocking compositing was slower than
the producer (colorbars at 90 frames), intermediate frames were dropped,
resulting in only 2 output frames.

Changed to take at most one frame per slot per loop iteration so every
produced frame is composited and forwarded downstream.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* feat(compositor): auto-PiP positioning and two-input sample pipeline

- Non-first layers without explicit layers config are auto-positioned as
  PiP windows (bottom-right corner, 1/3 canvas size, 0.9 opacity)
- Sample pipeline now uses two colorbars sources: 640x480 I420 background
  + 320x240 RGBA8 PiP overlay, making compositing visually obvious

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(compositor): move all pixel format conversions into spawn_blocking

Previously I420→RGBA8 (input) and RGBA8→I420 (output) conversions ran
on the async runtime, blocking it for ~307K pixel iterations per frame
per input. Now all conversions run inside the spawn_blocking task
alongside compositing, keeping the async runtime free for channel ops.

- Removed ensure_rgba8() calls from frame receive paths
- Store raw frames (I420 or RGBA8) in InputSlot.latest_frame
- Added pixel_format field to LayerSnapshot
- composite_frame() converts I420→RGBA8 on-the-fly per layer
- RGBA8→I420 output conversion also runs inside spawn_blocking

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(compositor): parallelize with rayon and use persistent blocking thread

- Add rayon as optional dependency gated on compositor feature
- Parallelize scale_blit_rgba() across rows using rayon::par_chunks_mut
- Split blit into blit_row_opaque (no alpha multiply) and blit_row_alpha
- Parallelize i420_to_rgba8() and rgba8_to_i420() row processing
- Replace per-frame spawn_blocking with persistent blocking thread via channels
- Add CompositeWorkItem/CompositeResult types for channel communication

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* refactor(compositor): modularize into config, overlay, pixel_ops, and kernel sub-modules

Split the 1700+ line compositor.rs into focused sub-modules:
- config.rs: configuration types, validation, pixel format parsing
- overlay.rs: DecodedOverlay, image decoding, text rasterization
- pixel_ops.rs: scale_blit_rgba, blit_row*, blit_overlay, i420/rgba8 conversion
- kernel.rs: LayerSnapshot, CompositeWorkItem/Result, composite_frame
- mod.rs: CompositorNode, run loop, registration, tests

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(compositor): 5 high-impact video compositing optimizations

1. Pool intermediate color conversion buffers: i420_to_rgba8_buf and
   rgba8_to_i420_buf write into caller-provided buffers instead of
   allocating fresh Vec's every frame (~34 MB/s allocation churn eliminated).
   Persistent scratch buffers are reused across frames in the compositing thread.

2. I420 pass-through: when a single I420 layer fills the full canvas with
   no overlays and output is I420, skip the entire I420→RGBA8→I420 round-trip.

3. Vectorize inner loops: process 4 pixels at a time in color conversion
   loops with hoisted row bases to help LLVM auto-vectorize.

4. Arc overlays: wrap DecodedOverlay in Arc so per-frame clones into the
   CompositeWorkItem are cheap reference-count bumps instead of deep copies.

5. Integer-only alpha blending: replace f32 blend math in blit_row_opaque
   and blit_row_alpha with fixed-point integer arithmetic using the
   ((val + (val >> 8)) >> 8) fast approximation of division by 255.
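
A sketch of the fixed-point blend in item 5, for a single channel (hypothetical helper; the in-tree blit rows operate on whole RGBA rows). Adding 128 before the shift chain makes the approximation round to nearest:

```rust
/// Blend src over dst with 8-bit alpha using the fast fixed-point
/// approximation of division by 255: (v + (v >> 8)) >> 8 after a
/// +128 rounding bias, instead of f32 math.
fn blend_channel(src: u8, dst: u8, a: u8) -> u8 {
    let v = src as u32 * a as u32 + dst as u32 * (255 - a as u32) + 128;
    ((v + (v >> 8)) >> 8) as u8
}
```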

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: apply cargo fmt formatting

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(compositor): fix regression — replace broken chunking with slice iterators

The previous 4-pixel chunking approach (for chunk in 0..chunks { for i in 0..4 })
added MORE Range::next overhead instead of helping vectorization.

Fixes:
- i420_to_rgba8_buf: use chunks_exact_mut(4) on output + sub-sliced input
  planes to eliminate Range::next calls AND bounds checks entirely
- rgba8_to_i420_buf Y plane: use chunks_exact(4) on input RGBA row with
  enumerate() instead of range-based indexing
- I420 passthrough: return layer index instead of Arc, copy data into
  pooled buffer directly (Arc::try_unwrap always failed since the
  original frame still holds a ref, causing a wasteful .to_vec())

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: apply cargo fmt formatting

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(compositor): revert chunks_exact to simple for-loops

chunks_exact(4).enumerate() added MORE overhead than Range::next:
- ChunksExact::next -> split_at_checked -> split_at_unchecked -> from_raw_parts
  chain consumed ~33% CPU vs original ~14% from Range::next.
- Enumerate::next alone was 15.33% of total CPU.

Revert to simple 'for col in 0..w' with pre-computed row bases.
The buffer pooling (optimization #1) is confirmed working well
via DHAT: ~1GB alloc churn eliminated.
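
The winning loop shape, sketched on a hypothetical fill function (the real kernels do color conversion, not gradients): plain `for col in 0..w` with the per-row offset hoisted out of the inner loop.

```rust
/// Row-base indexing pattern that outperformed chunks_exact here:
/// one multiply per row, simple index arithmetic per pixel.
fn fill_gradient_rgba(dst: &mut [u8], w: usize, h: usize) {
    for row in 0..h {
        let base = row * w * 4; // hoisted row base
        for col in 0..w {
            let i = base + col * 4;
            dst[i] = (col * 255 / w.max(1)) as u8;     // R ramps across the row
            dst[i + 1] = (row * 255 / h.max(1)) as u8; // G ramps down rows
            dst[i + 2] = 0;
            dst[i + 3] = 255;
        }
    }
}
```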

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(compositor): eliminate double-copy in I420 output path

Write rgba8_to_i420_buf directly into the pooled output buffer instead
of going through an intermediate scratch buffer + copy_from_slice.
This removes a full extra memcpy of the I420 data every frame.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* bench: add compositor pipeline benchmark for profiling

Adds a standalone benchmark binary that runs the compositing oneshot
pipeline (colorbars → compositor → vp9 → webm → http_output) and
reports wall-clock time, throughput (fps), per-frame latency, and
output bytes.

Supports CLI args for profiling flexibility:
  --width, --height, --fps, --frames, --iterations

Usage:
  cargo bench -p streamkit-engine --bench compositor_pipeline
  cargo bench -p streamkit-engine --bench compositor_pipeline -- --frames 300 --width 1280 --height 720

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>


* fix: resolve clippy lint errors in video nodes

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix: resolve remaining clippy lint errors in video nodes

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix: make lint pass after metadata updates

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* chore: update native plugin lockfiles

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(webm): skip intermediate flushes in File mode to prevent finalize failure

In File mode, the SharedPacketBuffer was being drained during the mux
loop via flush_output(). When segment.finalize() subsequently tried to
seek backward to backpatch the EBML header (duration, cues), those
bytes had already been moved out of the buffer, causing finalize to
fail.

Fix: guard flush_output calls with an is_file_mode flag so the entire
buffer remains intact until finalize() completes. The post-finalize
flush already handles emitting the complete finalized bytes.

Also adds libvpx-dev to the CI runner's apt packages (lint, test, build
jobs) so the vp9 feature compiles on GitHub Actions.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(webm): use Live mode for VP9 mux test to avoid unbounded memory

The previous fix kept the entire WebM buffer in memory during File mode
to allow finalize() backward seeks. This would cause unbounded memory
growth for long streams.

Instead, switch the test to Live mode (the default and intended
streaming use case). Live mode uses a non-seek writer with zero-copy
streaming drain, keeping memory bounded. The test assertions (EBML
header, content type) don't require File mode.

Reverts the is_file_mode flush guard from the previous commit since
it's no longer needed.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: apply cargo fmt formatting

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): handle non-video packets and single channel close in recv_from_any_slot

Introduces SlotRecvResult enum with Frame/ChannelClosed/NonVideo/Empty variants.
The main loop now removes closed slots and skips non-video packets instead of
treating any single channel close as all-inputs-closed.

Also adds a comment about dropped in-flight results on shutdown (Fix #6).

Optimizes overlay cloning by using Arc<[Arc<DecodedOverlay>]> instead of
Vec<Arc<DecodedOverlay>> so cloning into the work item each frame is a single
ref-count bump instead of a full Vec clone (Fix #8).
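
Why `Arc<[Arc<T>]>` helps, in miniature (using `String` as a stand-in for `DecodedOverlay`): cloning the outer `Arc` is one atomic increment, whereas cloning a `Vec<Arc<T>>` copies the vector and bumps a refcount per element.

```rust
use std::sync::Arc;

/// Per-frame snapshot of the overlay list: a single refcount bump.
fn snapshot_overlays(overlays: &Arc<[Arc<String>]>) -> Arc<[Arc<String>]> {
    Arc::clone(overlays) // no per-element work, no Vec allocation
}
```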

Fixes: #1, #6, #8
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(webm): restore streaming-mode guard in flush_output

Pass streaming_mode into flush_output and skip all intermediate flushes
in File mode. In File mode the writer supports seeking and may back-patch
segment sizes/cues, so draining the buffer after every frame would send
stale bytes that get overwritten later, corrupting the output.

Fix #2

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(moq): remove hardcoded catalog dimensions and add clean shutdown

Thread video_width and video_height from MoqPeerConfig through to
create_and_publish_catalog instead of hardcoding 640x480. Add fields
to BidirectionalTaskConfig so the bidirectional path also gets the
correct dimensions.

Add clean shutdown when both audio and video pipeline inputs close:
each input branch now explicitly handles None (channel closed), sets
its rx to None, and breaks when both are done.

Fixes #3, #4

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(vp9): improve encoder/decoder allocations and add shutdown comments

- Change next_pts duration default from 0 to 1 so libvpx rate-control
  always sees a non-zero duration (Fix #5).
- Add comment about data loss on explicit encoder shutdown (Fix #7).
- Use Bytes::copy_from_slice in drain_packets instead of .to_vec() +
  Bytes::from(), avoiding an intermediate Vec allocation per encoded
  packet (Fix #9).
- Use Vec::with_capacity(1) in decode_packet since most VP9 packets
  produce exactly one frame, avoiding a heap alloc in the common
  case (Fix #10).

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* refactor(video): extract shared parse_pixel_format utility

Move the duplicated parse_pixel_format function from colorbars.rs and
compositor/config.rs into video/mod.rs as a shared utility. Both modules
now re-export it from the parent module.

Also includes cargo fmt formatting fixes from the previous commits.

Fix #11

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix: sweep bar clipping, WebM auto-detect dims, output filename

- colorbars: clip sweep bar at frame edge instead of wrapping via modulo,
  preventing the bar from appearing split across PiP boundaries
- webm: auto-detect video dimensions from first VP9 keyframe when
  video_width/video_height are not configured (both 0). Parses the VP9
  uncompressed header to extract width/height, buffers the first packet,
  and replays it after segment creation. This eliminates the need to
  manually keep muxer dimensions in sync with the upstream encoder.
- ui: change download filename from 'converted_audio_converted.webm' to
  'output.[ext]' when no source file is available; keep the
  '{name}_converted' pattern only when a real input file exists

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: apply cargo fmt to webm muxer

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf: collapse SharedPacketBuffer mutexes, bump pool max, zero-alloc compositor poll

- Collapse triple-mutex SharedPacketBuffer into single Mutex<BufferState>
  to eliminate lock-ordering risk between cursor, last_sent_pos, and
  base_offset.

- Bump DEFAULT_VIDEO_MAX_BUFFERS_PER_BUCKET from 8 to 16 to reduce pool
  misses in deep pipelines (colorbars → compositor → encoder → muxer →
  transport can easily have 8+ frames in flight).

- Replace select_all + Vec<Box<Pin<Future>>> in compositor
  recv_from_any_slot with zero-allocation poll_fn that calls poll_recv
  directly on each slot receiver.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: apply cargo fmt

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(sample): add pacer node to video compositor demo for real-time playback

Without the pacer, colorbars in batch mode (frame_count > 0) generates
all frames as fast as possible with no real-time pacing. The WebM muxer
flushes each frame immediately in live mode, flooding the http_output
with the entire stream faster than real-time, causing browsers to buffer
heavily.

Insert core::pacer between webm_muxer and http_output to release muxed
chunks at the rate indicated by their duration_us metadata (~33ms per
frame at 30fps), matching real-time playback expectations.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(engine): walk connection graph backwards for content-type resolution

When passthrough-style nodes (core::pacer, core::passthrough,
core::telemetry_tap, etc.) are inserted between the content-producing
node and http_output, the oneshot runner previously only checked the
immediate predecessor of http_output for content_type(). Since those
utility nodes return None, the response fell back to
application/octet-stream, causing browsers to misdetect the stream.

Now the runner walks backwards through the connection graph until it
finds a node that declares a content_type, so inserting any number
of passthrough nodes before http_output preserves the correct MIME.

Also suppresses clippy::significant_drop_tightening on the
SharedPacketBuffer methods where the mutex guard intentionally spans
the entire take-trim-update / seek-compute sequence.
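
The backward walk can be sketched as follows (types are simplified to string maps for illustration; the real runner walks the engine's connection graph and calls `content_type()` on node instances):

```rust
use std::collections::HashMap;

/// Walk upstream from `start` until some node declares a content type.
/// `preds` maps a node to its immediate upstream node.
fn resolve_content_type<'a>(
    start: &str,
    preds: &'a HashMap<String, String>,
    ctypes: &'a HashMap<String, String>,
) -> Option<&'a str> {
    let mut cur = start.to_string();
    let mut hops = 0;
    while let Some(prev) = preds.get(&cur) {
        if let Some(ct) = ctypes.get(prev) {
            return Some(ct.as_str()); // first declared content type wins
        }
        cur = prev.clone();
        hops += 1;
        if hops > 64 {
            break; // guard against malformed/cyclic graphs
        }
    }
    None
}
```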

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): sort input slots by pin name for deterministic layer ordering

HashMap::drain() has non-deterministic iteration order, so the compositor
slots could randomly swap which input becomes the background (idx 0) vs.
the PiP overlay (idx > 0).  This caused two user-visible issues:

1. Background/PiP resolution swap: the 1280×720 colorbars sometimes
   ended up in the PiP slot and the 320×240 in the background slot.

2. Sweep bar appearing to extend beyond PiP boundaries: a consequence
   of the resolution swap — the large-resolution sweep bar interacts
   visually with the small-resolution background at the PiP boundary.

Fix: sort the drained inputs numerically by their 'in_N' pin suffix
before populating the slots Vec, so in_0 always comes before in_1.
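
The numeric sort can be sketched like this (hypothetical helper; the real code sorts slot structs, not bare names). Parsing the suffix matters because plain lexicographic order would put "in_10" before "in_2":

```rust
/// Sort pin names numerically by their `in_N` suffix.
/// A bare "in" (single-input convention) parses as None and sorts first.
fn sort_input_pins(pins: &mut Vec<String>) {
    pins.sort_by_key(|name| {
        name.strip_prefix("in_")
            .and_then(|n| n.parse::<u32>().ok())
            .unwrap_or(0)
    });
}
```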

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* feat(compositor): add z_index to LayerConfig for explicit layer stacking order

Adds a z_index field (i32, default 0) to LayerConfig and LayerSnapshot.
Layers are sorted by z_index before compositing — lower values are drawn
first (bottom of the stack).  Ties are broken by the original slot order.

Auto-PiP layers without explicit config get z_index = slot index (so
background = 0, first PiP = 1, etc.).  Explicit LayerConfig entries can
override this to reorder layers at will, including via UpdateParams at
runtime.

This decouples visual stacking order from pin connection order, which is
the correct separation of concerns for a compositor.
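
The z_index ordering falls out of a stable sort; a sketch with layers reduced to `(z_index, name)` pairs for illustration:

```rust
/// Lower z_index draws first (bottom of the stack). Rust's sort_by_key
/// is stable, so ties keep their original slot order automatically.
fn order_layers(layers: &mut Vec<(i32, &'static str)>) {
    layers.sort_by_key(|&(z, _)| z);
}
```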

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: apply cargo fmt to compositor z_index changes

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf: review fixes — temp file for WebM File mode, Arc unwrap, rayon threshold, saturating sub, config struct

- Fix #6: Use saturating_sub for MoQ Peer subscriber count to prevent underflow
- Fix #11: Skip memcpy in I420 passthrough when Arc has sole ownership (try_unwrap)
- Fix #12: Add minimum-row threshold for rayon parallel pixel ops (skip dispatch for small canvases)
- Fix #19: WebM File mode uses on-disk temp file (FileBackedBuffer) instead of unbounded in-memory Vec
- Fix #24: Group subscriber params into SubscriberMediaConfig struct, reducing argument counts
- Add MuxBuffer enum to unify Live (SharedPacketBuffer) and File (FileBackedBuffer) buffer types
- Add tempfile to webm feature gate in Cargo.toml

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* feat(compositor): sweep_bar toggle, fontdue text rendering, rotation, signed coords

- Add sweep_bar bool to ColorBarsConfig (default true) to gate the
  animated vertical bar; set false on background to prevent visual
  bleed through PiP overlays.

- Replace placeholder rectangle glyphs with real font rendering via
  fontdue 0.9.  Supports font_path, font_data_base64, and falls back
  to system DejaVu Sans.  Coverage-based alpha-over compositing.

- Change Rect.x/y from u32 to i32 for signed (off-screen) positioning.
  scale_blit_rgba now clips negative source offsets correctly.

- Add rotation_degrees (f32, clockwise) to LayerConfig/LayerSnapshot.
  New scale_blit_rgba_rotated() uses inverse-affine mapping with
  nearest-neighbor sampling over the axis-aligned bounding box.

- Update oneshot demo YAML: sweep_bar false on background, explicit
  layer config with PiP rect at (380,220) 240x180 rotated 15 degrees.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: apply cargo fmt formatting

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* feat(demo): add text overlay layer with bundled DejaVu Sans font

Add a third layer to the compositor demo: a 'StreamKit Demo' text
overlay rendered with fontdue using the bundled DejaVu Sans font.

- Bundle DejaVu Sans TTF in assets/fonts/ with its Bitstream Vera
  license file.
- Update demo YAML to include text_overlays with font_path pointing
  to the bundled font, white text at (20,20) 32px.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix: work around serde_saphyr untagged enum limitation for nested YAML

serde_saphyr fails to deserialize deeply nested structures (sequences
of objects with nested objects, maps with nested objects) when they
appear inside #[serde(untagged)] enums.

Add parse_yaml() helper to streamkit_api::yaml that uses a two-step
approach: YAML -> serde_json::Value -> UserPipeline.  This bypasses
the serde_saphyr limitation by using serde_json's deserializer for the
untagged enum dispatch.

Update all three call sites that directly deserialized YAML into
UserPipeline:
  - samples.rs: parse_pipeline_metadata()
  - server.rs: create_session_handler()
  - server.rs: parse_config_field()

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: apply cargo fmt to server.rs

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(demo): move PiP overlay positioning to the left

Move the PiP overlay x-coordinate from 380 to 100 so the main canvas
blue bar (rightmost SMPTE bar) remains clearly visible and is not
obscured by the overlapping PiP layer.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* refactor(colorbars): remove sweep_bar parameter entirely

Remove the sweep_bar config field, its default function, and both
draw_sweep_bar_i420/draw_sweep_bar_rgba8 rendering functions.  Also
remove the sweep_bar: false reference from the compositor demo YAML.

The sweep bar feature is being simplified out for now.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* feat(compositor): add draw_time option with millisecond precision

When draw_time is true the compositor renders the current wall-clock
time (HH:MM:SS.mmm) in the bottom-left corner of every composited
frame using a pre-loaded monospace font (DejaVu Sans Mono).

- Add draw_time and draw_time_font_path fields to CompositorConfig
- Add load_font_from_path() and rasterize_text_with_font() to overlay
- Pre-load font once during init; rasterize per frame in the main loop
- Pull DejaVu Sans Mono (royalty-free) into assets/fonts/
- Enable draw_time in the demo pipeline YAML

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: apply cargo fmt to draw_time changes

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): add edge anti-aliasing for rotated layers

Replace the hard binary contains() inside/outside test in
scale_blit_rgba_rotated() with a signed-distance-to-edge approach.

For each destination pixel the signed distance to all four edges of
the un-rotated rectangle is computed.  Pixels well inside (dist >= 1)
get full alpha; edge pixels (0 < dist < 1) get fractional coverage
proportional to the distance; pixels outside (dist <= 0) are skipped.

This smooths the staircase zig-zag artifacts on rotated overlay
borders.  The bounding box is also expanded by 1px on each side to
include the anti-aliased fringe.
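
The coverage computation for an axis-aligned rectangle reduces to the distance to the nearest edge, clamped to [0, 1] (a sketch of the idea in un-rotated source space; the real code maps destination pixels through the inverse affine first):

```rust
/// Fractional edge coverage for a point in a w x h rectangle:
/// >= 1 inside the AA fringe means fully opaque, 0 means outside.
fn edge_coverage(x: f32, y: f32, w: f32, h: f32) -> f32 {
    let dist = x.min(y).min(w - x).min(h - y); // signed distance to nearest edge
    dist.clamp(0.0, 1.0)
}
```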

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* refactor(colorbars): move draw_time from compositor to colorbars generator

The draw_time feature belongs in the source frame generator (ColorBarsNode),
not the composition layer, consistent with how sweep_bar was previously
implemented.

- Add draw_time + draw_time_font_path fields to ColorBarsConfig
- Implement per-frame wall-clock stamping (HH:MM:SS.mmm) in ColorBarsNode
  using fontdue, supporting both RGBA8 and I420 pixel formats
- Remove draw_time logic from CompositorConfig/CompositorNode entirely
- Remove unused load_font_from_path and rasterize_text_with_font from overlay
- Add fontdue dependency to the colorbars feature
- Update demo YAML to configure draw_time on colorbars_bg node

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* refactor: deduplicate and improve video subsystem code quality

- Extract shared mux_frame() helper in webm.rs (~120 lines reduced)
- Extract generic codec_forward_loop() for VP9 encoder/decoder (~300 lines)
- Extract shared blit_text_rgba() utility in video/mod.rs
- Parallelize rotated blit with rayon (row-level, RAYON_ROW_THRESHOLD)
- Document packed layout assumption in pixel format conversions
- Share DEFAULT_VIDEO_FRAME_DURATION_US constant (webm + moq peer)
- Share accepted_video_types() in compositor (definition_pins + make_input_pin)

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: apply cargo fmt

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* feat(ui): add compositor node UI with draggable layer canvas

Add visual compositor node UI that allows users to manipulate
compositor layers on a scaled canvas. Features include:

- Draggable, resizable layer boxes with position/size handles
- Opacity, rotation, and z-index sliders per selected layer
- Zero-render drag via refs + requestAnimationFrame for smooth UX
- Full config updates via new tuneNodeConfig callback
- Staging mode support (batch changes or live updates)
- LIVE indicator matching AudioGainNode pattern

New files:
- useCompositorLayers.ts: Hook for layer state management
- CompositorCanvas.tsx: Visual canvas component
- CompositorNode.tsx: ReactFlow node component

Modified files:
- useSession.ts: Add tuneNodeConfig for full-config updates
- reactFlowDefaults.ts: Register compositor node type
- FlowCanvas.tsx: Add compositor to nodeTypes type
- MonitorView.tsx: Map video::compositor kind, thread onConfigChange
- DesignView.tsx: Map video::compositor kind with defaults

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(ui): collapse unscaled height in compositor canvas via negative margin

CSS transform: scale() does not affect the layout box, causing
the outer container to reserve the full unscaled height (e.g. 720px).
Add marginBottom: canvasHeight * (scale - 1) to collapse the extra
space so the compositor node fits tightly in the ReactFlow canvas.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(ui): map video::compositor type in YAML pipeline parser

The YAML parser hardcoded all non-gain nodes to 'configurable' type,
so compositor nodes imported via YAML would not get the custom
CompositorNode UI. Add the same kind-to-type mapping used in
DesignView and MonitorView.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(ui): enable compositor layer interactions in Design View

- Wire up onParamChange in useCompositorLayers so layers are interactive
  when editing pipelines in Design View (not just live sessions)
- Trigger YAML regeneration on param changes with feedback loop guard
- Defer YAML regeneration via queueMicrotask to avoid React setState
  during render warning

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: format useCompositorLayers.ts

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* feat: add Video Compositor (MoQ Stream) pipeline template

Adds a sample dynamic pipeline that composites two colorbars sources
through the compositor node and streams the result via MoQ (WebTransport).

Pipeline chain: colorbars_bg + colorbars_pip → compositor (2 inputs) →
VP9 encoder → MoQ peer (output broadcast).

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* feat(ui): Complete compositor UX improvements

- Fix YAML pipeline loading: infer compositor output_pixel_format (I420/Rgba8)
- Fix wildcard null matching in canConnectPair for dimension compatibility
- Fix map-style needs parsing in YAML pipeline loader ({pin: node} format)
- Replace Z-index slider with numeric input + bring forward/backward buttons
- Add text overlay management UI (add/remove with default params)
- Add image overlay management UI integrated with asset upload system
- Add collapsible Output Preview panel in Monitor View

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(ui): prevent compositor node overlap in auto-layout

Add estimated height (500px) for video::compositor node kind to prevent
overlapping with downstream nodes during auto-layout positioning.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* feat(ui): compositor UX improvements - layer rendering, floating preview, YAML highlighting

- Render text overlays with actual text content and scaled font in compositor canvas
- Render image overlays as distinct colored rectangles with icon badge
- Apply golden-angle hue spacing for visual layer distinction
- Add layer name overlay and dimension labels on each layer
- Add per-layer controls: opacity slider, rotation slider, z-index with stack buttons
- Replace title tooltips with SKTooltip in overlay remove buttons
- Add useCompositorSelection hook for cross-component layer selection sync
- Highlight selected compositor layer's YAML range in YamlPane
- Redesign output preview from bottom-docked panel to floating draggable window
- Style numeric inputs with design system tokens (borders, focus ring, hidden spinners)
- Fix ESLint import ordering and unused variable warnings

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(compositor,vp9): eliminate format bounce and add SSE2 SIMD (#62)

* perf(compositor,vp9): eliminate format bounce and add SSE2 SIMD

- Compositor now always outputs RGBA8, removing the per-frame
  rgba8_to_i420_buf call from the compositing thread (~24% CPU).
- VP9 encoder accepts both RGBA8 and I420 inputs; when receiving
  RGBA8 it converts to I420 on its own blocking thread, pipelining
  the conversion with the compositor's next frame.
- Added SSE2 SIMD paths for i420_to_rgba8_buf and rgba8_to_i420_buf
  (Y-plane and chroma subsampling), processing 8 pixels per iteration
  with scalar fallback for tail pixels and non-x86 targets.
- Removed try_i420_passthrough optimization (no longer needed since
  the compositor always works in RGBA8).
- Simplified CompositeResult to a single rgba_data field.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): fix i16 overflow in SIMD color conversions, use i32 arithmetic

Both i420_to_rgba8_row_sse2 and rgba8_to_y_row_sse2 now use 32-bit
arithmetic throughout to avoid silent truncation when BT.601
coefficients (298, 409, 516, 129) are multiplied by pixel values
(0-255).  The products can reach ~131,580, well beyond i16::MAX (32,767).

Changes:
- i420_to_rgba8_row_sse2: process 4 pixels/iter in i32 (was 8 in i16)
- rgba8_to_y_row_sse2: process 4 pixels/iter in i32 (was 8 in i16)
- New mul32_sse2 helper: SSE2-compatible i32 multiply via _mm_mul_epu32
  with even/odd lane shuffling
- Add 3 equivalence tests: SIMD-vs-scalar for both directions + roundtrip

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): fix chroma averaging bug and remove stale output_pixel_format

- rgba8_to_chroma_row_sse2: simplified horizontal pair extraction to
  _mm_packs_epi32(r_sum, zero) instead of complex mask-shift-pack that
  dropped every other 2x2 chroma block (causing visible vertical banding)
- Removed stale output_pixel_format: i420 from video_compositor_demo.yml
  and compositor benchmark (now silently ignored, always outputs RGBA8)
- Removed unused imports (_mm_srli_si128, _mm_set_epi32) from chroma fn
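
The scalar form of what the SIMD chroma path computes, for reference (hypothetical helper; inputs are already per-channel chroma contributions, one `u32` per pixel): each 4:2:0 sample averages a 2x2 block across two adjacent rows.

```rust
/// One chroma sample per 2x2 block: average two pixels from each of
/// two adjacent rows, with +2 for round-to-nearest before /4.
fn subsample_chroma_row(row0: &[u32], row1: &[u32]) -> Vec<u8> {
    row0.chunks_exact(2)
        .zip(row1.chunks_exact(2))
        .map(|(t, b)| ((t[0] + t[1] + b[0] + b[1] + 2) / 4) as u8)
        .collect()
}
```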

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: apply cargo fmt to chroma averaging fix

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

---------

Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>

* feat: NV12 as default video format (#63)

* feat: add NV12 as default video format

- Add PixelFormat::Nv12 variant to core type system with VideoLayout
  plane math for 2-plane NV12 (Y + interleaved UV)
- Update parse_pixel_format to accept 'nv12' format string
- Change default pixel_format across nodes from 'i420' to 'nv12'
- VP9 decoder: output NV12 by interleaving libvpx's I420 U/V planes
- VP9 encoder: accept NV12 via VPX_IMG_FMT_NV12 (zero-conversion path)
- Compositor: add nv12_to_rgba8_buf conversion with SSE2 SIMD reuse
- Colorbars: add NV12 generation and time-stamp support
- Update test utilities for NV12 chroma initialization

NV12's interleaved UV plane is more cache-friendly for RGBA conversion
kernels, and the encoder can consume NV12 directly without format
conversion, making the single-layer passthrough path faster.
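
The NV12 plane math described above, as a standalone sketch (the in-tree `VideoLayout` type carries strides and offsets as well; this only shows the size arithmetic):

```rust
/// NV12 layout: full-resolution Y plane followed by one half-height
/// plane of interleaved UV pairs. Assumes even width and height.
fn nv12_plane_sizes(w: usize, h: usize) -> (usize, usize) {
    let y_size = w * h;
    // Chroma is subsampled 2x2, but U and V share one interleaved
    // plane: (w/2 * h/2) samples * 2 bytes = w * h / 2.
    let uv_size = (w / 2) * (h / 2) * 2;
    (y_size, uv_size)
}
```

Total frame size is therefore 1.5 bytes per pixel, the same as I420; only the plane count and UV interleaving differ.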

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix: validate chroma stride before cast, update decoder description to NV12

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf: use thread-local scratch buffers in nv12_to_rgba8_buf SIMD path

Replace per-row Vec allocations with thread_local! RefCell<Vec<u8>>
scratch buffers that are allocated once per thread and reused across
rows. Eliminates ~2×height heap allocations per frame (e.g. 2160
allocs/frame at 1080p) while preserving correctness under both
sequential and rayon parallel execution.
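The pattern reads roughly like this (illustrative sketch, not the crate's code):

```rust
use std::cell::RefCell;

thread_local! {
    // One scratch buffer per worker thread, grown on demand and then
    // reused across rows instead of allocating a Vec per row.
    static ROW_SCRATCH: RefCell<Vec<u8>> = RefCell::new(Vec::new());
}

/// Run `f` with a reusable per-thread scratch slice of length `len`
/// (hypothetical helper name).
fn with_row_scratch<R>(len: usize, f: impl FnOnce(&mut [u8]) -> R) -> R {
    ROW_SCRATCH.with(|cell| {
        let mut buf = cell.borrow_mut();
        if buf.len() < len {
            buf.resize(len, 0); // allocates only on first use or growth
        }
        f(&mut buf[..len])
    })
}
```

Because each rayon worker owns its own ROW_SCRATCH, the same code is safe under both sequential and parallel execution.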

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(nodes): eliminate NV12↔RGBA8 conversion overhead in compositor pipeline (#65)

* perf(nodes): eliminate NV12↔RGBA8 conversion overhead in compositor pipeline

Two targeted fixes for the hot paths identified in CPU profiling:

1. nv12_to_rgba8_buf: Replace thread-local scratch buffer deinterleaving
   with a dedicated nv12_to_rgba8_row_sse2 kernel that reads NV12's
   interleaved UV plane directly.  Eliminates per-row RefCell borrow_mut
   and LocalKey::try_with overhead (~50% of profiled CPU time).

2. VP9 encoder: Convert RGBA8→NV12 instead of RGBA8→I420 so the encoder
   can feed VPX_IMG_FMT_NV12 to libvpx directly, matching the pipeline's
   native NV12 format and avoiding the I420 detour (~28% of profiled CPU).

Adds rgba8_to_nv12_buf() for the new output path.
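For reference, the scalar form of reading NV12's interleaved UV plane directly looks like this. The integer BT.601-style coefficients are an assumption for illustration; the crate's exact constants may differ:

```rust
/// Convert one NV12 pixel to RGBA8 by indexing the interleaved UV plane
/// directly: one UV byte pair per 2x2 block, at the same row stride as Y.
fn nv12_pixel_to_rgba(y_plane: &[u8], uv_plane: &[u8], stride: usize, x: usize, row: usize) -> [u8; 4] {
    let y = i32::from(y_plane[row * stride + x]);
    let uv_base = (row / 2) * stride + (x / 2) * 2;
    let u = i32::from(uv_plane[uv_base]) - 128;
    let v = i32::from(uv_plane[uv_base + 1]) - 128;
    let clamp = |c: i32| c.clamp(0, 255) as u8;
    // 16.16 fixed-point approximations of BT.601 coefficients (assumed)
    let r = clamp(y + ((91_881 * v) >> 16));
    let g = clamp(y - ((22_554 * u + 46_802 * v) >> 16));
    let b = clamp(y + ((116_130 * u) >> 16));
    [r, g, b, 255]
}
```

The SIMD kernel vectorises this same addressing; the win over the scratch-buffer approach is that no deinterleaved copy of U and V is ever materialised.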

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(nodes): add SSE4.1 fast-path kernels for color-space conversion

Replace 7-instruction mul32_sse2 emulation with single-instruction
_mm_mullo_epi32 in three hot kernels identified by pprof (mul32_sse2
was 26.49% CPU):

- i420_to_rgba8_row_sse41: 6 native multiplies per pixel
- nv12_to_rgba8_row_sse41: 6 native multiplies per pixel
- rgba8_to_y_row_sse41: 3 native multiplies per pixel

All _buf callers now runtime-detect SSE4.1 and prefer it, falling back
to SSE2 on older hardware. Identical color-space math; no functional
change.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

---------

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>

* docs: update VP9 encoder registration to mention NV12 input format

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

---------

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>
Co-authored-by: staging-devin-ai-integration[bot] <166158716+staging-devin-ai-integration[bot]@users.noreply.github.com>

* perf: enable thin LTO, codegen-units=1, and target-cpu=native for profiling (#66)

- Add lto = "thin" and codegen-units = 1 to [profile.release] in
  Cargo.toml for cross-crate inlining and maximum LLVM optimisation.
- Add -C target-cpu=native to build-skit-profiling and skit-profiling
  so CPU profiles reflect host-tuned codegen.
- Add new build-skit-native target for max-perf local builds tuned to
  the build host's microarchitecture.
- Docker/CI release builds remain portable (no target-cpu=native in
  Cargo.toml or .cargo/config.toml).

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>

* perf(compositor): implement findings 1+4, 2, 5, and 3 for video compositor optimizations (#67)

* perf(compositor): implement findings 1+4, 2, 5, and 3 for video compositor optimizations

- Finding 1+4: Incremental stepper + interior AA skip in scale_blit_rgba_rotated
  Replace per-pixel multiplies with adds by stepping local_x/local_y incrementally.
  When min_dist >= 2.0, batch interior pixels skipping coverage math entirely.

- Finding 2: NV12 interleaved-output SIMD chroma kernel (SSE2)
  New rgba8_to_chroma_row_nv12_sse2 with interleaved U/V store via _mm_unpacklo_epi8.
  Wired into rgba8_to_nv12_buf conversion path.

- Finding 5: Rayon row chunking (8-row blocks)
  Replace per-row rayon tasks with 8-row chunks across all dispatch sites
  (rotated blit, i420/nv12 conversions) to reduce scheduling overhead.

- Finding 3: AVX2 Y-plane kernel (8 pixels/iter)
  New rgba8_to_y_row_avx2 using 256-bit registers, wired with AVX2 > SSE4.1 > SSE2
  priority in both I420 and NV12 Y-plane conversion paths.
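Finding 1+4's stepper can be illustrated in scalar form (sketch; names are illustrative). Moving one destination pixel to the right advances the source-local coordinate by a constant (cos θ, −sin θ), so the per-pixel rotation multiplies become two adds:

```rust
/// Source-local coordinates for one destination row, stepped incrementally
/// (two adds per pixel) instead of multiplying per pixel.
fn rotated_row_coords(start_x: f32, start_y: f32, cos_t: f32, sin_t: f32, n: usize) -> Vec<(f32, f32)> {
    let (mut lx, mut ly) = (start_x, start_y);
    let mut out = Vec::with_capacity(n);
    for _ in 0..n {
        out.push((lx, ly));
        lx += cos_t; // one destination pixel to the right maps to a
        ly -= sin_t; // constant step in source-local space
    }
    out
}
```

The interior-AA skip then batches runs of pixels whose distance to every content edge is at least 2.0, where coverage is known to be 1.0.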

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): use copy_nonoverlapping instead of _mm_storeu_si128 in NV12 chroma kernel

_mm_storeu_si128 writes 16 bytes but only 8 are valid (4 UV pairs),
causing out-of-bounds writes on the last chroma row. Use
copy_nonoverlapping with explicit 8-byte length, matching the I420
chroma kernel's store pattern.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): bound dst_region slice and add rationale comments for cast suppressions

- Bound dst_region to bb_rows * row_stride to avoid dispatching rayon
  tasks beyond the bounding box rows.
- Add explanatory comments for #[allow(clippy::cast_possible_wrap)]
  per AGENTS.md linting discipline requirements.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): early-out when bounding box is empty (off-screen rect)

When a rotated layer is entirely off-screen, bb_y1 < bb_y0 or
bb_x1 < bb_x0. The subtraction (bb_y1 - bb_y0) as usize would wrap
to a huge value, causing a panic on the bounded dst_region slice.
Add an early return guard before the subtraction.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

---------

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>

* style(compositor): fix clippy and rustfmt lint issues in SIMD kernels

- Remove empty line between doc comment blocks for rayon_chunk_rows
- Replace manual div_ceil with .div_ceil() method
- Apply rustfmt formatting to AVX2 import blocks and comments

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(compositor): cache available_parallelism in LazyLock for rayon_chunk_rows

available_parallelism() issues a sysconf(_SC_NPROCESSORS_ONLN) syscall
on every call (~40µs on Linux). Cache the result in a static LazyLock
so subsequent calls are a simple atomic load (~0.7ns).
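The caching pattern is just (minimal sketch; the chunk-sizing policy shown is illustrative, not the crate's exact rayon_chunk_rows logic):

```rust
use std::sync::LazyLock;
use std::thread;

// Queried once per process; later reads are a cheap static access.
static NUM_CPUS: LazyLock<usize> =
    LazyLock::new(|| thread::available_parallelism().map_or(1, |n| n.get()));

/// Rows per rayon chunk, sized so each worker gets a few chunks
/// (illustrative sizing).
fn chunk_rows(total_rows: usize) -> usize {
    total_rows.div_ceil(*NUM_CPUS * 4).max(8)
}
```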

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style(compositor): apply rustfmt to LazyLock closure

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): correct AVX2 lane-crossing in chroma kernels

_mm256_packs_epi32 operates per 128-bit lane, so packing two different
source registers (r_v_a, r_v_b) scrambles the element order — qwords 1
and 2 are swapped.  This caused chroma samples to be spatially
displaced, producing visible horizontal tearing artifacts on composited
overlays.

Fix: apply _mm256_permute4x64_epi64(result, 0xD8) (vpermq) immediately
after each cross-source pack to restore sequential element ordering.
Both rgba8_to_chroma_row_nv12_avx2 and rgba8_to_chroma_row_avx2 are
fixed (3 permutes each — one per R, G, B channel).

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* feat(colorbars): default output pixel format to RGBA8

RGBA8 is more convenient and efficient for compositing workflows since
the compositor operates in RGBA8 internally — no format conversion
needed.

Pipelines that feed colorbars directly into VP9 (without a compositor)
now specify pixel_format: nv12 explicitly to avoid an unnecessary
RGBA8→NV12 conversion inside the encoder.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(compositor): add AVX2 NV12→RGBA8 kernel and hoist CPU feature detection

- Implement nv12_to_rgba8_row_avx2: processes 8 pixels per iteration
  (double the SSE4.1 throughput) using 256-bit i32 arithmetic with drop-to-SSE
  pack/interleave to avoid lane-crossing issues
- Wire AVX2 kernel into nv12_to_rgba8_buf with SSE4.1 tail handling
- Hoist is_x86_feature_detected!() calls outside per-row closures in all
  4 conversion functions (i420_to_rgba8_buf, nv12_to_rgba8_buf,
  rgba8_to_i420_buf, rgba8_to_nv12_buf) to detect once at function start
  and capture in variables

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(compositor): algorithmic optimizations, SSE2 blend + microbenchmark (#68)

* perf(compositor): add compositor-only microbenchmark

Adds a standalone benchmark that measures composite_frame() in isolation
(no VP9 encode, no mux, no async runtime overhead).

Scenarios:
- 1/2/4 layers RGBA
- Mixed I420+RGBA and NV12+RGBA (measures conversion overhead)
- Rotation (measures rotated blit path)
- Static layers (same Arc each frame, for future cache-hit measurement)

Runs at 640x480, 1280x720, 1920x1080 by default.

Baseline results on this VM (8 logical CPUs):
  1920x1080 1-layer-rgba:     ~728 fps (1.37 ms/frame)
  1920x1080 2-layer-rgba-pip: ~601 fps (1.66 ms/frame)
  1920x1080 2-layer-i420+rgba: ~427 fps (2.34 ms/frame)
  1920x1080 2-layer-nv12+rgba: ~478 fps (2.09 ms/frame)
  1920x1080 2-layer-rgba-rotated: ~470 fps (2.13 ms/frame)

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: apply rustfmt to compositor_only benchmark

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(compositor): cache YUV→RGBA conversions + skip canvas clear

Optimization 1: Add ConversionCache that tracks Arc pointer identity
per layer slot. When the source Arc<PooledVideoData> hasn't changed
between frames, the cached RGBA data is reused (zero conversion cost).
Replaces the old i420_scratch buffer approach.

Optimization 2: Skip buf.fill(0) canvas clear when the first visible
layer is opaque, unrotated, and fully covers the canvas dimensions.
Saves one full-canvas memset per frame in the common case.
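The pointer-identity check at the heart of Optimization 1 can be sketched as follows (hypothetical types; the real cache is keyed per layer slot):

```rust
use std::sync::Arc;

/// Caches the last conversion result for one layer slot, keyed by
/// Arc pointer identity rather than by comparing frame contents.
struct ConvCache {
    src: Option<Arc<Vec<u8>>>,
    rgba: Vec<u8>,
}

impl ConvCache {
    fn new() -> Self {
        Self { src: None, rgba: Vec::new() }
    }

    /// Returns cached RGBA if `src` is the same allocation as last time,
    /// otherwise runs `convert` and stores the result.
    fn get_or_convert(&mut self, src: &Arc<Vec<u8>>, convert: impl FnOnce(&[u8]) -> Vec<u8>) -> &[u8] {
        let hit = self.src.as_ref().is_some_and(|c| Arc::ptr_eq(c, src));
        if !hit {
            self.rgba = convert(src);
            self.src = Some(Arc::clone(src));
        }
        &self.rgba
    }
}
```

A static layer that sends the same Arc every frame therefore pays for conversion exactly once.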

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(compositor): precompute x-map to eliminate per-pixel division

Optimization 3: Replace per-pixel `(dx + src_col_skip) * sw / rw`
integer division in blit_row_opaque/blit_row_alpha with a single
precomputed lookup table (x_map) built once per scale_blit_rgba call.

Each destination column now does a table lookup instead of a division,
removing O(width * height) divisions per layer per frame.
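Optimization 3 in sketch form (illustrative names; the real code also folds in clipping):

```rust
/// Precompute the source column for every destination column once per
/// blit call, replacing an integer division per pixel with a lookup.
fn build_x_map(rect_w: usize, src_w: usize, src_col_skip: usize) -> Vec<usize> {
    assert!(rect_w > 0 && src_w > 0);
    (0..rect_w)
        .map(|dx| ((dx + src_col_skip) * src_w / rect_w).min(src_w - 1))
        .collect()
}
```

The inner blit loop then reads `x_map[dx]` instead of dividing, turning O(width × height) divisions into O(width) per call.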

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(compositor): add identity-scale fast path for 1:1 opaque blits

Optimization 4: When source dimensions match the destination rect,
opacity is 1.0, and there's no clipping offset, bypass the x-map
lookup entirely. For fully-opaque source rows, use bulk memcpy
(copy_from_slice). For rows with semi-transparent pixels, use a
simplified per-pixel blend without the scaling indirection.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(compositor): pre-scale image overlays at decode time

Optimization 5: When a decoded image overlay's native dimensions differ
from its target rect, pre-scale it once using nearest-neighbor at
config/update time. This ensures the per-frame blit_overlay call hits
the identity-scale fast path (memcpy) instead of re-scaling every frame.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(compositor): cache layer configs and skip per-frame sort

Optimization 6: Extract per-slot layer config resolution and z-order
sorting into a rebuild_layer_cache() function that runs only when
config or pin set changes (UpdateParams, pin add/remove, channel close).

Per-frame layer building now uses the cached resolved configs and
pre-sorted draw order instead of doing HashMap lookups and sort_by
on every frame.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(frame_pool): preallocate video pool buckets at startup

Optimization 7: Change video_default() from with_buckets (lazy, no
preallocation) to preallocated_with_max with 2 buffers per bucket.
This avoids cold-start allocation misses for the first few frames,
matching the existing audio_default() pattern.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style(compositor): fix clippy warnings from optimization changes

- Use map_or instead of match/if-let-else in ConversionCache and
  first_layer_covers_canvas
- Allow expect_used with safety comment in get_or_convert
- Allow dead_code on LayerSnapshot::z_index (sorting moved upstream)
- Allow needless_range_loop in blit_row_opaque/blit_row_alpha (dx used
  for both x_map index and dst offset)
- Allow cast_possible_truncation on idx as i32 in rebuild_layer_cache

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor): address correctness + bench issues from review

- Fix #1 (High): skip-clear now validates source pixel alpha (all pixels
  must have alpha==255) before skipping canvas clear. Prevents blending
  against stale pooled buffer data when RGBA source has transparency.

- Fix #2 (Medium): conversion cache slot indices now use position in the
  full layers slice (with None holes) via two-pass resolution, so cache
  keys stay stable when slots gain/lose frames.

- Fix #3 (Medium): benchmark now calls real composite_frame() kernel
  instead of reimplementing compositing inline. Exercises all kernel
  optimizations (cache, clear-skip, identity fast-path, x-map).

- Fix Devin Review: revert video pool preallocation (was allocating
  ~121MB across all bucket sizes at startup). Restored lazy allocation.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: apply rustfmt to fix formatting

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* perf(compositor): SSE2 blend, alpha-scan cache, bench pool, lazy prealloc

Fix 4 remaining performance findings:

1. High: Add SSE2 SIMD fast path for RGBA blend loops (blit_row_opaque,
   blit_row_alpha). Processes 4 pixels at a time with fast-paths for
   fully-opaque (direct copy) and fully-transparent (skip) source pixels.

2. Medium: Optimize alpha scan in clear-skip check — skip scan entirely
   for I420/NV12 layers (always alpha=255 after conversion), cache scan
   result by Arc pointer identity for RGBA layers.

3. Medium: Pass VideoFramePool to bench_composite instead of None, so
   benchmark exercises pool reuse like production.

4. Low-Medium: Lazy preallocate on first bucket use — when a bucket is
   first hit, allocate one extra buffer so the second get() is a hit.

Also: inline clear-skip logic to fix borrow checker conflict, remove
unused first_layer_covers_canvas function, add clippy suppression
rationale comments for needless_range_loop.
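The scalar shape of the blend in fix 1, including both fast paths, is (sketch; the SSE2 kernel applies the same logic four pixels at a time):

```rust
/// Source-over blend of one RGBA8 row onto an opaque destination row,
/// with direct-copy and skip fast paths on the source alpha.
fn blend_row_rgba(dst: &mut [u8], src: &[u8]) {
    for (d, s) in dst.chunks_exact_mut(4).zip(src.chunks_exact(4)) {
        match s[3] {
            255 => d.copy_from_slice(s), // fully opaque: direct copy
            0 => {}                      // fully transparent: skip
            a => {
                let a = u16::from(a);
                let inv = 255 - a;
                for c in 0..3 {
                    d[c] = ((u16::from(s[c]) * a + u16::from(d[c]) * inv) / 255) as u8;
                }
                d[3] = 255; // canvas stays opaque
            }
        }
    }
}
```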

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

---------

Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>

* feat(compositor-ui): UX improvements for video compositor (#69)

* feat(compositor-ui): UX improvements for video compositor

- Fix preview panel drag bug (inverted Y-axis)
- Fix text/image overlay dragging (extend drag to all layer types)
- Add visibility toggle (eye icon) to all layer types
- Unified layer list showing all layers sorted by z-index
- Visibility-aware canvas rendering (hidden layers show faintly)
- Conditional preview panel (only shows when there's something to preview)
- Fullscreen toggle for preview panel
- Preview activation button in Monitor view top bar (watch-only MoQ)

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor-ui): address 5 UX issues from testing feedback

1. Fix rotation stretching: add transform-origin: center center to LayerBox
2. In-place text editing: double-click text overlay to edit inline on canvas
   - Disable resize handles for text layers (size controlled by font-size)
3. Fix overlay removal caching: add timestamp guard to prevent stale params
   from overwriting local overlay changes during sync
4. Consolidate overlays into unified layers: merge overlay add/remove/edit
   controls into UnifiedLayerList, remove separate OverlayList from render
5. Resizable preview panel: add left/top edge drag handles to resize panel

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor-ui): remove text layer padding and use indexed labels

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor-ui): address review bot findings (escape cancel, visibility sync, memo deps)

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor-ui): guard double-commit on Enter and preserve overlay visibility on re-sync

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor-ui): preserve video layer opacity on visibility re-sync

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor-ui): clear selection on overlay removal to prevent stale selectedLayerId

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

---------

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>

* fix(compositor-ui): use committedRef to prevent double-fire on Enter+blur in text edit (#71)

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>

* fix(video): preserve aspect ratio in compositor rotation and stream rendering (#70)

* feat(nodes): preserve aspect ratio in rotated compositor layers

Replace the stretch-to-fill mapping in scale_blit_rgba_rotated with a
uniform-scale fit (object-fit: contain).  When a rotated layer's source
aspect ratio differs from the destination rect, the image is now centred
with transparent padding instead of being distorted.

- Compute fit_scale = min(rw/sw, rh/sh) for uniform scaling
- Use content-local half-widths (half_cw, half_ch) for the bounding box
  and edge anti-aliasing distances
- Map content coords → source pixels via inv_fit_scale instead of
  normalising through the full rect dimensions
- Add test_rotated_blit_preserves_aspect_ratio unit test
- Update sample pipeline comment to document the behaviour

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(nodes): account for rotation angle in compositor fit scale

The previous fit scale only considered the source-to-rect aspect ratio
mismatch, which had no effect when both shared the same ratio (e.g. 4:3
source in a 4:3 rect).  The real issue is that a rotated rectangle's
axis-aligned bounding box is larger than the original, so the content
must be scaled down to fit within the rect after rotation.

New formula:
  rotated_bb_w = src_w·|cos θ| + src_h·|sin θ|
  rotated_bb_h = src_w·|sin θ| + src_h·|cos θ|
  fit_scale = min(rect_w / rotated_bb_w, rect_h / rotated_bb_h)

This ensures the rotated content fits entirely within the destination
rect with transparent padding, producing a natural-looking rotation
regardless of aspect ratio match.
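The formula translates directly into code (sketch):

```rust
/// Uniform scale that fits a src_w x src_h source, rotated by `theta`
/// radians, inside a rect_w x rect_h destination (the formula above).
fn rotation_fit_scale(src_w: f32, src_h: f32, rect_w: f32, rect_h: f32, theta: f32) -> f32 {
    let (sin_t, cos_t) = (theta.sin().abs(), theta.cos().abs());
    let rotated_bb_w = src_w * cos_t + src_h * sin_t;
    let rotated_bb_h = src_w * sin_t + src_h * cos_t;
    (rect_w / rotated_bb_w).min(rect_h / rotated_bb_h)
}
```

At θ = 0 this reduces to the plain contain fit; at 45° a square source shrinks by 1/√2 so its rotated bounding box still fits.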

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(ui): derive canvas aspect ratio from stream dimensions

Replace hardcoded aspectRatio CSS values ('4 / 3' in StreamView,
'16 / 9' in OutputPreviewPanel) with a dynamic value observed from
the canvas element's width/height attributes.

The new useCanvasAspectRatio hook uses a MutationObserver to track
attribute changes made by the Hang video renderer, ensuring the
displayed aspect ratio always matches the actual video stream.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(ui): use auto width on stream canvas to prevent stretching

When the container is wider than what the aspect ratio allows at
maxHeight 480px, width: 100% caused the canvas to stretch horizontally.
Changed to width: auto + max-width: 100% so the browser computes the
width from the aspect ratio and height constraint, then centers the
canvas with margin: 0 auto.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(ui): skip default canvas dimensions in aspect ratio hook

Check canvas.getAttribute('width'/'height') before reading the
.width/.height properties. A newly-created canvas has default
intrinsic dimensions of 300x150 which would be reported as a
valid 2:1 ratio, causing a layout shift before the first video
frame arrives. Now the hook returns undefined until the Hang
renderer explicitly sets the canvas attributes.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(nodes): unify 0° fast path to use aspect-ratio-preserving fit

The near-zero rotation fast path now computes a fitted sub-rect
(uniform scale + centering) before delegating to scale_blit_rgba,
matching the rotated path's aspect-ratio-preserving behaviour.

This eliminates the behavioural discontinuity where 0° rotation
would stretch-to-fill while any non-zero rotation would letterbox.
Animating rotation through 0° no longer causes a visual pop.

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

---------

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>

* fix(compositor-ui): address 7 UX issues in compositor node (#72)

* fix(compositor-ui): address 7 UX issues in compositor node

Issue #1: Click outside text layer commits inline edit
- Add document.activeElement.blur() in handlePaneClick before deselecting
- Add useEffect on TextOverlayLayer watching isSelected to commit on deselect

Issue #2: Preview panel resizable from all four edges
- Add ResizeEdgeRight and ResizeEdgeBottom styled components
- Extend handleResizeStart edge type to support right/bottom
- Update resizeRef type to match

Issue #3: Monitor view preview extracts MoQ peer settings from pipeline
- Find transport::moq::peer node in pipeline and extract gateway_path/output_broadcast
- Set correct serverUrl and outputBroadcast before connecting
- Import updateUrlPath utility

Issue #4: Deep-compare layer state to prevent position jumps on selection change
- Skip setLayers/setTextOverlays/setImageOverlays when merged state is structurally equal
- Prevents stale server-echoed values from causing visual glitches

Issue #5: Rotate mouse delta for rotated layer resize handles
- Transform (dx, dy) by -rotationDegrees in computeUpdatedLayer
- Makes resize handles behave naturally regardless of layer rotation

Issue #6: Visual separator between layer list and per-layer controls
- Add borderTop and paddingTop to LayerInfoRow for both video and text controls

Issue #7: Text layers support opacity and rotation sliders
- Add rotationDegrees field to TextOverlayState, parse/serialize rotation_degrees
- Add rotation transform to TextOverlayLayer canvas rendering
- Replace numeric opacity input with slider matching video layer controls
- Add rotation slider for text layers

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(compositor-ui): fix preview drag, text state flicker, overlay throttling, multiline text

- OutputPreviewPanel: make panel body draggable (not just header) with
  cursor: grab styling so preview behaves like other canvas nodes
- useCompositorLayers: add throttledOverlayCommit for text/image overlay
  updates (sliders, etc.) to prevent flooding the server on every tick;
  increase overlay commit guard from 1.5s to 3s to prevent stale params
  from overwriting local state; arm guard immediately in updateTextOverlay
  and updateImageOverlay
- CompositorCanvas: change InlineTextInput from <input> to <textarea> for
  multiline text editing; Enter inserts newline, Ctrl/Cmd+Enter commits;
  add white-space: pre-wrap and word-break to text content rendering;
  add ResizeHandles to TextOverlayLayer when selected
- CompositorNode: change OverlayTextInput to <textarea> with vertical
  resize support for multiline text in node controls panel

Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

---------

Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>

* feat(compositor): consolidate overlay transforms + unified z-sorted blit loop

Backend consolidation:
- Add OverlayTransform struct with #[serde(flatten)] for wire-compatible
  common spatial/visual properties (rect, opacity, rotation_degrees, z_index)
- Add rotation_degrees and z_index fields to DecodedOverlay
- Replace three separate blit loops (video, image, text) with a single
  z-sorted BlitItem loop, enabling interleaved layer ordering
- Remove dead blit_overlay() function (replaced by unified path)
- Add SSE2 batched blending for rotated blit interi…
staging-devin-ai-integration bot pushed a commit that referenced this pull request Mar 26, 2026
Critical fixes:
- Exclusive routing: dynamic channel OR static output, never both (fix #1)
- RwLock poison logged as error instead of silently swallowed (fix #2)

Improvements:
- Spawned input-forwarding task uses tokio::select! with shutdown_rx (fix #3)
- validate_connection_types logs at warn for dynamic pin skip (fix #4)
- Document poll_fn starvation bias as accepted trade-off (fix #5)
- Remove unused channels parameter from handle_pin_management (fix #6)

Nits:
- Update DynamicOutputs doc comment, remove stale legacy reference (fix #7)
- Use Arc short form (already imported) (fix #8)
- Improve test to exercise MoqPeerNode::new + output_pins + make_dynamic_output_pin (fix #9)

Also refactored handle_pin_management and process_frame_from_group to
reduce cognitive complexity below the 50-point lint threshold by
extracting route_packet, spawn_dynamic_input_forwarder,
insert_dynamic_output, remove_dynamic_output, and make_dynamic_input_pin
helper methods.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
streamer45 added a commit that referenced this pull request Mar 26, 2026
* feat(transport): add dynamic pin support to moq_peer and moq_push

Generalize MoQ transport nodes to discover and create tracks/pins
dynamically from catalogs instead of hardcoding audio+video pairs.

moq_peer changes:
- Set supports_dynamic_pins() to true
- Thread DynamicOutputs (Arc<RwLock<HashMap>>) through the publisher
  call chain: run -> start_publisher_task_with_permit ->
  publisher_receive_loop -> watch_catalog_and_process ->
  spawn_track_processor -> process_publisher_frames ->
  process_frame_from_group
- In watch_catalog_and_process, build track-named dynamic pin names
  (e.g. audio/data, video/hd) from catalog entries
- In process_frame_from_group, send frames to both the dynamic
  (track-named) output pin and the legacy pin for backward compat
- Handle all PinManagementMessage variants in handle_pin_management
- Accept both EncodedAudio(Opus) and EncodedVideo(VP9) on both
  input pins (in/in_1) for flexible media routing

moq_push changes:
- Set supports_dynamic_pins() to true
- Accept both EncodedAudio(Opus) and EncodedVideo(VP9) on both
  input pins (in/in_1)
- Handle dynamic input pin creation via PinManagementMessage,
  mapping each new pin to a corresponding MoQ track
- Add pin management select branch in the run loop

Engine changes (dynamic_actor.rs):
- In validate_connection_types, skip strict type validation for
  source pins on nodes that support dynamic pins
- In connect_nodes, create output pins on-demand via
  RequestAddOutputPin -> AddedOutputPin flow when the pin
  distributor doesn't exist but the node supports dynamic pins

All existing pipeline YAML files continue to work unchanged.
Legacy out/out_1 and in/in_1 pins remain as stable fallbacks.

Refs: #197

Signed-off-by: Devin AI <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(transport): address review feedback on dynamic pin support

- Poll dynamic input receivers in moq_push select loop using poll_fn
- Determine is_video from pin name prefix convention instead of accepts_types
- Forward dynamic input pin packets in moq_peer instead of dropping channel
- Use DynamicInputState struct instead of tuple for type clarity

Signed-off-by: Devin AI <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* test(transport): add regression tests for dynamic pin fixes

- Test make_dynamic_output_pin produces correct types for video/audio/bare names
- Test AddedInputPin channel is not dropped (regression for channel discard bug)
- Test is_video determination uses pin name prefix convention
- Test track name derivation from pin names

Signed-off-by: Devin AI <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(transport): fix double-prefixed pin names and shutdown cleanup

- Use catalog track names directly (already prefixed) instead of
  re-prefixing with audio/ or video/, which caused double-prefixed
  names like 'audio/audio/data'
- Finish dynamic input track producers on MoqPushNode shutdown
- Add regression test for double-prefix bug

Signed-off-by: Devin AI <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(transport): finish track producers on remove, add stats to dynamic input forwarding

- RemoveInputPin now calls finish() on track producers before dropping
- Dynamic input forwarding tasks in moq_peer report received/sent stats
  via stats_delta_tx, matching the static pin handler pattern

Signed-off-by: Devin AI <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* refactor(transport): remove legacy pin names, use track-named pins exclusively

BREAKING CHANGE: moq_peer output pins renamed from out/out_1 to
audio/data and video/data to match catalog track names. Removes
audio_output_pin/video_output_pin parameters from the entire publisher
call chain (start_publisher_task_with_permit, publisher_receive_loop,
watch_catalog_and_process, spawn_track_processor, process_publisher_frames,
process_frame_from_group). Unifies output_pin and dynamic_pin_name into
a single track-name-based output pin. Updates all sample pipeline YAML
files to reference the new pin names.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix: update remaining out_1 references in samples, e2e fixtures, tests, and docs

Updates missed references to the old moq_peer out/out_1 pin names:
- samples/pipelines/dynamic/video_moq_webcam_pip.yml
- samples/pipelines/dynamic/video_moq_screen_share.yml
- e2e/fixtures/webcam-pip.yaml
- e2e/fixtures/webcam-pip-cropped.yaml
- e2e/fixtures/webcam-pip-circle.yaml
- crates/api/src/yaml.rs (parser tests)
- docs/src/content/docs/guides/creating-pipelines.md

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* refactor(transport): address all 9 review items on dynamic pin support

Critical fixes:
- Exclusive routing: dynamic channel OR static output, never both (fix #1)
- RwLock poison logged as error instead of silently swallowed (fix #2)

Improvements:
- Spawned input-forwarding task uses tokio::select! with shutdown_rx (fix #3)
- validate_connection_types logs at warn for dynamic pin skip (fix #4)
- Document poll_fn starvation bias as accepted trade-off (fix #5)
- Remove unused channels parameter from handle_pin_management (fix #6)

Nits:
- Update DynamicOutputs doc comment, remove stale legacy reference (fix #7)
- Use Arc short form (already imported) (fix #8)
- Improve test to exercise MoqPeerNode::new + output_pins + make_dynamic_output_pin (fix #9)

Also refactored handle_pin_management and process_frame_from_group to
reduce cognitive complexity below the 50-point lint threshold by
extracting route_packet, spawn_dynamic_input_forwarder,
insert_dynamic_output, remove_dynamic_output, and make_dynamic_input_pin
helper methods.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(transport): eliminate TOCTOU race in route_packet

Hold a single read lock for both the existence check and the send in
route_packet, preventing a concurrent RemoveOutputPin from removing the
entry between two separate lock acquisitions which would silently drop
the packet.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
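The single-lock pattern described above can be sketched as follows — a minimal illustration using std's `RwLock` and a bounded channel (the real code uses the project's own packet and pin types, and names here are illustrative):

```rust
use std::collections::HashMap;
use std::sync::mpsc::SyncSender;
use std::sync::{PoisonError, RwLock};

/// Look up the output pin and send under ONE read guard.  Because the
/// existence check and the send happen while the same guard is held, a
/// concurrent RemoveOutputPin cannot remove the entry in between — the
/// TOCTOU window from two separate lock acquisitions is gone.
fn route_packet(outputs: &RwLock<HashMap<String, SyncSender<u32>>>, pin: &str, packet: u32) -> bool {
    let guard = outputs.read().unwrap_or_else(PoisonError::into_inner);
    match guard.get(pin) {
        Some(tx) => tx.try_send(packet).is_ok(),
        None => false, // pin genuinely absent, not a race
    }
}
```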

* fix(transport): handle closed dynamic output channels in route_packet

Distinguish try_send results: Ok and Full return true (packet sent or
acceptable frame drop for real-time media), Closed returns false to
trigger shutdown — matching the static output path behaviour.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
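The Ok/Full/Closed distinction above can be sketched with std's bounded channel, whose `TrySendError` has analogous `Full`/`Disconnected` variants (the actual code uses tokio's `try_send`, where the second variant is named `Closed`):

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// Returns true if the output is still healthy: a delivered packet or a
/// full queue (acceptable frame drop for real-time media) both keep the
/// output alive, while a disconnected consumer signals cleanup.
fn route(tx: &SyncSender<u32>, packet: u32) -> bool {
    match tx.try_send(packet) {
        Ok(()) => true,                              // delivered
        Err(TrySendError::Full(_)) => true,          // drop the frame, keep going
        Err(TrySendError::Disconnected(_)) => false, // consumer gone
    }
}
```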

* fix(transport): keep track processor alive on closed dynamic channel

A closed dynamic output channel (downstream consumer disconnected)
now removes the stale entry and continues instead of triggering
FrameResult::Shutdown. This prevents a single consumer disconnect
from killing the entire track processor.

Also extract track_name_from_pin() and is_video_pin() into named
functions in push.rs so tests exercise the real production code
instead of duplicating the logic inline.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(transport): keep dynamic input forwarder alive when no subscribers

Match the static input path behaviour: discard frames with
`let _ = tx.send(frame)` instead of breaking out of the loop when
there are no active broadcast receivers. This prevents the dynamic
input forwarder from permanently shutting down between subscriber
connections.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(transport): address review round 3 — catalog republish, single-lock route_packet, cleanup

- Re-publish MoQ catalog when dynamic tracks are added/removed (push.rs)
- Merge route_packet double RwLock acquisition into single lock with RouteOutcome enum
- Add design rationale comment on std::sync::RwLock choice for DynamicOutputs
- Extract moq_accepted_media_types() helper, deduplicate across peer/mod.rs and push.rs
- Change dynamic pin validation log from warn to debug (dynamic_actor.rs)
- Use Arc::default() consistently for DynamicOutputs construction
- Update moq_peer.yml comment to mention video/data output pin
- Remove unused type imports from push.rs

Signed-off-by: Devin AI <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(transport): downgrade catalog republish log to debug

Signed-off-by: Devin AI <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix: address round 4 review — packet drops, forwarder lifecycle, type validation

- route_packet: match on TrySendError::Full/Closed via RouteOutcome enum,
  log dropped packets at debug level instead of silently discarding
- Store JoinHandle for each dynamic input forwarder in a HashMap;
  abort on RemoveInputPin to prevent task leaks
- After dynamic output pin creation in connect_nodes, validate type
  compatibility using can_connect_any before wiring
- republish_catalog returns bool; on failure roll back catalog entry
  and skip adding DynamicInputState
- Use swap_remove instead of remove for O(1) dynamic_inputs removal
- Consistent lock-poisoning recovery via unwrap_or_else(PoisonError::into_inner)
- Align default dynamic pin names (in_dyn → dynamic_in)
- Extract activate_dynamic_input, insert/remove_catalog_rendition helpers
  to stay within cognitive_complexity limit

Signed-off-by: Devin AI <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
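The lock-poisoning recovery idiom named above (`unwrap_or_else(PoisonError::into_inner)`) looks like this in isolation — a sketch with a plain `Mutex<u64>` standing in for the real shared state:

```rust
use std::sync::{Mutex, PoisonError};

/// Recover the guard from a poisoned lock instead of panicking.  A lock
/// is poisoned when a thread panics while holding it; for state that
/// stays valid across such a panic, recovering keeps the node serving.
fn read_counter(lock: &Mutex<u64>) -> u64 {
    *lock.lock().unwrap_or_else(PoisonError::into_inner)
}
```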

* fix: clean up stale resources on dynamic pin creation failures

- Type-mismatch early return in connect_nodes now removes the orphaned
  PinDistributor entry and stale pin metadata before returning
- AddedOutputPin send failure path gets the same cleanup
- Document that validate_connection_types skips dest-pin validation too
  when source node supports dynamic pins (known limitation)
- RemoveInputPin in push.rs uses swap_remove instead of drain+collect
- Prune finished forwarder JoinHandles on AddedInputPin to prevent
  unbounded growth from naturally-closed channels
- Add safety comment about poll_fn/select! mutable borrow interaction
- Deduplicate output_pins() by reusing make_dynamic_output_pin

Signed-off-by: Devin AI <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
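The `swap_remove` change above trades ordering for O(1) removal — a sketch with `Vec<String>` standing in for the dynamic-input collection:

```rust
/// Remove a pin by name in O(1): `swap_remove` moves the last element
/// into the vacated slot instead of shifting the tail.  Order is not
/// preserved, which is fine for a set-like collection of pins.
fn remove_pin(pins: &mut Vec<String>, name: &str) -> bool {
    if let Some(idx) = pins.iter().position(|p| p == name) {
        pins.swap_remove(idx);
        true
    } else {
        false
    }
}
```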

* fix: finish leaked producers, shut down orphaned distributors, cleanup nits

- activate_dynamic_input: finish track producer before returning on
  catalog republish failure to avoid dangling broadcast track
- connect_nodes: send PinConfigMsg::Shutdown to the spawned
  PinDistributor on both type-mismatch and AddedOutputPin send failure
  error paths, preventing orphaned actor tasks
- Abort all forwarder JoinHandles on node shutdown for deterministic
  cleanup instead of relying on channel close propagation
- Remove redundant 'let mut catalog_producer = catalog_producer' rebinding
- Downgrade subscriber_count atomics from SeqCst to Relaxed (only used
  for logging, no cross-variable synchronization needed)

Signed-off-by: Devin AI <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
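The SeqCst → Relaxed downgrade above applies because the counter is only read for logging; nothing else synchronizes through it. A minimal sketch (the static name is illustrative):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Subscriber counter used only for logging.  Relaxed ordering is
/// sufficient: the value itself is always atomically consistent, and no
/// other memory accesses are ordered relative to it.
static SUBSCRIBER_COUNT: AtomicUsize = AtomicUsize::new(0);

fn on_subscribe() -> usize {
    SUBSCRIBER_COUNT.fetch_add(1, Ordering::Relaxed) + 1
}
```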

* fix: rollback leaked input pins, guard duplicates, timeout pin creation

- Add rollback_dynamic_input helper to clean up destination input pins
  when step-2 (output pin creation) fails in connect_nodes
- Track created_dynamic_input to conditionally rollback on all 6 step-2
  failure paths (type mismatch, send failures, timeouts)
- Wrap RequestAddInputPin and RequestAddOutputPin responses with
  tokio::time::timeout(5s) to prevent engine deadlock
- Guard duplicate dynamic input pin names in push.rs with
  check-and-replace via swap_remove
- Abort old forwarder handle on re-add collision in peer/mod.rs
- Extract activate_dynamic_input_forwarder to reduce cognitive complexity
- Bump stale dynamic output entry log from debug to info
- Make original catalog binding mut, remove redundant rebind
- Align moq_accepted_media_types() import qualification
- Shut down orphaned PinDistributor actors on type-mismatch and
  AddedOutputPin send failure paths

Signed-off-by: Devin AI <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix: assert pin.name == from_pin invariant on dynamic output creation

Add debug_assert_eq! after receiving the pin definition from
RequestAddOutputPin to make the implicit contract explicit: the
node must return the suggested name unchanged.

Signed-off-by: Devin AI <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

---------

Signed-off-by: Devin AI <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>
staging-devin-ai-integration bot pushed a commit that referenced this pull request Mar 29, 2026
When the compositor's output_format is NV12 or I420, the GPU path
eliminates the expensive CPU RGBA→YUV conversion entirely (~14% of
CPU time in profiled pipelines). The should_use_gpu() heuristic now
considers this, preferring GPU compositing whenever YUV output is
requested — even for simple scenes that would otherwise stay on CPU.

This addresses the #1 CPU hotspot identified in production profiling:
rgba8_to_nv12_buf at 9.12% + parallel_rows at 5.28% = 14.4% combined.

Signed-off-by: Devin AI <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
streamer45 added a commit that referenced this pull request Mar 29, 2026
* feat(compositor): factor output_format into GPU heuristic

When the compositor's output_format is NV12 or I420, the GPU path
eliminates the expensive CPU RGBA→YUV conversion entirely (~14% of
CPU time in profiled pipelines). The should_use_gpu() heuristic now
considers this, preferring GPU compositing whenever YUV output is
requested — even for simple scenes that would otherwise stay on CPU.

This addresses the #1 CPU hotspot identified in production profiling:
rgba8_to_nv12_buf at 9.12% + parallel_rows at 5.28% = 14.4% combined.

Signed-off-by: Devin AI <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* ci: cancel superseded workflow runs on same PR

Adds a concurrency group keyed on PR number / branch ref with
cancel-in-progress: true. This prevents the single self-hosted GPU
runner from being blocked by stale jobs when new commits are pushed.

Signed-off-by: Devin AI <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
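A concurrency group of the kind described above looks roughly like this in a workflow file (a sketch of the standard GitHub Actions pattern; the exact group key used in this repo may differ):

```yaml
# New pushes to the same PR (or branch, outside PRs) cancel the
# in-flight run, freeing the single self-hosted GPU runner.
concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true
```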

* test(compositor): fix flaky oneshot timing and runtime format tests

Two tests flaked on the self-hosted GPU runner where many tests run
concurrently and compete for CPU:

1. test_oneshot_processes_faster_than_realtime: reduced from 30@30fps
   (budget 500ms vs 1000ms real-time = 10% margin) to 10@5fps
   (budget 1500ms vs 2000ms real-time = 25% margin).  The previous
   budget was nearly indistinguishable from per-frame scheduling
   overhead (~30ms) under CI load.

2. test_compositor_output_format_runtime_change: increased inter-step
   sleeps from 100/50/100ms to 300/200/300ms.  The compositor thread
   can be starved for CPU when GPU tests run in parallel, so the
   original windows were not enough for even one tick to fire.

Signed-off-by: Devin AI <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

---------

Signed-off-by: Devin AI <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>
staging-devin-ai-integration bot pushed a commit that referenced this pull request Apr 6, 2026
- Split wildcard Aac | _ pattern into explicit arms with tracing::warn
  for unrecognised future audio codecs (Critical #1)
- Parameterize DEFAULT_AUDIO_FRAME_DURATION_US by codec: Opus 20ms,
  AAC ~21.333ms via const fn helpers (Suggestion #2)
- Compute AAC timestamps from frame count to avoid truncation drift:
  sequence * 1024 * 1_000_000 / 48_000 (Suggestion #3)
- Document Binary vs EncodedAudio semantic mismatch in AAC encoder
  output pin (Suggestion #4)
- Bundle video/audio codec into MediaCodecConfig struct for
  handle_pin_management (Suggestion #5)
- Deduplicate parse_audio_codec_config: mp4.rs delegates to shared
  implementation in moq/constants.rs (Nit #1)
- Document 960→1024 mixer/encoder frame size interaction and rewrite
  moq_aac_mixing.yml as documented placeholder (Nit #2)

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
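The drift-free timestamp formula above (1024 samples per AAC frame at 48 kHz) can be sketched as:

```rust
/// Timestamp of AAC frame `sequence` in microseconds, computed from the
/// frame count so integer truncation never accumulates across frames.
fn aac_timestamp_us(sequence: u64) -> u64 {
    sequence * 1024 * 1_000_000 / 48_000
}

/// Duration derived from consecutive timestamps, so durations stay
/// consistent with the drift-free timestamps (they alternate between
/// 21_333 and 21_334 us rather than using one truncated constant).
fn aac_duration_us(sequence: u64) -> u64 {
    aac_timestamp_us(sequence + 1) - aac_timestamp_us(sequence)
}
```

By contrast, accumulating a truncated per-frame constant of 21_333 us loses about 1 us every three frames, which compounds over a long stream.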
streamer45 added a commit that referenced this pull request Apr 6, 2026
* feat: add AAC encoder native plugin with MP4/MoQ support

Implement an AAC-LC encoder as a native plugin using shiguredo_fdk_aac
2025.1.1, keeping non-royalty-free codec dependencies out of the core.

Plugin (plugins/native/aac-encoder/):
- NativeProcessorNode impl: f32→i16 PCM conversion, 1024-sample framing,
  configurable bitrate (default 128 kbps), content_type and metadata
  preservation via BinaryWithMeta packets.

Plugin SDK C ABI (v7, backward-compatible with v6):
- New CPacketType::BinaryWithMeta variant and CBinaryPacket struct to
  preserve content_type and metadata across the native plugin boundary.
- Plugin host accepts both v6 and v7 plugins.

Core types:
- Add AudioCodec::Aac variant.

MP4 muxer:
- Explicit Aac match arms in content type and sample entry builders.
- New audio_codec config field for codec override.

MoQ transport (push + peer):
- AAC in moq_accepted_media_types().
- catalog_audio_codec() / resolve_audio_codec() / parse_audio_codec_config()
  helpers mirroring the video codec pattern.
- audio_codec config field on MoqPushConfig and MoqPeerConfig.

Build system:
- just build-plugin-native-aac-encoder target.
- lint-plugins / fix-plugins / build-plugins-native entries.

Sample pipelines:
- oneshot/aac_encode.yml (audio-only AAC in MP4)
- oneshot/mp4_mux_aac_h264.yml (AAC + H264 in MP4)
- dynamic/moq_aac_mixing.yml (MoQ broadcasting with mixing + gain)

Signed-off-by: Devin AI <devin@cognition.ai>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
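The f32 → i16 PCM conversion mentioned above can be sketched like this; the exact scaling and rounding convention is an assumption (one common choice among several), not necessarily what the plugin uses:

```rust
/// Convert normalized f32 samples ([-1.0, 1.0]) to i16 PCM, clamping
/// out-of-range input before scaling to the i16 range.
fn f32_to_i16(samples: &[f32]) -> Vec<i16> {
    samples
        .iter()
        .map(|&s| (s.clamp(-1.0, 1.0) * 32767.0).round() as i16)
        .collect()
}
```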

* fix: downgrade BinaryWithMeta for v6 plugins, fix audio_codec deserialization

Address two bugs found by Devin Review:

1. BinaryWithMeta (discriminant 10) was sent to v6 plugins that only
   understand discriminants 0-9, causing packet drops.  Fix: store the
   plugin API version in InstanceState and call
   downgrade_binary_with_meta() before forwarding to v6 plugins.  This
   converts to plain Binary, preserving the raw bytes while dropping the
   content_type/metadata that v6 cannot interpret.

2. Mp4MuxerConfig.audio_codec was typed as Option<AudioCodec>, but the
   AudioCodec enum has no serde rename_all attribute, so YAML values like
   'aac' (lowercase) failed deserialization.  Fix: change the field to
   Option<String> with a case-insensitive parse helper, consistent with
   MoqPeerConfig/MoqPushConfig.

Includes regression tests for the downgrade logic.

Signed-off-by: Devin AI <devin@cognition.ai>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix: accept mono input in AAC encoder, upmix to stereo

The Opus decoder outputs mono (1 channel) but the AAC encoder previously
only accepted stereo (2 channels), causing an incompatible connection
error in the graph builder.

Fix: accept both mono and stereo input on the 'in' pin.  Mono samples
are duplicated to both L/R channels before encoding, since the FDK AAC
library (shiguredo_fdk_aac) hardcodes stereo output.

Signed-off-by: Devin AI <devin@cognition.ai>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
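The mono-to-stereo upmix described above amounts to duplicating each sample into both channels of an interleaved buffer — a sketch on f32 samples (the plugin performs this around its i16 conversion):

```rust
/// Upmix mono samples to interleaved stereo by writing each sample to
/// both the left and right channel slots.
fn upmix_mono_to_stereo(mono: &[f32]) -> Vec<f32> {
    let mut stereo = Vec::with_capacity(mono.len() * 2);
    for &s in mono {
        stereo.push(s); // left
        stereo.push(s); // right
    }
    stereo
}
```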

* fix: address code review feedback for AAC encoder PR

- Split wildcard Aac | _ pattern into explicit arms with tracing::warn
  for unrecognised future audio codecs (Critical #1)
- Parameterize DEFAULT_AUDIO_FRAME_DURATION_US by codec: Opus 20ms,
  AAC ~21.333ms via const fn helpers (Suggestion #2)
- Compute AAC timestamps from frame count to avoid truncation drift:
  sequence * 1024 * 1_000_000 / 48_000 (Suggestion #3)
- Document Binary vs EncodedAudio semantic mismatch in AAC encoder
  output pin (Suggestion #4)
- Bundle video/audio codec into MediaCodecConfig struct for
  handle_pin_management (Suggestion #5)
- Deduplicate parse_audio_codec_config: mp4.rs delegates to shared
  implementation in moq/constants.rs (Nit #1)
- Document 960→1024 mixer/encoder frame size interaction and rewrite
  moq_aac_mixing.yml as documented placeholder (Nit #2)

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix: inline parse_audio_codec_config in mp4.rs to avoid moq feature dependency

The mp4 feature does not depend on moq, so importing from
transport::moq::constants would break --features mp4 builds.
Inline the trivial parsing logic directly in mp4.rs instead.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix: use map_or for parse_mp4_audio_codec_config (clippy)

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix: address second round of code review feedback

- Add TODO comments for MoqPullNode Opus hardcoding (blocked by
  Binary→EncodedAudio C ABI gap)
- Add video_codec config field to MP4 muxer for accurate pre-connection
  MIME hint (mirrors existing audio_codec field)
- Add make_dynamic_output_pin AAC test verifying AudioCodec::Aac is
  threaded through to audio output pins
- Rewrite moq_aac_mixing.yml as runnable pipeline with MP4 muxer sink
  instead of broken placeholder
- Set video_codec: h264 in mp4_mux_aac_h264.yml for correct hint

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix: accept Binary in MP4 muxer input pins + fix mixed-source oneshot pipelines

- Add PacketType::Binary to MP4 muxer accepted input types so native
  plugins (which output Binary via C ABI) can connect directly.
- Fix oneshot engine to detect generator root nodes (e.g. colorbars)
  even when http_input nodes are present, enabling mixed-source
  pipelines like AAC+H264 MP4 mux.
- Fix mp4_mux_aac_h264.yml: use explicit pin mapping (in/in_1) and
  add num_inputs: 2 for dual-stream muxing.
- Fix clippy single_option_map lint on parse_mp4_video_codec_config.

Validated end-to-end:
  - aac_encode.yml: AAC-LC 48kHz stereo 128kbps in MP4 container
  - mp4_mux_aac_h264.yml: H.264 640x480 + AAC-LC stereo in MP4

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix: use fragmented MP4 for browser MSE playback + drift-free duration_us

- Change aac_encode.yml from mode: file to mode: stream so the MP4
  output contains mvex/moof atoms required by Media Source Extensions.
- Compute duration_us from frame count (next_timestamp - this_timestamp)
  instead of using the truncated AAC_FRAME_DURATION_US constant, making
  duration consistent with the drift-free timestamp computation.
- Remove unused AAC_FRAME_DURATION_US constant.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix: MSE playback issues for AAC+H264 pipeline + MoQ YAML syntax

- Fix MSE codec string mismatch: OpenH264 at 640x480 outputs Level 3.0
  (avc1.42c01e), not Level 3.1 (avc1.42c01f). MSE is strict about this
  match and rejects the init segment when codecs don't match.
- Fix moq_aac_mixing.yml: use dot syntax (moq_peer.audio/data) instead
  of bracket syntax (moq_peer[audio/data]) for dynamic pin references.
- Improve classify_packet docstring to document all handled packet types.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix: MSE playback + MoQ AAC pipeline issues

- Initialise video_codec from config (matching audio_codec pattern)
  so the muxer uses H264 even when type resolution is unavailable.
  Previously video_codec was hardcoded to Av1, causing the init
  segment to contain an av01 track instead of avc1.

- Fix placeholder AVC1 sample entry profile_compatibility (0 → 0xC0)
  to match the SPS constraint flags in the placeholder NAL unit.

- Fix moq_aac_mixing.yml: replace unsupported bracket syntax
  (mic_gain[in_0]) with simple array syntax so Needs::Multiple
  auto-generates in_0/in_1 by index.

- Add codec detection tracing for easier debugging.

- Add regression tests for all three fixes.

Signed-off-by: Devin AI <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(mp4): defer first fMP4 flush when inputs still open in skip-classification mode

In skip-classification mode (dual-input with explicit dimensions), the
safety cap on FMP4_FIRST_FLUSH_DEFER_CAP could force-flush an init
segment before all expected tracks had produced data.  When the audio
path processes data much faster than the video path (e.g. file-based
audio vs. a video generator that needs font initialization), the cap
would trigger an audio-only init segment missing the expected h264
track, causing Chrome MSE to reject it with:

  'Initialization segment misses expected h264 track'

The fix checks whether input channels are still open before applying the
safety cap.  As long as channels remain open, a slow-starting track may
still produce data, so the flush is deferred.  Once all channels close,
the cap fires normally to handle genuinely misconfigured pipelines.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* feat(plugin-sdk): add EncodedAudio discriminant to native plugin C ABI

Add CPacketType::EncodedAudio (= 11) to the native plugin C ABI,
allowing plugins to declare EncodedAudio output types (e.g. AAC)
that are compatible with MoQ transport nodes.

The codec name is carried in the existing custom_type_id pointer
field (e.g. "aac", "opus") to preserve CPacketTypeInfo struct
layout and maintain ABI compatibility with v6/v7 plugins.

Also:
- Bump NATIVE_PLUGIN_API_VERSION to 8
- Update AAC encoder plugin to declare EncodedAudio(Aac) output
- Add CAudioCodec enum for documentation/future use
- Add secondary hard cap (FMP4_SKIP_CLASS_HARD_CAP = 30000) for
  skip-classification fMP4 flush deferral to prevent unbounded
  memory growth from pathological misconfiguration
- Create moq_aac_echo.yml sample pipeline for AAC echo over MoQ
- Remove outdated MoQ AAC limitation comment from moq_aac_mixing.yml

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: rustfmt formatting for EncodedAudio conversions

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* feat(moq-peer): add subscriber_audio_codec config for transcoding pipelines

Add a new subscriber_audio_codec parameter to MoqPeerConfig that
controls the subscriber-side MoQ catalog codec independently from
the publisher output pin type (audio_codec).

This enables transcoding pipelines where the publisher sends one
codec (e.g. Opus) but the pipeline re-encodes to another (e.g. AAC)
before feeding it back to subscribers.  Without this separation,
audio_codec controlled both the output pin type AND the catalog
codec, causing type mismatches in the graph builder.

Also fixes moq_aac_mixing.yml:
- Replace non-existent path/audio_only fields with correct
  gateway_path/input_broadcasts/output_broadcast/allow_reconnect
- Remove dead-end mp4_muxer node; feed AAC directly back to
  moq_peer for MoQ streaming
- Add client section for browser WebTransport connection

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: rustfmt formatting for subscriber_audio_codec

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(moq-peer): use publisher codec for dynamic output pins

Dynamic output pins carry data FROM the publisher, so they must use
the publisher's audio_codec — not subscriber_audio_codec.  Without
this fix, non-primary broadcast output pins (created at runtime via
handle_pin_management) would be incorrectly typed with the subscriber
codec in transcoding pipelines.

Also fixes misleading FMP4_SKIP_CLASS_HARD_CAP comment: 30,000
samples ≈ 10 minutes of audio at typical AAC rates, not seconds.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(plugin-sdk): use EncodedAudio discriminant in macro metadata generation

The native_plugin_entry! and native_source_plugin_entry! macros were
using CPacketType::Binary as the fallback for non-Opus EncodedAudio
variants (e.g. AAC).  This caused the host to read the plugin's output
pin type as Binary instead of EncodedAudio(Aac), even when the plugin
source correctly declares EncodedAudio(Aac).

Fix all four occurrences (processor + source macros × input + output
pins):
- type_discriminant: Binary → EncodedAudio
- custom_type_id: also populate codec name for EncodedAudio (was only
  set for Custom types), so the host can round-trip the codec through
  the C ABI

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style: rustfmt formatting for plugin-sdk macro

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(moq-peer): advertise stereo channel_count for AAC in subscriber catalog

The AAC-LC encoder always outputs stereo (upmixing mono input), but
the subscriber catalog hardcoded channel_count=1.  The client's
AudioRingBuffer was initialized with 1 channel from the catalog,
then received 2-channel decoded AAC frames, causing 'wrong number
of channels' errors.

Derive channel_count from the subscriber audio codec: AAC→2, Opus→1.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
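The codec-to-channel-count derivation above is a small mapping — sketched here with a stand-in enum for the real `AudioCodec` type:

```rust
/// Stand-in for the project's AudioCodec enum.
enum AudioCodec {
    Opus,
    Aac,
}

/// Channel count advertised in the subscriber catalog: the AAC path
/// always produces stereo (mono input is upmixed), the Opus path mono.
fn catalog_channel_count(codec: &AudioCodec) -> u32 {
    match codec {
        AudioCodec::Aac => 2,
        AudioCodec::Opus => 1,
    }
}
```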

* fix: address review findings — stale comments, dead code, cap, plugin.yml

1. Update stale API version comments in plugin-native (lib.rs,
   wrapper.rs) to reflect v6/v7/v8 compatibility and document that
   EncodedAudio is metadata-only (no runtime packet downgrade needed).
2. Remove unused CAudioCodec enum from types.rs — codec name is
   carried as a string via custom_type_id, not via this enum.
3. Lower FMP4_SKIP_CLASS_HARD_CAP from 100× (30,000 ≈ 10 min) to
   10× (3,000 ≈ 1 min) for more reasonable memory bounds.
4. Fix plugin.yml: 'stereo' → 'mono or stereo' to match actual
   plugin behavior (mono input is upmixed to stereo).

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

---------

Signed-off-by: Devin AI <devin@cognition.ai>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Signed-off-by: Devin AI <devin@streamkit.dev>
Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>
staging-devin-ai-integration bot pushed a commit that referenced this pull request Apr 10, 2026
Add a new native plugin for fast English speech recognition using NVIDIA's
Parakeet TDT (Token-and-Duration Transducer) 0.6B model via sherpa-onnx.

Parakeet TDT is approximately 10x faster than Whisper on consumer hardware
with competitive accuracy (#1 on HuggingFace ASR leaderboard).

Plugin implementation:
- Offline transducer recognizer (encoder/decoder/joiner) via sherpa-onnx C API
- Silero VAD v6 for streaming speech segmentation
- Recognizer caching keyed on (model_dir, num_threads, execution_provider)
- Configurable VAD threshold, silence duration, and max segment length
- 16kHz mono f32 audio input, transcription output

Justfile additions:
- build-plugin-native-parakeet: build the plugin
- download-parakeet-models: download INT8 quantized model (~660MB)
- setup-parakeet: full setup (sherpa-onnx + models + VAD)
- Added parakeet to copy-plugins-native loop

Includes sample oneshot pipeline (parakeet-stt.yml) and plugin.yml manifest.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
streamer45 added a commit that referenced this pull request Apr 10, 2026
* feat(plugins): add Parakeet TDT speech-to-text plugin

Add a new native plugin for fast English speech recognition using NVIDIA's
Parakeet TDT (Token-and-Duration Transducer) 0.6B model via sherpa-onnx.

Parakeet TDT is approximately 10x faster than Whisper on consumer hardware
with competitive accuracy (#1 on HuggingFace ASR leaderboard).

Plugin implementation:
- Offline transducer recognizer (encoder/decoder/joiner) via sherpa-onnx C API
- Silero VAD v6 for streaming speech segmentation
- Recognizer caching keyed on (model_dir, num_threads, execution_provider)
- Configurable VAD threshold, silence duration, and max segment length
- 16kHz mono f32 audio input, transcription output

Justfile additions:
- build-plugin-native-parakeet: build the plugin
- download-parakeet-models: download INT8 quantized model (~660MB)
- setup-parakeet: full setup (sherpa-onnx + models + VAD)
- Added parakeet to copy-plugins-native loop

Includes sample oneshot pipeline (parakeet-stt.yml) and plugin.yml manifest.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(plugins): address review feedback for parakeet plugin

- Add build-plugin-native-parakeet to build-plugins-native target
- Fix plugin.yml repo_id to match actual HuggingFace source repos
  (csukuangfj/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8 for model,
   streamkit/sensevoice-models for silero-vad)
- Regenerate marketplace/official-plugins.json with parakeet entry
- Add download-parakeet-models as optional in download-models output
  (skipped by default due to ~660MB size, similar to pocket-tts)

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* docs(plugins): add parakeet docs page, fix model checksums and download

- Add plugin docs page (plugin-native-parakeet.md) with parameters,
  example pipeline, and JSON schema
- Update plugin index to include parakeet (10 → 11 official plugins)
- Fix model download: individual files from HuggingFace instead of
  non-existent tar.bz2 archive
- Add per-file sha256 checksums via file_checksums field (matching
  ModelSpec struct) for integrity verification
- Fix expected_size_bytes to actual total (661190513)
- Regenerate marketplace/official-plugins.json

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(engine): skip synthetic nodes in oneshot content-type backward walk

The content-type backward walk in run_oneshot_pipeline walks backwards
through the pipeline graph to find a node that declares a content_type.
When no node in the chain returns a content_type (e.g. STT pipelines
ending in json_serialize), the walk reaches streamkit::http_input which
is a synthetic node not in the registry, causing a 500 error.

Skip synthetic oneshot nodes (http_input/http_output) in the backward
walk since they are handled separately by the engine and are not
registered in the node registry.

This fixes all STT-style oneshot pipelines (parakeet-stt, sensevoice-stt,
speech_to_text, etc.) that use json_serialize → http_output.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* style(engine): format oneshot backward walk

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* docs(plugins): add README for parakeet plugin

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(plugins): update parakeet model repo_id to streamkit/parakeet-models

Update the parakeet plugin.yml to point to the controlled
streamkit/parakeet-models HuggingFace repo instead of the external
csukuangfj repo. Regenerate marketplace metadata accordingly.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(plugins): update parakeet model download URL to streamkit HF space

Point the justfile download target and README references to
streamkit/parakeet-models instead of the external csukuangfj repo.
Original export attribution preserved in README.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(plugins): address parakeet review feedback

- Point silero-vad repo_id to streamkit/parakeet-models instead of
  streamkit/sensevoice-models to avoid cross-plugin dependency
- Remove unused cc build-dependency
- Remove unused once_cell dependency (code uses std::sync::LazyLock)
- Fix misleading update_params comment that claimed VAD params could
  be updated at runtime
- Remove const from set_threshold (f32::clamp is not const-stable)

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

---------

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>