
Commit 79c2435

Authored by staging-devin-ai-integration[bot], streamkit-devin, and streamer45
docs: update README, ROADMAP, and docs for AV1, compositor, and Slint (#250)
Reflect shipped features across documentation:

- README: update use cases, media focus, and what's-included to mention AV1 codecs, GPU compositing, and Slint dynamic UI overlays
- ROADMAP: strike through shipped items (AV1 support, compositor UI, multi-video compositing) and update section titles
- introduction.mdx: add video compositing, AV1 codecs, and Slint to key features and use cases
- architecture/overview.md: mention compositor GPU/CPU backends and Slint plugin in extensibility section
- installation.md: add AV1 build prerequisites (SVT-AV1, dav1d)
- reference/nodes/index.md: add feature-gated AV1 nodes section
- guides/web-ui.md: mention compositor scene editor in Monitor View
- deployment/gpu.md: add compositor gpu_mode documentation (auto/gpu/cpu)

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>
1 parent 43af105 commit 79c2435

8 files changed: +56 additions, −13 deletions

README.md

Lines changed: 5 additions & 3 deletions
```diff
@@ -67,7 +67,8 @@ If you try it and something feels off, please open an issue (or a small PR). For
 - **Speech pipelines** — Build a transcription service: ingest audio via MoQ, run Whisper STT, stream transcription updates to clients.
 - **Real-time translation** — Bilingual streams with live subtitles using NLLB or Helsinki translation models.
 - **Voice agents** — TTS-powered bots that respond to audio input with Kokoro, Piper, or Matcha.
-- **Video compositing** — Combine camera feeds with overlays and PiP layouts using the built-in compositor, encoded with VP9 for real-time transport.
+- **Video compositing** — Combine camera feeds with overlays and PiP layouts using the built-in compositor (CPU or GPU via wgpu), encoded with VP9 or AV1 for real-time transport.
+- **Dynamic UI overlays** — Render scriptable, data-driven overlays (scoreboards, lower thirds, watermarks) using the Slint plugin and composite them into live video.
 - **Audio processing** — Mixing, gain control, format conversion, and custom routing.
 - **Batch processing** — High-throughput file conversion or offline transcription using the Oneshot HTTP API.
 - **Your idea** — Add your own node or plugin and compose it into a pipeline
@@ -80,7 +81,7 @@ If you try it and something feels off, please open an issue (or a small PR). For
 - **Dynamic**: long-running sessions you can inspect and reconfigure while they run
 - **Transport**: real-time media over MoQ/WebTransport (QUIC) plus a WebSocket control plane for UI and automation (WebSocket transport nodes are on the roadmap; in the near term, non-media streams may also ride MoQ)
 - **Plugins**: native (C ABI, in-process) and WASM (Component Model).
-- **Media focus**: audio (Opus, WAV, OGG, FLAC, MP3) and basic video (VP9 encode/decode, compositing, WebM muxing). Video capabilities are expanding — see the [roadmap](ROADMAP.md).
+- **Media focus**: audio (Opus, WAV, OGG, FLAC, MP3) and video (VP9, AV1 encode/decode, compositing with CPU/GPU backends, WebM muxing). See the [roadmap](ROADMAP.md) for what's next.

 ## Quickstart (Docker)

@@ -168,7 +169,8 @@ docker run --rm --env-file streamkit.env \
 - Node graph model (DAG) with built-in nodes plus modular extensions
 - Web UI for building/inspecting pipelines and a client CLI (`skit-cli`) for scripting (included in GitHub Release tarballs)
 - Load testing + observability building blocks (see `samples/loadtest/` and `samples/grafana-dashboard.json`)
-- Optional ML plugins + models (mounted externally by default): Whisper/SenseVoice (STT), Kokoro/Piper/Matcha (TTS), NLLB/Helsinki (translation). Some models may have restrictive licenses (e.g. NLLB is CC-BY-NC); review model licenses before production use.
+- Video compositing with CPU (tiny-skia) and GPU (wgpu) backends, VP9 and AV1 codec support, and a dedicated compositor UI in the web interface
+- Optional ML plugins + models (mounted externally by default): Whisper/SenseVoice (STT), Kokoro/Piper/Matcha (TTS), NLLB/Helsinki (translation), Slint (dynamic UI overlays). Some models may have restrictive licenses (e.g. NLLB is CC-BY-NC); review model licenses before production use.

 ## Development
```
ROADMAP.md

Lines changed: 8 additions & 5 deletions
```diff
@@ -58,10 +58,11 @@ These are in place today and will be iterated on (not “added from scratch”):
 - **A/V sync** — Jitter/drift strategy, drop/late-frame policy, and regression tests (dynamic pipelines)
 - **Hang/MoQ alignment** — Clear mapping between StreamKit timing metadata and Hang/MoQ timestamps/groups

-### Dynamic Video over MoQ (VP9 MVP) (P0)
+### Dynamic Video over MoQ (P0)

 - ~~**Video packet types** — First-class video packets alongside audio, with explicit timing requirements~~
-- ~~**VP9 baseline** — Real-time VP9 encode/decode path suitable for browser clients; **AV1 optional later**~~
+- ~~**VP9 baseline** — Real-time VP9 encode/decode path suitable for browser clients~~
+- ~~**AV1 support** — Real-time AV1 encode/decode via rav1e/rav1d (pure Rust) and SVT-AV1/dav1d (C FFI); drop-in replacement for VP9 in any pipeline~~
 - **MoQ/Hang-first interop** — Start by interoperating cleanly with `@moq/hang`, then generalize to “MoQ in general”
 - ~~**Compositor MVP (main + PiP)** — Two live video inputs → one composed output, plus simple overlays (watermark/text/images)~~
 - **Golden-path demo** — A canonical “screen share + webcam → PiP → watchers” dynamic pipeline sample
@@ -108,7 +109,7 @@ These are in place today and will be iterated on (not “added from scratch”):

 - **TypeScript support in script nodes** — Compile `.ts` scripts at load time for type-safe pipeline logic
 - **UI code editor** — In-browser JavaScript/TypeScript editor with syntax highlighting and validation
-- **Compositor UI (basic)** — Dedicated scene/layer editor for main + PiP positioning and simple overlays (crop/transform/watermark)
+- ~~**Compositor UI (basic)** — Dedicated scene/layer editor for main + PiP positioning and simple overlays (crop/transform/watermark)~~
 - **Admin/Manage section** — Dedicated UI area for plugins, permissions/roles, secrets/config, and operational controls (separate from pipeline design/monitor views)

 ### Stability & Polish
@@ -139,9 +140,11 @@ StreamKit is media/processing-focused, not "audio-only". As real use cases emerg

 After the VP9 + compositor MVP is solid, expand video capabilities:

-- **More codecs/accelerators** — AV1, H.264, hardware acceleration options where possible
+- ~~**AV1 codecs** — AV1 encode/decode shipped (rav1e, SVT-AV1, rav1d, dav1d)~~
+- **More codecs/accelerators** — H.264, hardware acceleration options where possible
 - **Container support** — MP4 and WebM muxing with video tracks (beyond the initial WebM-focused PoC path)
-- **More compositing** — Multi-video compositing beyond PiP (layouts, grids, transitions)
+- ~~**Multi-video compositing** — N-input compositor with full per-layer transforms (position, scale, rotation, opacity, crop/zoom, z-order, mirror) shipped~~
+- **Compositing polish** — Pre-built layouts, grid templates, animated transitions

 ### Advanced Transports
```
docs/src/content/docs/architecture/overview.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -9,7 +9,7 @@ StreamKit has three major pieces:

 - **Server (`skit`)**: the Rust backend that runs pipelines and serves the web UI + APIs.
 - **Pipelines engine**: compiles YAML into a typed node graph (DAG) and executes it as Tokio tasks connected by bounded channels.
-- **Web UI**: a React app for creating, running, and monitoring pipelines in real time.
+- **Web UI**: a React app for creating, running, and monitoring pipelines in real time, with a dedicated compositor scene editor for video layouts.

 ## Execution surfaces

@@ -19,8 +19,8 @@ StreamKit has three major pieces:

 ## Extensibility

-- **Built-in nodes** (core, audio, video, containers, transport).
-- **Plugins**: native (in-process C ABI) and WASM (sandboxed Component Model).
+- **Built-in nodes** (core, audio, video, containers, transport) — including a multi-layer video compositor with CPU (tiny-skia) and GPU (wgpu) backends.
+- **Plugins**: native (in-process C ABI) and WASM (sandboxed Component Model) — e.g. Slint for dynamic UI overlays, Whisper/SenseVoice for STT, Kokoro/Piper for TTS.
 - **Script node**: sandboxed JavaScript (QuickJS) for lightweight integration and text processing.

 Next:
```

docs/src/content/docs/deployment/gpu.md

Lines changed: 26 additions & 1 deletion
````diff
@@ -2,9 +2,34 @@
 # SPDX-FileCopyrightText: © 2025 StreamKit Contributors
 # SPDX-License-Identifier: MPL-2.0
 title: GPU Setup
-description: Configure GPU acceleration for ML workloads
+description: Configure GPU acceleration for compositing and ML workloads
 ---

+StreamKit can use GPUs in two ways: the built-in video compositor uses **wgpu** (Vulkan/Metal/DX12 — works with any compatible GPU), and selected native ML plugins use **NVIDIA CUDA** for inference acceleration.
+
+## Compositor (GPU compositing)
+
+The built-in `video::compositor` node supports GPU-accelerated compositing via wgpu. Set `gpu_mode` in the compositor params:
+
+| Value | Behaviour |
+|--------|-----------|
+| `auto` (default) | CPU for simple scenes, auto-promotes to GPU when complexity warrants it |
+| `gpu` | Force GPU compositing (falls back to CPU if no GPU is available) |
+| `cpu` | Force CPU-only compositing (tiny-skia) |
+
+No NVIDIA-specific tooling is required for compositor GPU — wgpu works with any Vulkan-capable GPU. Example:
+
+```yaml
+compositor:
+  kind: video::compositor
+  params:
+    width: 1280
+    height: 720
+    gpu_mode: auto # or "gpu" / "cpu"
+```
+
+## ML Plugins (NVIDIA GPU)
+
 StreamKit can use NVIDIA GPUs for selected native ML plugins. GPU support depends on how you build and deploy:

 - `ghcr.io/streamer45/streamkit:latest` (CPU) runs plugins on CPU.
````

docs/src/content/docs/getting-started/installation.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -45,6 +45,8 @@ Optional:
 - `reuse` (`pip3 install --user reuse`) for SPDX license header checks in `just lint` (note: the apt package is too old)
 - `clang` and `libclang-dev` (`sudo apt install clang libclang-dev`) for building native ML plugins (e.g. whisper, sensevoice)
 - `libvpx` + `pkg-config` if building with `--features video` (VP9 nodes)
+- `cmake` + `nasm` + C compiler if building with `--features svt_av1_static` (SVT-AV1 encoder); see [`crates/nodes/SVT_AV1.md`](https://github.com/streamer45/streamkit/blob/main/crates/nodes/SVT_AV1.md) for details
+- `libdav1d-dev` if building with `--features dav1d` (C dav1d AV1 decoder); the pure-Rust rav1d decoder (`--features av1`) requires no extra deps

 ### Build Steps
```
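The prerequisites above map onto ordinary Cargo feature flags. A minimal command sketch, assuming a standard `cargo build` workflow in the repository root (the exact package/feature wiring lives in the repo's `Cargo.toml`, not shown here):

```shell
# Default build: no AV1 nodes, no extra native dependencies.
cargo build --release

# Pure-Rust AV1 path (rav1e encoder + rav1d decoder): no system libraries needed.
cargo build --release --features av1

# SVT-AV1 encoder, statically linked: requires cmake, nasm, and a C compiler.
cargo build --release --features svt_av1_static

# C dav1d decoder via FFI: requires libdav1d-dev installed system-wide.
cargo build --release --features dav1d
```

Feature flags can also be combined (e.g. `--features av1,dav1d`) if you want the pure-Rust encoder alongside the C decoder.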
docs/src/content/docs/getting-started/introduction.mdx

Lines changed: 2 additions & 0 deletions
```diff
@@ -26,6 +26,8 @@ StreamKit is built for developers who need to process real-time media — whethe
 - **Live transcription** — Ingest audio via MoQ, run Whisper or SenseVoice STT, stream transcription updates to clients
 - **Voice agents** — TTS-powered bots using Kokoro, Piper, or Matcha that respond to audio input
 - **Real-time translation** — Bilingual streams with live subtitles using NLLB or Helsinki models
+- **Video compositing** — Combine camera feeds, PiP layouts, and dynamic overlays (scoreboards, watermarks) using the built-in compositor
+- **Dynamic UI overlays** — Render scriptable, data-driven graphics with the Slint plugin and composite them into live video
 - **Audio processing** — Mixing, gain control, format conversion, encoding/decoding pipelines
 - **Content analysis** — VAD for speech detection, keyword spotting, or custom safety filters
 - **Your idea** — Add your own node or plugin and compose it into a pipeline
```

docs/src/content/docs/guides/web-ui.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -66,7 +66,7 @@ A fragment is a reusable mini-graph (a small, pre-wired set of nodes) that you c
 Monitor View uses the same overall three-pane layout, but focuses on running sessions:

 - **Left pane**: a live list of sessions until you enter **Staging Mode** (then it switches to the node library/palette for editing).
-- **Center pane**: the session graph view.
+- **Center pane**: the session graph view. If the selected session contains a `video::compositor` node, a **compositor scene editor** is available — an interactive canvas where you can drag, resize, reorder, and configure video layers and overlays in real time.
 - **Right pane** (once a session is selected): the YAML editor plus the Inspector pane for selected nodes.

 ## Convert View
```

docs/src/content/docs/reference/nodes/index.md

Lines changed: 9 additions & 0 deletions
```diff
@@ -69,3 +69,12 @@ Notes:
 - [`video::pixel_convert`](./video-pixel-convert/)
 - [`video::vp9::decoder`](./video-vp9-decoder/)
 - [`video::vp9::encoder`](./video-vp9-encoder/)
+
+### Feature-gated video nodes
+
+These nodes require optional Cargo features and are not included in the default build:
+
+- `video::av1::encoder` — rav1e AV1 encoder (feature: `av1`)
+- `video::av1::decoder` — rav1d AV1 decoder (feature: `av1`)
+- `video::svt_av1::encoder` — SVT-AV1 encoder via FFI (feature: `svt_av1` or `svt_av1_static`)
+- `video::dav1d::decoder` — C dav1d AV1 decoder via FFI (feature: `dav1d`)
```
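To show how a feature-gated node slots into a pipeline, here is a hypothetical YAML fragment pairing the compositor with the AV1 encoder. The node kinds `video::compositor` and `video::av1::encoder` and the `width`/`height` params come from this commit; the `nodes`/`edges` layout and everything else is an illustrative assumption, not the documented schema:

```yaml
# Hypothetical sketch only — pipeline schema details are assumptions.
nodes:
  compositor:
    kind: video::compositor
    params:
      width: 1280
      height: 720
  encoder:
    kind: video::av1::encoder   # requires building with --features av1
edges:
  - from: compositor
    to: encoder
```

Swapping `video::av1::encoder` for `video::vp9::encoder` (available in the default `--features video` build) would keep the rest of the fragment unchanged, which is the "drop-in replacement" property the ROADMAP entry describes.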
