
Commit 79c2435

Authored by staging-devin-ai-integration[bot], streamkit-devin, and streamer45
docs: update README, ROADMAP, and docs for AV1, compositor, and Slint (#250)
Reflect shipped features across documentation:

- README: update use cases, media focus, and what's-included to mention AV1 codecs, GPU compositing, and Slint dynamic UI overlays
- ROADMAP: strike through shipped items (AV1 support, compositor UI, multi-video compositing) and update section titles
- introduction.mdx: add video compositing, AV1 codecs, and Slint to key features and use cases
- architecture/overview.md: mention compositor GPU/CPU backends and Slint plugin in extensibility section
- installation.md: add AV1 build prerequisites (SVT-AV1, dav1d)
- reference/nodes/index.md: add feature-gated AV1 nodes section
- guides/web-ui.md: mention compositor scene editor in Monitor View
- deployment/gpu.md: add compositor gpu_mode documentation (auto/gpu/cpu)

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>
1 parent 43af105 commit 79c2435

8 files changed: +56 additions, −13 deletions

README.md

Lines changed: 5 additions & 3 deletions
```diff
@@ -67,7 +67,8 @@ If you try it and something feels off, please open an issue (or a small PR). For
 - **Speech pipelines** — Build a transcription service: ingest audio via MoQ, run Whisper STT, stream transcription updates to clients.
 - **Real-time translation** — Bilingual streams with live subtitles using NLLB or Helsinki translation models.
 - **Voice agents** — TTS-powered bots that respond to audio input with Kokoro, Piper, or Matcha.
-- **Video compositing** — Combine camera feeds with overlays and PiP layouts using the built-in compositor, encoded with VP9 for real-time transport.
+- **Video compositing** — Combine camera feeds with overlays and PiP layouts using the built-in compositor (CPU or GPU via wgpu), encoded with VP9 or AV1 for real-time transport.
+- **Dynamic UI overlays** — Render scriptable, data-driven overlays (scoreboards, lower thirds, watermarks) using the Slint plugin and composite them into live video.
 - **Audio processing** — Mixing, gain control, format conversion, and custom routing.
 - **Batch processing** — High-throughput file conversion or offline transcription using the Oneshot HTTP API.
 - **Your idea** — Add your own node or plugin and compose it into a pipeline
@@ -80,7 +81,7 @@ If you try it and something feels off, please open an issue (or a small PR). For
 - **Dynamic**: long-running sessions you can inspect and reconfigure while they run
 - **Transport**: real-time media over MoQ/WebTransport (QUIC) plus a WebSocket control plane for UI and automation (WebSocket transport nodes are on the roadmap; in the near term, non-media streams may also ride MoQ)
 - **Plugins**: native (C ABI, in-process) and WASM (Component Model).
-- **Media focus**: audio (Opus, WAV, OGG, FLAC, MP3) and basic video (VP9 encode/decode, compositing, WebM muxing). Video capabilities are expanding — see the [roadmap](ROADMAP.md).
+- **Media focus**: audio (Opus, WAV, OGG, FLAC, MP3) and video (VP9, AV1 encode/decode, compositing with CPU/GPU backends, WebM muxing). See the [roadmap](ROADMAP.md) for what's next.

 ## Quickstart (Docker)

@@ -168,7 +169,8 @@ docker run --rm --env-file streamkit.env \
 - Node graph model (DAG) with built-in nodes plus modular extensions
 - Web UI for building/inspecting pipelines and a client CLI (`skit-cli`) for scripting (included in GitHub Release tarballs)
 - Load testing + observability building blocks (see `samples/loadtest/` and `samples/grafana-dashboard.json`)
-- Optional ML plugins + models (mounted externally by default): Whisper/SenseVoice (STT), Kokoro/Piper/Matcha (TTS), NLLB/Helsinki (translation). Some models may have restrictive licenses (e.g. NLLB is CC-BY-NC); review model licenses before production use.
+- Video compositing with CPU (tiny-skia) and GPU (wgpu) backends, VP9 and AV1 codec support, and a dedicated compositor UI in the web interface
+- Optional ML plugins + models (mounted externally by default): Whisper/SenseVoice (STT), Kokoro/Piper/Matcha (TTS), NLLB/Helsinki (translation), Slint (dynamic UI overlays). Some models may have restrictive licenses (e.g. NLLB is CC-BY-NC); review model licenses before production use.

 ## Development
```
ROADMAP.md

Lines changed: 8 additions & 5 deletions
```diff
@@ -58,10 +58,11 @@ These are in place today and will be iterated on (not “added from scratch”):
 - **A/V sync** — Jitter/drift strategy, drop/late-frame policy, and regression tests (dynamic pipelines)
 - **Hang/MoQ alignment** — Clear mapping between StreamKit timing metadata and Hang/MoQ timestamps/groups

-### Dynamic Video over MoQ (VP9 MVP) (P0)
+### Dynamic Video over MoQ (P0)

 - ~~**Video packet types** — First-class video packets alongside audio, with explicit timing requirements~~
-- ~~**VP9 baseline** — Real-time VP9 encode/decode path suitable for browser clients; **AV1 optional later**~~
+- ~~**VP9 baseline** — Real-time VP9 encode/decode path suitable for browser clients~~
+- ~~**AV1 support** — Real-time AV1 encode/decode via rav1e/rav1d (pure Rust) and SVT-AV1/dav1d (C FFI); drop-in replacement for VP9 in any pipeline~~
 - **MoQ/Hang-first interop** — Start by interoperating cleanly with `@moq/hang`, then generalize to “MoQ in general”
 - ~~**Compositor MVP (main + PiP)** — Two live video inputs → one composed output, plus simple overlays (watermark/text/images)~~
 - **Golden-path demo** — A canonical “screen share + webcam → PiP → watchers” dynamic pipeline sample
@@ -108,7 +109,7 @@ These are in place today and will be iterated on (not “added from scratch”):

 - **TypeScript support in script nodes** — Compile `.ts` scripts at load time for type-safe pipeline logic
 - **UI code editor** — In-browser JavaScript/TypeScript editor with syntax highlighting and validation
-- **Compositor UI (basic)** — Dedicated scene/layer editor for main + PiP positioning and simple overlays (crop/transform/watermark)
+- ~~**Compositor UI (basic)** — Dedicated scene/layer editor for main + PiP positioning and simple overlays (crop/transform/watermark)~~
 - **Admin/Manage section** — Dedicated UI area for plugins, permissions/roles, secrets/config, and operational controls (separate from pipeline design/monitor views)

 ### Stability & Polish
@@ -139,9 +140,11 @@ StreamKit is media/processing-focused, not "audio-only". As real use cases emerg

 After the VP9 + compositor MVP is solid, expand video capabilities:

-- **More codecs/accelerators** — AV1, H.264, hardware acceleration options where possible
+- ~~**AV1 codecs** — AV1 encode/decode shipped (rav1e, SVT-AV1, rav1d, dav1d)~~
+- **More codecs/accelerators** — H.264, hardware acceleration options where possible
 - **Container support** — MP4 and WebM muxing with video tracks (beyond the initial WebM-focused PoC path)
-- **More compositing** — Multi-video compositing beyond PiP (layouts, grids, transitions)
+- ~~**Multi-video compositing** — N-input compositor with full per-layer transforms (position, scale, rotation, opacity, crop/zoom, z-order, mirror) shipped~~
+- **Compositing polish** — Pre-built layouts, grid templates, animated transitions

 ### Advanced Transports
```
docs/src/content/docs/architecture/overview.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -9,7 +9,7 @@ StreamKit has three major pieces:

 - **Server (`skit`)**: the Rust backend that runs pipelines and serves the web UI + APIs.
 - **Pipelines engine**: compiles YAML into a typed node graph (DAG) and executes it as Tokio tasks connected by bounded channels.
-- **Web UI**: a React app for creating, running, and monitoring pipelines in real time.
+- **Web UI**: a React app for creating, running, and monitoring pipelines in real time, with a dedicated compositor scene editor for video layouts.

 ## Execution surfaces

@@ -19,8 +19,8 @@ StreamKit has three major pieces:

 ## Extensibility

-- **Built-in nodes** (core, audio, video, containers, transport).
-- **Plugins**: native (in-process C ABI) and WASM (sandboxed Component Model).
+- **Built-in nodes** (core, audio, video, containers, transport) — including a multi-layer video compositor with CPU (tiny-skia) and GPU (wgpu) backends.
+- **Plugins**: native (in-process C ABI) and WASM (sandboxed Component Model) — e.g. Slint for dynamic UI overlays, Whisper/SenseVoice for STT, Kokoro/Piper for TTS.
 - **Script node**: sandboxed JavaScript (QuickJS) for lightweight integration and text processing.

 Next:
```

docs/src/content/docs/deployment/gpu.md

Lines changed: 26 additions & 1 deletion
````diff
@@ -2,9 +2,34 @@
 # SPDX-FileCopyrightText: © 2025 StreamKit Contributors
 # SPDX-License-Identifier: MPL-2.0
 title: GPU Setup
-description: Configure GPU acceleration for ML workloads
+description: Configure GPU acceleration for compositing and ML workloads
 ---

+StreamKit can use GPUs in two ways: the built-in video compositor uses **wgpu** (Vulkan/Metal/DX12 — works with any compatible GPU), and selected native ML plugins use **NVIDIA CUDA** for inference acceleration.
+
+## Compositor (GPU compositing)
+
+The built-in `video::compositor` node supports GPU-accelerated compositing via wgpu. Set `gpu_mode` in the compositor params:
+
+| Value | Behaviour |
+|--------|-----------|
+| `auto` (default) | CPU for simple scenes, auto-promotes to GPU when complexity warrants it |
+| `gpu` | Force GPU compositing (falls back to CPU if no GPU is available) |
+| `cpu` | Force CPU-only compositing (tiny-skia) |
+
+No NVIDIA-specific tooling is required for compositor GPU — wgpu works with any Vulkan-capable GPU. Example:
+
+```yaml
+compositor:
+  kind: video::compositor
+  params:
+    width: 1280
+    height: 720
+    gpu_mode: auto # or "gpu" / "cpu"
+```
+
+## ML Plugins (NVIDIA GPU)
+
 StreamKit can use NVIDIA GPUs for selected native ML plugins. GPU support depends on how you build and deploy:

 - `ghcr.io/streamer45/streamkit:latest` (CPU) runs plugins on CPU.
````

docs/src/content/docs/getting-started/installation.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -45,6 +45,8 @@ Optional:
 - `reuse` (`pip3 install --user reuse`) for SPDX license header checks in `just lint` (note: the apt package is too old)
 - `clang` and `libclang-dev` (`sudo apt install clang libclang-dev`) for building native ML plugins (e.g. whisper, sensevoice)
 - `libvpx` + `pkg-config` if building with `--features video` (VP9 nodes)
+- `cmake` + `nasm` + C compiler if building with `--features svt_av1_static` (SVT-AV1 encoder); see [`crates/nodes/SVT_AV1.md`](https://github.com/streamer45/streamkit/blob/main/crates/nodes/SVT_AV1.md) for details
+- `libdav1d-dev` if building with `--features dav1d` (C dav1d AV1 decoder); the pure-Rust rav1d decoder (`--features av1`) requires no extra deps

 ### Build Steps
```
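The prerequisites above map onto ordinary Cargo feature flags. A minimal command sketch, assuming a standard `cargo build` workflow in the repository root (the exact package/feature wiring lives in the repo's `Cargo.toml`, not shown here):

```shell
# Default build: no AV1 nodes, no extra native dependencies.
cargo build --release

# Pure-Rust AV1 path (rav1e encoder + rav1d decoder): no system libraries needed.
cargo build --release --features av1

# SVT-AV1 encoder, statically linked: requires cmake, nasm, and a C compiler.
cargo build --release --features svt_av1_static

# C dav1d decoder via FFI: requires libdav1d-dev installed system-wide.
cargo build --release --features dav1d
```

Feature flags can also be combined (e.g. `--features av1,dav1d`) if you want the pure-Rust encoder alongside the C decoder.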
docs/src/content/docs/getting-started/introduction.mdx

Lines changed: 2 additions & 0 deletions
```diff
@@ -26,6 +26,8 @@ StreamKit is built for developers who need to process real-time media — whethe
 - **Live transcription** — Ingest audio via MoQ, run Whisper or SenseVoice STT, stream transcription updates to clients
 - **Voice agents** — TTS-powered bots using Kokoro, Piper, or Matcha that respond to audio input
 - **Real-time translation** — Bilingual streams with live subtitles using NLLB or Helsinki models
+- **Video compositing** — Combine camera feeds, PiP layouts, and dynamic overlays (scoreboards, watermarks) using the built-in compositor
+- **Dynamic UI overlays** — Render scriptable, data-driven graphics with the Slint plugin and composite them into live video
 - **Audio processing** — Mixing, gain control, format conversion, encoding/decoding pipelines
 - **Content analysis** — VAD for speech detection, keyword spotting, or custom safety filters
 - **Your idea** — Add your own node or plugin and compose it into a pipeline
```

docs/src/content/docs/guides/web-ui.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -66,7 +66,7 @@ A fragment is a reusable mini-graph (a small, pre-wired set of nodes) that you c
 Monitor View uses the same overall three-pane layout, but focuses on running sessions:

 - **Left pane**: a live list of sessions until you enter **Staging Mode** (then it switches to the node library/palette for editing).
-- **Center pane**: the session graph view.
+- **Center pane**: the session graph view. If the selected session contains a `video::compositor` node, a **compositor scene editor** is available — an interactive canvas where you can drag, resize, reorder, and configure video layers and overlays in real time.
 - **Right pane** (once a session is selected): the YAML editor plus the Inspector pane for selected nodes.

 ## Convert View
```

docs/src/content/docs/reference/nodes/index.md

Lines changed: 9 additions & 0 deletions
```diff
@@ -69,3 +69,12 @@ Notes:
 - [`video::pixel_convert`](./video-pixel-convert/)
 - [`video::vp9::decoder`](./video-vp9-decoder/)
 - [`video::vp9::encoder`](./video-vp9-encoder/)
+
+### Feature-gated video nodes
+
+These nodes require optional Cargo features and are not included in the default build:
+
+- `video::av1::encoder` — rav1e AV1 encoder (feature: `av1`)
+- `video::av1::decoder` — rav1d AV1 decoder (feature: `av1`)
+- `video::svt_av1::encoder` — SVT-AV1 encoder via FFI (feature: `svt_av1` or `svt_av1_static`)
+- `video::dav1d::decoder` — C dav1d AV1 decoder via FFI (feature: `dav1d`)
```
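To show how a feature-gated node slots into a pipeline, here is a hypothetical YAML fragment pairing the compositor with the AV1 encoder. The node kinds `video::compositor` and `video::av1::encoder` and the `width`/`height` params come from this commit; the `nodes`/`edges` layout and everything else is an illustrative assumption, not the documented schema:

```yaml
# Hypothetical sketch only — pipeline schema details are assumptions.
nodes:
  compositor:
    kind: video::compositor
    params:
      width: 1280
      height: 720
  encoder:
    kind: video::av1::encoder   # requires building with --features av1
edges:
  - from: compositor
    to: encoder
```

Swapping `video::av1::encoder` for `video::vp9::encoder` (available in the default `--features video` build) would keep the rest of the fragment unchanged, which is the "drop-in replacement" property the ROADMAP entry describes.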
