docs: update README, ROADMAP, and docs for AV1, compositor, and Slint (#250)
Reflect shipped features across documentation:
- README: update use cases, media focus, and what's-included to mention
AV1 codecs, GPU compositing, and Slint dynamic UI overlays
- ROADMAP: strike through shipped items (AV1 support, compositor UI,
multi-video compositing) and update section titles
- introduction.mdx: add video compositing, AV1 codecs, and Slint to
key features and use cases
- architecture/overview.md: mention compositor GPU/CPU backends and
Slint plugin in extensibility section
- installation.md: add AV1 build prerequisites (SVT-AV1, dav1d)
- reference/nodes/index.md: add feature-gated AV1 nodes section
- guides/web-ui.md: mention compositor scene editor in Monitor View
- deployment/gpu.md: add compositor gpu_mode documentation (auto/gpu/cpu)
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>
README.md (5 additions, 3 deletions)
@@ -67,7 +67,8 @@ If you try it and something feels off, please open an issue (or a small PR). For
 - **Speech pipelines** — Build a transcription service: ingest audio via MoQ, run Whisper STT, stream transcription updates to clients.
 - **Real-time translation** — Bilingual streams with live subtitles using NLLB or Helsinki translation models.
 - **Voice agents** — TTS-powered bots that respond to audio input with Kokoro, Piper, or Matcha.
-- **Video compositing** — Combine camera feeds with overlays and PiP layouts using the built-in compositor, encoded with VP9 for real-time transport.
+- **Video compositing** — Combine camera feeds with overlays and PiP layouts using the built-in compositor (CPU or GPU via wgpu), encoded with VP9 or AV1 for real-time transport.
+- **Dynamic UI overlays** — Render scriptable, data-driven overlays (scoreboards, lower thirds, watermarks) using the Slint plugin and composite them into live video.
 - **Audio processing** — Mixing, gain control, format conversion, and custom routing.
 - **Batch processing** — High-throughput file conversion or offline transcription using the Oneshot HTTP API.
 - **Your idea** — Add your own node or plugin and compose it into a pipeline
@@ -80,7 +81,7 @@ If you try it and something feels off, please open an issue (or a small PR). For
 - **Dynamic**: long-running sessions you can inspect and reconfigure while they run
 - **Transport**: real-time media over MoQ/WebTransport (QUIC) plus a WebSocket control plane for UI and automation (WebSocket transport nodes are on the roadmap; in the near term, non-media streams may also ride MoQ)
 - **Plugins**: native (C ABI, in-process) and WASM (Component Model).
-- **Media focus**: audio (Opus, WAV, OGG, FLAC, MP3) and basic video (VP9 encode/decode, compositing, WebM muxing). Video capabilities are expanding — see the [roadmap](ROADMAP.md).
+- **Media focus**: audio (Opus, WAV, OGG, FLAC, MP3) and video (VP9, AV1 encode/decode, compositing with CPU/GPU backends, WebM muxing). See the [roadmap](ROADMAP.md) for what's next.

 ## Quickstart (Docker)
@@ -168,7 +169,8 @@ docker run --rm --env-file streamkit.env \
 - Node graph model (DAG) with built-in nodes plus modular extensions
 - Web UI for building/inspecting pipelines and a client CLI (`skit-cli`) for scripting (included in GitHub Release tarballs)
 - Load testing + observability building blocks (see `samples/loadtest/` and `samples/grafana-dashboard.json`)
-- Optional ML plugins + models (mounted externally by default): Whisper/SenseVoice (STT), Kokoro/Piper/Matcha (TTS), NLLB/Helsinki (translation). Some models may have restrictive licenses (e.g. NLLB is CC-BY-NC); review model licenses before production use.
+- Video compositing with CPU (tiny-skia) and GPU (wgpu) backends, VP9 and AV1 codec support, and a dedicated compositor UI in the web interface
+- Optional ML plugins + models (mounted externally by default): Whisper/SenseVoice (STT), Kokoro/Piper/Matcha (TTS), NLLB/Helsinki (translation), Slint (dynamic UI overlays). Some models may have restrictive licenses (e.g. NLLB is CC-BY-NC); review model licenses before production use.
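As a sketch of how the compositing and AV1 pieces listed above might be wired together in a session, here is a hypothetical pipeline fragment. Only the `video::compositor` node kind and its `gpu_mode` param appear in this change; the source/encoder node names and the exact session schema are illustrative assumptions, not StreamKit's documented API.

```yaml
# Hypothetical session sketch — node kinds other than video::compositor
# and the edges syntax are assumptions for illustration only.
nodes:
  cam_main:
    kind: video::source        # assumed name
  cam_pip:
    kind: video::source        # assumed name
  compositor:
    kind: video::compositor
    params:
      width: 1280
      height: 720
      gpu_mode: auto           # documented values: auto / gpu / cpu
  encoder:
    kind: video::av1_encode    # assumed name (SVT-AV1 via --features svt_av1_static)
edges:
  - cam_main -> compositor
  - cam_pip -> compositor
  - compositor -> encoder
```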
ROADMAP.md

@@ -108,7 +109,7 @@ These are in place today and will be iterated on (not “added from scratch”):
 - **TypeScript support in script nodes** — Compile `.ts` scripts at load time for type-safe pipeline logic
 - **UI code editor** — In-browser JavaScript/TypeScript editor with syntax highlighting and validation
-- **Compositor UI (basic)** — Dedicated scene/layer editor for main + PiP positioning and simple overlays (crop/transform/watermark)
+- ~~**Compositor UI (basic)** — Dedicated scene/layer editor for main + PiP positioning and simple overlays (crop/transform/watermark)~~
 - **Admin/Manage section** — Dedicated UI area for plugins, permissions/roles, secrets/config, and operational controls (separate from pipeline design/monitor views)

 ### Stability & Polish
@@ -139,9 +140,11 @@ StreamKit is media/processing-focused, not "audio-only". As real use cases emerg
 After the VP9 + compositor MVP is solid, expand video capabilities:

-- **More codecs/accelerators** — AV1, H.264, hardware acceleration options where possible

docs/src/content/docs/architecture/overview.md

-- **Plugins**: native (in-process C ABI) and WASM (sandboxed Component Model).
+- **Built-in nodes** (core, audio, video, containers, transport) — including a multi-layer video compositor with CPU (tiny-skia) and GPU (wgpu) backends.
+- **Plugins**: native (in-process C ABI) and WASM (sandboxed Component Model) — e.g. Slint for dynamic UI overlays, Whisper/SenseVoice for STT, Kokoro/Piper for TTS.
 - **Script node**: sandboxed JavaScript (QuickJS) for lightweight integration and text processing.
docs/src/content/docs/deployment/gpu.md

-description: Configure GPU acceleration for ML workloads
+description: Configure GPU acceleration for compositing and ML workloads
 ---

+StreamKit can use GPUs in two ways: the built-in video compositor uses **wgpu** (Vulkan/Metal/DX12 — works with any compatible GPU), and selected native ML plugins use **NVIDIA CUDA** for inference acceleration.
+
+## Compositor (GPU compositing)
+
+The built-in `video::compositor` node supports GPU-accelerated compositing via wgpu. Set `gpu_mode` in the compositor params:
+
+| Value | Behaviour |
+|-------|-----------|
+| `auto` (default) | CPU for simple scenes, auto-promotes to GPU when complexity warrants it |
+| `gpu` | Force GPU compositing (falls back to CPU if no GPU is available) |
+| `cpu` | Force CPU-only compositing (tiny-skia) |
+
+No NVIDIA-specific tooling is required for compositor GPU — wgpu works with any Vulkan-capable GPU. Example:
+
+```yaml
+compositor:
+  kind: video::compositor
+  params:
+    width: 1280
+    height: 720
+    gpu_mode: auto # or "gpu" / "cpu"
+```
+
+## ML Plugins (NVIDIA GPU)
+
 StreamKit can use NVIDIA GPUs for selected native ML plugins. GPU support depends on how you build and deploy:

 - `ghcr.io/streamer45/streamkit:latest` (CPU) runs plugins on CPU.
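The `gpu_mode` values in the table above amount to a small selection rule. A minimal sketch in Python — a hypothetical model of the documented behaviour, not StreamKit's actual implementation (the `select_backend` function and its inputs are invented for illustration):

```python
def select_backend(gpu_mode: str, gpu_available: bool, scene_complex: bool) -> str:
    """Hypothetical model of the documented gpu_mode rules (not StreamKit code)."""
    if gpu_mode == "cpu":
        # Force CPU-only compositing (tiny-skia)
        return "cpu"
    if gpu_mode == "gpu":
        # Force GPU, but the docs say it falls back to CPU if no GPU is present
        return "gpu" if gpu_available else "cpu"
    # "auto" (default): CPU for simple scenes, promote to GPU when complexity warrants
    return "gpu" if (scene_complex and gpu_available) else "cpu"
```

This mirrors the table: `cpu` always wins, `gpu` degrades gracefully, and `auto` promotes only when a complex scene meets an available GPU.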
docs/src/content/docs/getting-started/installation.md (2 additions)
@@ -45,6 +45,8 @@ Optional:
 - `reuse` (`pip3 install --user reuse`) for SPDX license header checks in `just lint` (note: the apt package is too old)
 - `clang` and `libclang-dev` (`sudo apt install clang libclang-dev`) for building native ML plugins (e.g. whisper, sensevoice)
 - `libvpx` + `pkg-config` if building with `--features video` (VP9 nodes)
+- `cmake` + `nasm` + C compiler if building with `--features svt_av1_static` (SVT-AV1 encoder); see [`crates/nodes/SVT_AV1.md`](https://github.com/streamer45/streamkit/blob/main/crates/nodes/SVT_AV1.md) for details
+- `libdav1d-dev` if building with `--features dav1d` (C dav1d AV1 decoder); the pure-Rust rav1d decoder (`--features av1`) requires no extra deps
docs/src/content/docs/guides/web-ui.md (1 addition, 1 deletion)
@@ -66,7 +66,7 @@ A fragment is a reusable mini-graph (a small, pre-wired set of nodes) that you c
 Monitor View uses the same overall three-pane layout, but focuses on running sessions:

 - **Left pane**: a live list of sessions until you enter **Staging Mode** (then it switches to the node library/palette for editing).
-- **Center pane**: the session graph view.
+- **Center pane**: the session graph view. If the selected session contains a `video::compositor` node, a **compositor scene editor** is available — an interactive canvas where you can drag, resize, reorder, and configure video layers and overlays in real time.
 - **Right pane** (once a session is selected): the YAML editor plus the Inspector pane for selected nodes.