 <img src="docs/public/screenshots/monitor_view.png" alt="StreamKit web UI (Monitor View): visual pipeline editor" width="800">
 <br>
-<em>Pipeline monitor showing real-time audio processing with node metrics</em>
+<em>Pipeline monitor showing real-time media processing with node metrics</em>
 </p>

 **StreamKit** is a self-hostable media processing server (written in Rust). You run a single binary (`skit`), then compose pipelines as a node graph (DAG) made from built-in nodes, plugins, and scriptable logic — via a web UI, YAML, or API.
@@ -67,6 +67,7 @@ If you try it and something feels off, please open an issue (or a small PR). For
 - **Speech pipelines** — Build a transcription service: ingest audio via MoQ, run Whisper STT, stream transcription updates to clients.
 - **Real-time translation** — Bilingual streams with live subtitles using NLLB or Helsinki translation models.
 - **Voice agents** — TTS-powered bots that respond to audio input with Kokoro, Piper, or Matcha.
+- **Video compositing** — Combine camera feeds with overlays and PiP layouts using the built-in compositor, encoded with VP9 for real-time transport.
 - **Audio processing** — Mixing, gain control, format conversion, and custom routing.
 - **Batch processing** — High-throughput file conversion or offline transcription using the Oneshot HTTP API.
 - **Your idea** — Add your own node or plugin and compose it into a pipeline
@@ -79,7 +80,7 @@ If you try it and something feels off, please open an issue (or a small PR). For
 - **Dynamic**: long-running sessions you can inspect and reconfigure while they run
 - **Transport**: real-time media over MoQ/WebTransport (QUIC) plus a WebSocket control plane for UI and automation (WebSocket transport nodes are on the roadmap; in the near term, non-media streams may also ride MoQ)
 - **Plugins**: native (C ABI, in-process) and WASM (Component Model).
-- **Media focus**: audio-first today (Opus, WAV, OGG, FLAC, MP3). Video support is on the [roadmap](ROADMAP.md).
+- **Media focus**: audio (Opus, WAV, OGG, FLAC, MP3) and basic video (VP9 encode/decode, compositing, WebM muxing). Video capabilities are expanding — see the [roadmap](ROADMAP.md).
-**Warning**: Only modify these if you understand the latency/throughput implications. The defaults are tuned for typical real-time audio processing workloads.
+**Warning**: Only modify these if you understand the latency/throughput implications. The defaults are tuned for typical real-time audio/video processing workloads.

 ### When to Adjust
@@ -148,6 +148,8 @@ The core audio frame pool is preallocated with fixed defaults and cannot be conf
 These are optimized for common audio frame sizes (10-80ms at 48kHz) and should not need adjustment.

+A separate video frame pool (`VideoFramePool`) manages reusable byte buffers for raw video frames, reducing per-frame allocation overhead in video pipelines.
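For readers unfamiliar with the pattern, the added sentence describes a buffer pool: frame-sized byte buffers are handed out, then returned for reuse instead of being freed. The following is a minimal sketch of that general technique only. It is not StreamKit's `VideoFramePool`; the names `BufferPool`, `PooledBuf`, `acquire`, `available`, and `bytes_mut` are hypothetical and chosen for illustration.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical pool of reusable, fixed-size byte buffers.
/// Illustrative only; not StreamKit's actual `VideoFramePool`.
pub struct BufferPool {
    free: Mutex<Vec<Vec<u8>>>,
    buf_len: usize,
}

impl BufferPool {
    /// Preallocate `prealloc` zeroed buffers of `buf_len` bytes each.
    pub fn new(buf_len: usize, prealloc: usize) -> Arc<Self> {
        let free: Vec<Vec<u8>> = (0..prealloc).map(|_| vec![0u8; buf_len]).collect();
        Arc::new(Self { free: Mutex::new(free), buf_len })
    }

    /// Take a buffer from the pool, allocating only when the pool is empty.
    /// Recycled buffers keep their previous contents; callers overwrite them.
    pub fn acquire(self: &Arc<Self>) -> PooledBuf {
        let recycled = self.free.lock().unwrap().pop();
        let buf = recycled.unwrap_or_else(|| vec![0u8; self.buf_len]);
        PooledBuf { buf: Some(buf), pool: Arc::clone(self) }
    }

    /// Number of buffers currently idle in the pool.
    pub fn available(&self) -> usize {
        self.free.lock().unwrap().len()
    }
}

/// A buffer checked out of the pool; it returns itself to the pool on drop.
pub struct PooledBuf {
    buf: Option<Vec<u8>>,
    pool: Arc<BufferPool>,
}

impl PooledBuf {
    /// Mutable access to the frame bytes.
    pub fn bytes_mut(&mut self) -> &mut [u8] {
        self.buf.as_mut().unwrap().as_mut_slice()
    }
}

impl Drop for PooledBuf {
    fn drop(&mut self) {
        // Hand the allocation back instead of freeing it.
        if let Some(buf) = self.buf.take() {
            self.pool.free.lock().unwrap().push(buf);
        }
    }
}
```

As a usage example under the same assumptions, a pool for 1280x720 I420 frames would be created with `BufferPool::new(1280 * 720 * 3 / 2, 2)`. Each `acquire()` then reuses one of the two preallocated 1,382,400-byte buffers, and dropping the returned `PooledBuf` puts the buffer back.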
docs/src/content/docs/index.mdx (3 additions, 2 deletions)
@@ -5,7 +5,7 @@ title: StreamKit
 description: Open-source real-time media processing engine
 template: splash
 hero:
-  tagline: Build and run real-time media pipelines on your own infrastructure. Speech-to-text, voice agents, live audio processing — composable, observable, self-hosted.
+  tagline: Build and run real-time media pipelines on your own infrastructure. Speech-to-text, voice agents, live audio/video processing — composable, observable, self-hosted.
-StreamKit is built for developers who need to process real-time media — whether you're building voice features for an app, prototyping an AI audio pipeline, or self-hosting alternatives to cloud speech APIs.
+StreamKit is built for developers who need to process real-time media — whether you're building voice features, prototyping an AI audio/video pipeline, or self-hosting alternatives to cloud speech APIs.

 ## What you can build

 - **Live transcription** — Ingest audio via MoQ, run Whisper or SenseVoice STT, stream transcription updates to clients
 - **Voice agents** — TTS-powered bots using Kokoro, Piper, or Matcha that respond to audio input
 - **Real-time translation** — Bilingual streams with live subtitles using NLLB or Helsinki models
 - **Audio processing** — Mixing, gain control, format conversion, encoding/decoding pipelines
+- **Video compositing** — Combine live video inputs with text/image overlays using the built-in compositor (PiP, z-ordering, crop/zoom), encoded via VP9 for real-time transport
 - **Content analysis** — VAD for speech detection, keyword spotting, or custom safety filters