Commit 9fbc4b0

Merge commit with 2 parents: 824ecd1 + ef2c192

File tree

6 files changed: +15 / -11 lines

README.md

Lines changed: 4 additions & 3 deletions
@@ -12,7 +12,7 @@ SPDX-License-Identifier: MPL-2.0
   <br>
 </h1>
 <h4 align="center">Build and run real-time media pipelines on your own infrastructure</h4>
-<p align="center"><em>Speech-to-text, voice agents, live audio processing — composable, observable, self-hosted.</em></p>
+<p align="center"><em>Speech-to-text, voice agents, live audio/video processing — composable, observable, self-hosted.</em></p>
 <p align="center">
   <a href="https://streamkit.dev"><img src="https://img.shields.io/badge/docs-streamkit.dev-blue?style=flat-square" alt="Documentation"></a>
   <a href="https://demo.streamkit.dev"><img src="https://img.shields.io/badge/demo-try%20it%20live-brightgreen?style=flat-square" alt="Live Demo"></a>
@@ -24,7 +24,7 @@ SPDX-License-Identifier: MPL-2.0
 <p align="center">
   <img src="docs/public/screenshots/monitor_view.png" alt="StreamKit web UI (Monitor View): visual pipeline editor" width="800">
   <br>
-  <em>Pipeline monitor showing real-time audio processing with node metrics</em>
+  <em>Pipeline monitor showing real-time media processing with node metrics</em>
 </p>

 **StreamKit** is a self-hostable media processing server (written in Rust). You run a single binary (`skit`), then compose pipelines as a node graph (DAG) made from built-in nodes, plugins, and scriptable logic — via a web UI, YAML, or API.
@@ -67,6 +67,7 @@ If you try it and something feels off, please open an issue (or a small PR). For
 - **Speech pipelines** — Build a transcription service: ingest audio via MoQ, run Whisper STT, stream transcription updates to clients.
 - **Real-time translation** — Bilingual streams with live subtitles using NLLB or Helsinki translation models.
 - **Voice agents** — TTS-powered bots that respond to audio input with Kokoro, Piper, or Matcha.
+- **Video compositing** — Combine camera feeds with overlays and PiP layouts using the built-in compositor, encoded with VP9 for real-time transport.
 - **Audio processing** — Mixing, gain control, format conversion, and custom routing.
 - **Batch processing** — High-throughput file conversion or offline transcription using the Oneshot HTTP API.
 - **Your idea** — Add your own node or plugin and compose it into a pipeline
@@ -79,7 +80,7 @@ If you try it and something feels off, please open an issue (or a small PR). For
 - **Dynamic**: long-running sessions you can inspect and reconfigure while they run
 - **Transport**: real-time media over MoQ/WebTransport (QUIC) plus a WebSocket control plane for UI and automation (WebSocket transport nodes are on the roadmap; in the near term, non-media streams may also ride MoQ)
 - **Plugins**: native (C ABI, in-process) and WASM (Component Model).
-- **Media focus**: audio-first today (Opus, WAV, OGG, FLAC, MP3). Video support is on the [roadmap](ROADMAP.md).
+- **Media focus**: audio (Opus, WAV, OGG, FLAC, MP3) and basic video (VP9 encode/decode, compositing, WebM muxing). Video capabilities are expanding — see the [roadmap](ROADMAP.md).

 ## Quickstart (Docker)

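The README diff above states that pipelines are composed as a node graph (DAG) via a web UI, YAML, or API. As a reader aid, here is a minimal hypothetical YAML sketch of such a graph for the live-transcription use case; the node types (`moq_ingest`, `whisper_stt`, `websocket_out`) and field names are illustrative assumptions, not StreamKit's actual schema:

```yaml
# Hypothetical pipeline sketch — node types and field names are
# illustrative assumptions, not StreamKit's actual YAML schema.
pipeline:
  name: live-transcription
  nodes:
    - id: ingest
      type: moq_ingest      # audio in over MoQ/WebTransport (QUIC)
    - id: stt
      type: whisper_stt     # speech-to-text
    - id: out
      type: websocket_out   # stream transcript updates to clients
  edges:
    - ingest -> stt
    - stt -> out
```

The point of the DAG model is that the same three-node shape covers the other listed use cases by swapping node types (e.g. a translation or TTS node in the middle).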
ROADMAP.md

Lines changed: 3 additions & 3 deletions
@@ -60,10 +60,10 @@ These are in place today and will be iterated on (not “added from scratch”):

 ### Dynamic Video over MoQ (VP9 MVP) (P0)

-- **Video packet types** — First-class video packets alongside audio, with explicit timing requirements
-- **VP9 baseline** — Real-time VP9 encode/decode path suitable for browser clients; **AV1 optional later**
+- ~~**Video packet types** — First-class video packets alongside audio, with explicit timing requirements~~
+- ~~**VP9 baseline** — Real-time VP9 encode/decode path suitable for browser clients; **AV1 optional later**~~
 - **MoQ/Hang-first interop** — Start by interoperating cleanly with `@moq/hang`, then generalize to “MoQ in general”
-- **Compositor MVP (main + PiP)** — Two live video inputs → one composed output, plus simple overlays (watermark/text/images)
+- ~~**Compositor MVP (main + PiP)** — Two live video inputs → one composed output, plus simple overlays (watermark/text/images)~~
 - **Golden-path demo** — A canonical “screen share + webcam → PiP → watchers” dynamic pipeline sample

 ### Reliability & Developer Experience

crates/nodes/README.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ Built-in processing nodes for StreamKit pipelines.

 ## What Lives Here

-- Built-in node implementations (e.g. `core::*`, `audio::*`, `containers::*`, `transport::*`)
+- Built-in node implementations (e.g. `core::*`, `audio::*`, `video::*`, `containers::*`, `transport::*`)
 - Node parameter schemas (used by the UI for validation and editor controls)
 - Node-level tests and fixtures
docs/src/content/docs/architecture/overview.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ StreamKit has three major pieces:

 ## Extensibility

-- **Built-in nodes** (core, audio, containers, transport).
+- **Built-in nodes** (core, audio, video, containers, transport).
 - **Plugins**: native (in-process C ABI) and WASM (sandboxed Component Model).
 - **Script node**: sandboxed JavaScript (QuickJS) for lightweight integration and text processing.
docs/src/content/docs/guides/performance.md

Lines changed: 3 additions & 1 deletion
@@ -122,7 +122,7 @@ moq_peer_channel_capacity = 100 # (MoQ builds) MoQ transport internal queues (p
 | `demuxer_buffer_size` | 65536 | OGG demuxer duplex buffer (bytes) |
 | `moq_peer_channel_capacity` | 100 | (MoQ builds) MoQ peer internal channels (packets) |

-**Warning**: Only modify these if you understand the latency/throughput implications. The defaults are tuned for typical real-time audio processing workloads.
+**Warning**: Only modify these if you understand the latency/throughput implications. The defaults are tuned for typical real-time audio/video processing workloads.

 ### When to Adjust

@@ -148,6 +148,8 @@ The core audio frame pool is preallocated with fixed defaults and cannot be conf

 These are optimized for common audio frame sizes (10-80ms at 48kHz) and should not need adjustment.

+A separate video frame pool (`VideoFramePool`) manages reusable byte buffers for raw video frames, reducing per-frame allocation overhead in video pipelines.
+
 ## Complete Example

 ```toml

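The performance-guide addition above describes `VideoFramePool` as managing reusable byte buffers for raw video frames. The mechanism it is naming can be sketched in a few lines of Rust; this is an illustrative sketch of the buffer-recycling idea, not StreamKit's actual `VideoFramePool` API:

```rust
// Illustrative sketch of a video frame pool — NOT StreamKit's actual
// `VideoFramePool` API. The idea: hand out preallocated, reusable byte
// buffers so steady-state video processing avoids per-frame allocation.
pub struct FramePool {
    frame_size: usize,
    free: Vec<Vec<u8>>,
}

impl FramePool {
    /// Preallocate `capacity` buffers of `frame_size` bytes each.
    pub fn new(frame_size: usize, capacity: usize) -> Self {
        let free = (0..capacity).map(|_| vec![0u8; frame_size]).collect();
        FramePool { frame_size, free }
    }

    /// Take a buffer from the pool; allocate only if the pool is exhausted.
    pub fn acquire(&mut self) -> Vec<u8> {
        self.free
            .pop()
            .unwrap_or_else(|| vec![0u8; self.frame_size])
    }

    /// Return a buffer so a later frame can reuse it.
    pub fn release(&mut self, buf: Vec<u8>) {
        debug_assert_eq!(buf.len(), self.frame_size);
        self.free.push(buf);
    }

    /// Buffers currently available without allocating.
    pub fn available(&self) -> usize {
        self.free.len()
    }
}
```

Once the pool is warm, an acquire/release loop touches only preallocated memory, which is the same rationale the guide gives for the fixed audio frame pool.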
docs/src/content/docs/index.mdx

Lines changed: 3 additions & 2 deletions
@@ -5,7 +5,7 @@ title: StreamKit
 description: Open-source real-time media processing engine
 template: splash
 hero:
-  tagline: Build and run real-time media pipelines on your own infrastructure. Speech-to-text, voice agents, live audio processing — composable, observable, self-hosted.
+  tagline: Build and run real-time media pipelines on your own infrastructure. Speech-to-text, voice agents, live audio/video processing — composable, observable, self-hosted.
 actions:
   - text: Get Started
     link: /getting-started/quick-start/
@@ -54,14 +54,15 @@ import { Card, CardGrid } from '@astrojs/starlight/components';

 ## Who is this for?

-StreamKit is built for developers who need to process real-time media — whether you're building voice features for an app, prototyping an AI audio pipeline, or self-hosting alternatives to cloud speech APIs.
+StreamKit is built for developers who need to process real-time media — whether you're building voice features, prototyping an AI audio/video pipeline, or self-hosting alternatives to cloud speech APIs.

 ## What you can build

 - **Live transcription** — Ingest audio via MoQ, run Whisper or SenseVoice STT, stream transcription updates to clients
 - **Voice agents** — TTS-powered bots using Kokoro, Piper, or Matcha that respond to audio input
 - **Real-time translation** — Bilingual streams with live subtitles using NLLB or Helsinki models
 - **Audio processing** — Mixing, gain control, format conversion, encoding/decoding pipelines
+- **Video compositing** — Combine live video inputs with text/image overlays using the built-in compositor (PiP, z-ordering, crop/zoom), encoded via VP9 for real-time transport
 - **Content analysis** — VAD for speech detection, keyword spotting, or custom safety filters

 ## What it is

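The compositing bullet added above mentions picture-in-picture layout. The geometry behind a PiP inset is simple enough to sketch; the following is illustrative math only (assuming a bottom-right anchor and an inset that fits inside the output), not StreamKit's compositor API:

```rust
// Illustrative PiP geometry sketch — not StreamKit's compositor API.
#[derive(Debug, PartialEq)]
pub struct Rect {
    pub x: u32,
    pub y: u32,
    pub w: u32,
    pub h: u32,
}

/// Place a picture-in-picture inset in the bottom-right corner.
/// `scale` is the fraction of output width the inset occupies; the
/// source aspect ratio is preserved. Assumes the scaled inset plus
/// margin fits inside the output frame.
pub fn pip_rect(out_w: u32, out_h: u32, src_w: u32, src_h: u32, scale: f32, margin: u32) -> Rect {
    let w = (out_w as f32 * scale) as u32;
    let h = w * src_h / src_w; // preserve source aspect ratio
    Rect {
        x: out_w - w - margin,
        y: out_h - h - margin,
        w,
        h,
    }
}
```

For a 1280x720 output and a 16:9 webcam at quarter width with a 16 px margin, this yields a 320x180 inset anchored near the bottom-right corner; z-ordering then just means drawing the inset after the main feed.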