Commit 2b91476

feat(pipeline): add VAD filtering and telemetry_out to subtitle pipeline

Add Silero VAD configuration to the Whisper node (vad_threshold: 0.4, min_silence_duration_ms: 600) so that silence is filtered out before inference, improving transcription responsiveness.

Replace telemetry_tap with core::telemetry_out (matching other dynamic pipelines such as voice-agent-openai and speech-translate) so that STT results appear in the stream view telemetry timeline via a best_effort side branch.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

1 parent 8044c9d commit 2b91476

1 file changed: +18 -6 lines changed

samples/pipelines/dynamic/video_moq_webcam_subtitles.yml

Lines changed: 18 additions & 6 deletions
@@ -6,12 +6,15 @@
 # transcription rendered as a Slint subtitle overlay.
 #
 # Data flow (subtitles):
-# mic → opus_decoder → resampler → whisper → stt_tap → [best_effort] →
+# mic → opus_decoder → resampler → whisper → [best_effort] →
 # param_bridge → UpdateParams → slint subtitle → compositor layer
 #
+# whisper → [best_effort] → telemetry_out (stream view)
+#
 # Requires:
 # - plugin::native::whisper loaded (with a model, e.g. tiny.en)
 # - plugin::native::slint loaded
+# - Silero VAD model: models/silero_vad.onnx
 # just build-plugin-native-whisper && just build-plugin-native-slint && just copy-plugins-native

 name: Webcam PiP + Live Subtitles (MoQ)
@@ -77,14 +80,23 @@ nodes:
 params:
 model_path: models/ggml-tiny.en-q5_1.bin
 language: en
+vad_model_path: models/silero_vad.onnx
+vad_threshold: 0.4
+min_silence_duration_ms: 600
+max_segment_duration_secs: 30.0
+emit_vad_events: true
+n_threads: 0
 needs: resampler

-# Tap whisper output so transcriptions appear in the stream view telemetry.
-stt_tap:
-kind: core::telemetry_tap
+# Surface STT results in the stream view telemetry timeline.
+stt_telemetry:
+kind: core::telemetry_out
 params:
 packet_types: ["Transcription"]
-needs: whisper
+max_events_per_sec: 20
+needs:
+node: whisper
+mode: best_effort

 # --- Subtitle rendering (Slint) ---

@@ -110,7 +122,7 @@ nodes:
 show: true
 debounce_ms: 100
 needs:
-in: stt_tap
+in: whisper
 connection_mode: best_effort

 # --- Video decode + compositing ---
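
For orientation, here is a sketch of how the two nodes touched by the second hunk might read once the change is applied. The page extraction dropped the YAML indentation, so the two-space nesting below is an assumption based on which keys have to be children of `params:` and `needs:`. The `nodes:` parent is taken from the hunk header and the `whisper` node name from the `needs` references. Keys outside the hunk, such as the Whisper node's `kind`, are left out and not guessed.

```yaml
nodes:
  whisper:
    # Keys above params (e.g. kind) are outside the diff hunk and omitted here.
    params:
      model_path: models/ggml-tiny.en-q5_1.bin
      language: en
      # Added in this commit: Silero VAD settings.
      vad_model_path: models/silero_vad.onnx
      vad_threshold: 0.4
      min_silence_duration_ms: 600
      max_segment_duration_secs: 30.0
      emit_vad_events: true
      n_threads: 0
    needs: resampler

  # Surface STT results in the stream view telemetry timeline.
  stt_telemetry:
    kind: core::telemetry_out
    params:
      packet_types: ["Transcription"]
      max_events_per_sec: 20
    # Side branch off whisper, declared best_effort.
    needs:
      node: whisper
      mode: best_effort
```

With `stt_tap` removed, the subtitle bridge in the third hunk reads directly from `whisper`, so telemetry becomes a separate branch and is no longer an inline hop on the subtitle path.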
