Labels: component: voice, enhancement, priority: medium, size: m (4-8 hours, half to full day), type: feature
Description
Summary
The voice pipeline supports two interaction modes: Push-to-Talk (PTT) and Voice Activity Detection (VAD). PTT works end-to-end, but VAD mode has no continuous capture loop; nothing drives the vad_process_frame() method with live microphone audio.
The pipeline correctly:
- Creates a VoiceActivityDetector when the mode is VAD
- Provides vad_process_frame(), which processes audio frames and emits transcripts
- Loads the Silero VAD model
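To show what "processes audio frames and emits transcripts" typically involves, here is one common shape for the per-frame state machine behind a method like vad_process_frame(). A frame whose speech probability crosses a threshold opens an utterance, and a run of silent frames (a "hangover") closes it and hands the buffered audio on for transcription. This is a minimal sketch under assumptions: Segmenter, VadEvent, the threshold, and the hangover length are hypothetical, and the probability is passed in directly instead of being computed by the Silero model. The actual VoiceActivityDetector in gglib-voice may work differently.

```rust
/// Hypothetical events a VAD segmenter could report.
#[derive(Debug, PartialEq)]
pub enum VadEvent {
    SpeechStart,
    /// Audio of the finished utterance, ready for STT.
    SpeechEnd { samples: Vec<f32> },
}

/// Hypothetical per-frame segmenter (not the real VoiceActivityDetector).
pub struct Segmenter {
    threshold: f32,         // speech probability at or above this counts as speech
    hangover_frames: usize, // consecutive silent frames that end an utterance
    in_speech: bool,
    silent_run: usize,
    utterance: Vec<f32>,
}

impl Segmenter {
    pub fn new(threshold: f32, hangover_frames: usize) -> Self {
        Self { threshold, hangover_frames, in_speech: false, silent_run: 0, utterance: Vec::new() }
    }

    /// `prob` stands in for the model's speech probability for `frame`.
    pub fn process_frame(&mut self, frame: &[f32], prob: f32) -> Option<VadEvent> {
        let is_speech = prob >= self.threshold;
        if !self.in_speech {
            if is_speech {
                // Open a new utterance starting with this frame.
                self.in_speech = true;
                self.silent_run = 0;
                self.utterance.clear();
                self.utterance.extend_from_slice(frame);
                return Some(VadEvent::SpeechStart);
            }
            return None;
        }
        // Inside an utterance: keep every frame, including trailing silence.
        self.utterance.extend_from_slice(frame);
        if is_speech {
            self.silent_run = 0;
            return None;
        }
        self.silent_run += 1;
        if self.silent_run >= self.hangover_frames {
            self.in_speech = false;
            self.silent_run = 0;
            return Some(VadEvent::SpeechEnd { samples: std::mem::take(&mut self.utterance) });
        }
        None
    }
}
```

With a threshold of 0.5, a hangover of 2 frames, and 4-sample frames, the probabilities 0.1, 0.9, 0.8, 0.2, 0.1 produce no event, SpeechStart, nothing, nothing, and finally SpeechEnd carrying 16 samples (the four frames from the start of speech onward, trailing silence included).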
But there is no Tauri command or background task that:
- Continuously reads audio from the microphone
- Splits it into frames
- Feeds those frames to vad_process_frame()
As a result, VAD mode silently does nothing when activated.
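The "splits it into frames" step has one subtlety: capture reads rarely return an exact multiple of the frame size, so leftover samples must be carried into the next read rather than dropped or zero-padded. A minimal sketch, assuming mono f32 samples; FrameSplitter is a hypothetical name, and the frame length the Silero model actually expects should be taken from the existing VAD code rather than from this example.

```rust
/// Hypothetical helper: accumulates captured samples and yields fixed-size frames.
pub struct FrameSplitter {
    frame_len: usize,
    pending: Vec<f32>,
}

impl FrameSplitter {
    pub fn new(sample_rate_hz: usize, frame_ms: usize) -> Self {
        // Samples per frame = rate * duration, e.g. 16_000 * 30 / 1000 = 480.
        let frame_len = sample_rate_hz * frame_ms / 1000;
        assert!(frame_len > 0, "frame must contain at least one sample");
        Self { frame_len, pending: Vec::new() }
    }

    pub fn frame_len(&self) -> usize {
        self.frame_len
    }

    /// Appends newly captured samples and returns every complete frame.
    /// Any remainder stays buffered and is prepended to the next call's data.
    pub fn push(&mut self, samples: &[f32]) -> Vec<Vec<f32>> {
        self.pending.extend_from_slice(samples);
        let full = self.pending.len() / self.frame_len * self.frame_len;
        let frames: Vec<Vec<f32>> = self.pending[..full]
            .chunks_exact(self.frame_len)
            .map(|chunk| chunk.to_vec())
            .collect();
        self.pending.drain(..full);
        frames
    }
}
```

At 16 kHz, a 30 ms frame is 16,000 × 0.030 = 480 samples, so pushing 1,000 samples yields two frames and keeps 40 samples buffered for the next read.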
What needs to change
Option A: Background capture loop in the pipeline
When start() is called in VAD mode, spawn a background task that:
- Starts audio capture (capture.start_recording())
- Reads accumulated samples periodically (e.g. every 30 ms)
- Feeds frames to vad_process_frame()
- Stops when the pipeline is stopped
This keeps all logic inside the Rust pipeline.
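A sketch of Option A under stated assumptions: it uses a plain std::thread with an AtomicBool stop flag, whereas the real pipeline may prefer an async task once the stream thread work lands. AudioSource and FrameSink are hypothetical traits standing in for the capture buffer (after capture.start_recording()) and for vad_process_frame(), and VadLoop is an invented name.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for the capture buffer: returns and clears
/// whatever samples have accumulated since the last call.
pub trait AudioSource: Send + 'static {
    fn take_samples(&mut self) -> Vec<f32>;
}

/// Hypothetical stand-in for the pipeline's vad_process_frame().
pub trait FrameSink: Send + 'static {
    fn process_frame(&mut self, frame: &[f32]);
}

/// Handle to the background VAD capture thread (hypothetical name).
pub struct VadLoop {
    stop: Arc<AtomicBool>,
    handle: Option<JoinHandle<()>>,
}

impl VadLoop {
    /// Spawns the loop; `poll` is the read interval (the issue suggests ~30 ms).
    pub fn spawn<S: AudioSource, K: FrameSink>(
        mut source: S,
        mut sink: K,
        frame_len: usize,
        poll: Duration,
    ) -> Self {
        assert!(frame_len > 0, "frame_len must be non-zero");
        let stop = Arc::new(AtomicBool::new(false));
        let stop_flag = Arc::clone(&stop);
        let handle = thread::spawn(move || {
            let mut pending: Vec<f32> = Vec::new();
            while !stop_flag.load(Ordering::Acquire) {
                pending.extend(source.take_samples());
                // Feed every complete frame; keep the remainder for the next tick.
                let full = pending.len() / frame_len * frame_len;
                for frame in pending[..full].chunks_exact(frame_len) {
                    sink.process_frame(frame);
                }
                pending.drain(..full);
                thread::sleep(poll);
            }
        });
        Self { stop, handle: Some(handle) }
    }

    /// Signals the thread to exit and waits for it to finish.
    pub fn stop(&mut self) {
        self.stop.store(true, Ordering::Release);
        if let Some(handle) = self.handle.take() {
            let _ = handle.join();
        }
    }
}

impl Drop for VadLoop {
    // Safety net: stop the thread if the owner forgets to.
    fn drop(&mut self) {
        self.stop();
    }
}
```

In this design the pipeline's start() would spawn a VadLoop when the mode is VAD and keep it in a field, and its stop() would call VadLoop::stop(). Because stop() joins the thread, no frame is processed after it returns.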
Option B: Tauri-driven polling
Add a Tauri command (or timer-based event) that periodically:
- Reads audio from the capture buffer
- Calls vad_process_frame() on the pipeline
- Emits transcript events when speech is detected
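Option B moves the clock out of the pipeline: each tick (for example a frontend timer invoking a Tauri command) drains whatever the capture buffer holds and returns any transcripts for the command to emit as events. The sketch is hypothetical: PollState and poll() are invented names, the closure stands in for the pipeline's vad_process_frame() (assumed here to return a transcript when an utterance ends), and the #[tauri::command] wrapper and event emission are left out so the example compiles with std alone.

```rust
/// Hypothetical state a polling command would hold between ticks.
pub struct PollState<F>
where
    F: FnMut(&[f32]) -> Option<String>,
{
    frame_len: usize,
    pending: Vec<f32>,
    // Stand-in for the pipeline's vad_process_frame(): returns a transcript
    // when a frame completes an utterance (assumed behavior).
    vad_process_frame: F,
}

impl<F> PollState<F>
where
    F: FnMut(&[f32]) -> Option<String>,
{
    pub fn new(frame_len: usize, vad_process_frame: F) -> Self {
        assert!(frame_len > 0, "frame_len must be non-zero");
        Self { frame_len, pending: Vec::new(), vad_process_frame }
    }

    /// One tick: append the samples drained from the capture buffer, feed every
    /// complete frame, and return the transcripts the command should emit.
    pub fn poll(&mut self, captured: &[f32]) -> Vec<String> {
        self.pending.extend_from_slice(captured);
        let full = self.pending.len() / self.frame_len * self.frame_len;
        let mut transcripts = Vec::new();
        for frame in self.pending[..full].chunks_exact(self.frame_len) {
            if let Some(text) = (self.vad_process_frame)(frame) {
                transcripts.push(text);
            }
        }
        // Leftover samples wait for the next tick.
        self.pending.drain(..full);
        transcripts
    }
}
```

Because state must survive between calls, a command wrapper would likely keep this behind a Mutex in Tauri's managed state. If the timer lives in the frontend, every tick is also an IPC round trip, which is part of why Option A is described as cleaner.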
Option A is cleaner but requires the stream thread work from #178.
Either way
- Add a voice_vad_toggle or similar command if not already present
- Update the frontend to use VAD mode (the overlay is currently PTT-only)
- Consider documenting VAD as experimental / not-yet-functional until this is implemented
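A toggle command mostly needs to enforce one invariant: the capture loop runs if and only if the mode is VAD. A minimal sketch of logic such a command could delegate to; VadDriver, VoiceController, toggle_vad(), and the "ptt"/"vad" strings are hypothetical, and the Tauri command wrapper itself is omitted.

```rust
/// Hypothetical handle on whatever runs the VAD capture loop (Option A or B).
pub trait VadDriver {
    fn start(&mut self);
    fn stop(&mut self);
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VoiceMode {
    PushToTalk,
    Vad,
}

impl VoiceMode {
    /// Assumed string form for the frontend to switch on.
    pub fn as_str(self) -> &'static str {
        match self {
            VoiceMode::PushToTalk => "ptt",
            VoiceMode::Vad => "vad",
        }
    }
}

pub struct VoiceController<D: VadDriver> {
    mode: VoiceMode,
    driver: D,
}

impl<D: VadDriver> VoiceController<D> {
    pub fn new(driver: D) -> Self {
        Self { mode: VoiceMode::PushToTalk, driver }
    }

    /// Flips the mode and starts or stops the capture loop to match,
    /// so the loop never runs outside VAD mode.
    pub fn toggle_vad(&mut self) -> VoiceMode {
        let next = match self.mode {
            VoiceMode::PushToTalk => {
                self.driver.start();
                VoiceMode::Vad
            }
            VoiceMode::Vad => {
                self.driver.stop();
                VoiceMode::PushToTalk
            }
        };
        self.mode = next;
        next
    }

    pub fn driver(&self) -> &D {
        &self.driver
    }
}
```

Returning the new mode lets the overlay switch between the PTT button and a VAD indicator without issuing a second query.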
Files
- crates/gglib-voice/src/pipeline.rs: start(), vad_process_frame()
- src-tauri/src/commands/voice.rs: no VAD-driving command exists
- src/components/VoiceOverlay/VoiceOverlay.tsx: PTT button shown, but no VAD UI
Priority
Medium: PTT mode works, but the VAD mode option in settings is misleading without this.