feat(voice): Implement continuous VAD capture loop for hands-free mode #181

@mmogr

Summary

The voice pipeline supports two interaction modes: Push-to-Talk (PTT) and Voice Activity Detection (VAD). PTT works end-to-end, but VAD mode has no continuous capture loop: nothing drives the vad_process_frame() method with live microphone audio.

The pipeline correctly:

  • Creates a VoiceActivityDetector when mode is VAD
  • Provides vad_process_frame() which processes audio frames and emits transcripts
  • Loads the Silero VAD model

But there is no Tauri command or background task that:

  1. Continuously reads audio from the microphone
  2. Splits it into frames
  3. Feeds those frames to vad_process_frame()

As a result, VAD mode silently does nothing when activated.

What needs to change

Option A: Background capture loop in the pipeline

When start() is called in VAD mode, spawn a background task that:

  1. Starts audio capture (capture.start_recording())
  2. Reads accumulated samples periodically (e.g. every 30ms)
  3. Feeds frames to vad_process_frame()
  4. Stops when the pipeline is stopped

This keeps all logic inside the Rust pipeline.
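
A minimal sketch of what such a loop could look like, using only the standard library. Everything here is an assumption for illustration: the VadLoop type, the shared Vec<f32> standing in for the capture buffer, the process_frame closure standing in for vad_process_frame(), the 480-sample frame length (30 ms at 16 kHz), and the use of std::thread rather than whatever runtime the pipeline actually uses.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Assumed frame length: 30 ms at 16 kHz. The real value depends on the VAD model.
const FRAME_LEN: usize = 480;
/// Assumed polling interval, taken from the "every 30ms" example in the issue.
const POLL_INTERVAL: Duration = Duration::from_millis(30);

/// Hypothetical handle to a running capture loop.
struct VadLoop {
    stop: Arc<AtomicBool>,
    worker: Option<JoinHandle<()>>,
}

impl VadLoop {
    /// Spawns a thread that drains `capture_buf` periodically and passes
    /// fixed-size frames to `process_frame` (a stand-in for vad_process_frame()).
    fn spawn<F>(capture_buf: Arc<Mutex<Vec<f32>>>, mut process_frame: F) -> Self
    where
        F: FnMut(&[f32]) + Send + 'static,
    {
        let stop = Arc::new(AtomicBool::new(false));
        let stop_flag = Arc::clone(&stop);
        let worker = thread::spawn(move || {
            // Samples that do not yet fill a whole frame wait here.
            let mut pending: Vec<f32> = Vec::new();
            loop {
                // Read the flag before draining, so samples captured before
                // stop() was called are still processed in the last pass.
                let stopping = stop_flag.load(Ordering::Acquire);
                pending.append(&mut capture_buf.lock().unwrap());
                let complete = pending.len() / FRAME_LEN * FRAME_LEN;
                for frame in pending[..complete].chunks_exact(FRAME_LEN) {
                    process_frame(frame);
                }
                pending.drain(..complete);
                if stopping {
                    break;
                }
                thread::sleep(POLL_INTERVAL);
            }
        });
        Self { stop, worker: Some(worker) }
    }

    /// Signals the loop to finish and waits for the thread to exit.
    fn stop(mut self) {
        self.stop.store(true, Ordering::Release);
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}

fn main() {
    let capture_buf = Arc::new(Mutex::new(Vec::new()));
    let frames_seen = Arc::new(Mutex::new(0usize));
    let counter = Arc::clone(&frames_seen);

    let vad_loop = VadLoop::spawn(Arc::clone(&capture_buf), move |_frame: &[f32]| {
        *counter.lock().unwrap() += 1;
    });

    // Simulate the microphone callback appending 1000 samples.
    capture_buf.lock().unwrap().extend(vec![0.0f32; 1000]);
    vad_loop.stop();

    println!("frames processed: {}", *frames_seen.lock().unwrap());
}
```

In this sketch the loop owns the leftover samples between polls, so the capture side only ever appends to the shared buffer. Whether the real capture API exposes its samples this way is not stated in the issue.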

Option B: Tauri-driven polling

Add a Tauri command (or timer-based event) that periodically:

  1. Reads audio from the capture buffer
  2. Calls vad_process_frame() on the pipeline
  3. Emits transcript events when speech is detected
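
The polling variant could be reduced to a single "tick" function that the command or timer calls. The sketch below is hypothetical throughout: VadPollState, poll_vad, the Option<String> return type of the frame processor, and the 480-sample frame length are assumptions, and the Tauri wiring (command registration, event emission) is deliberately left out.

```rust
use std::sync::Mutex;

/// Assumed frame length: 30 ms at 16 kHz. The real value depends on the VAD model.
const FRAME_LEN: usize = 480;

/// Hypothetical state shared between the capture side and the polling command.
struct VadPollState {
    /// Samples appended by the capture side since the last poll.
    capture_buf: Mutex<Vec<f32>>,
    /// Samples left over from earlier polls that did not fill a frame.
    pending: Mutex<Vec<f32>>,
}

/// One polling tick: drains the capture buffer, feeds complete frames to
/// `process_frame` (a stand-in for vad_process_frame()), and returns any
/// transcripts so the caller can emit them as events.
fn poll_vad<F>(state: &VadPollState, mut process_frame: F) -> Vec<String>
where
    F: FnMut(&[f32]) -> Option<String>,
{
    let mut pending = state.pending.lock().unwrap();
    pending.append(&mut state.capture_buf.lock().unwrap());

    let complete = pending.len() / FRAME_LEN * FRAME_LEN;
    let transcripts: Vec<String> = pending[..complete]
        .chunks_exact(FRAME_LEN)
        .filter_map(|frame| process_frame(frame))
        .collect();
    pending.drain(..complete);
    transcripts
}

fn main() {
    let state = VadPollState {
        capture_buf: Mutex::new(vec![0.0f32; 1000]),
        pending: Mutex::new(Vec::new()),
    };

    // Stand-in processor that reports something for every frame it receives.
    let transcripts = poll_vad(&state, |frame: &[f32]| {
        Some(format!("frame of {} samples", frame.len()))
    });

    for t in &transcripts {
        println!("{t}");
    }
}
```

Unlike Option A, nothing runs between ticks here, so capture latency is bounded by how often the caller polls.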

Option A is cleaner but requires the stream thread work from #178.

Either way

  • Add a voice_vad_toggle or similar command if not already present
  • Update the frontend to use VAD mode (currently PTT-only in the overlay)
  • Consider documenting VAD as experimental / not-yet-functional until this is implemented

Files

  • crates/gglib-voice/src/pipeline.rs: start(), vad_process_frame()
  • src-tauri/src/commands/voice.rs: no VAD-driving command exists
  • src/components/VoiceOverlay/VoiceOverlay.tsx: PTT button shown but no VAD UI

Priority

Medium — PTT mode works, but the VAD mode option in settings is misleading without this.

Labels

  • component: voice (Voice mode, STT/TTS pipeline)
  • enhancement (New feature or request)
  • priority: medium (Should be done soon)
  • size: m (4-8 hours, half to full day)
  • type: feature (New functionality or enhancement)