feat: always-on wake word capture with unified audio pipeline #302

Open
kocendavid wants to merge 2 commits into OpenWhispr:main from kocendavid:feat/always-on-wake-word

kocendavid commented Feb 22, 2026

Summary

  • Adds wake word detection that runs continuously regardless of dictation state — when idle it listens for the wake phrase to start dictation; during dictation it listens for the finish phrase to stop
  • Replaces the previous design, in which wake word capture stopped during dictation and audioManager used MediaRecorder timeslice chunks (every chunk after the first lacked the WebM header, so it could not be decoded on its own)
  • Uses a stop-start MediaRecorder cycle (3-second chunks) to produce complete, self-contained WebM files that FFmpeg can reliably parse

How it works

| State | Behavior |
|---|---|
| Idle | useWakeWordCapture sends 3-second audio chunks to the main process → wakeWordManager transcribes them with the base Whisper model → fuzzy-matches against the wake phrase → triggers dictation |
| Dictation starts | Wake word capture keeps running (switches to stop-word mode via the wakeWordRecordingState IPC) |
| During dictation | Chunks are checked against the finish phrase instead of the wake phrase |
| Dictation stops | Seamlessly switches back to wake phrase detection (no restart needed) |
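The mode switch in the table can be sketched as a small tracker. This is a hypothetical illustration, not code from the PR: the class name `WakeWordModeTracker`, the `"WAKE"`/`"FINISH"` mode strings, and the action names are assumptions, and the phrase matcher is injected so any matching strategy can be plugged in. What it shows is that the capture loop is never restarted; only the phrase a chunk is compared against changes when the recording state arrives over IPC.

```javascript
// Hypothetical sketch of the WAKE/FINISH mode switch (not the PR's implementation).
// The capture loop keeps running; only the phrase a chunk is checked against changes.
class WakeWordModeTracker {
  constructor({ wakePhrase, finishPhrase }) {
    this.wakePhrase = wakePhrase;
    this.finishPhrase = finishPhrase;
    this.isRecording = false;
  }

  // Called when the renderer reports a recording-state change (e.g. over IPC).
  setRecordingState(isRecording) {
    this.isRecording = Boolean(isRecording);
  }

  // Label suitable for a listener log, e.g. "[WAKE]" or "[FINISH]".
  get mode() {
    return this.isRecording ? "FINISH" : "WAKE";
  }

  // Decide what a transcribed chunk should trigger in the current mode.
  // `matches(transcript, phrase)` is any phrase matcher (fuzzy, whole-word, ...).
  actionFor(transcript, matches) {
    if (this.isRecording) {
      return matches(transcript, this.finishPhrase) ? "stop-dictation" : null;
    }
    return matches(transcript, this.wakePhrase) ? "start-dictation" : null;
  }
}
```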

Architecture

[Renderer]                          [Main Process]
useWakeWordCapture                  wakeWordManager
  │ 3-sec MediaRecorder chunks        │ separate WhisperServer (base model)
  │ (stop-start cycle = valid WebM)   │ fuzzy phrase matching (Levenshtein)
  └──► wakeWordCheckChunk IPC ──────► └──► transcribe → match → trigger
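A minimal sketch of Levenshtein-based phrase matching, assuming the manager compares the phrase against same-length runs of words in the transcript and tolerates roughly one edit per four characters. The window strategy, the `maxRatio` threshold, and the function names are assumptions for illustration; the PR only states that matching is fuzzy and Levenshtein-based.

```javascript
// Hypothetical sketch: Levenshtein-based fuzzy phrase matching (not the PR's code).

// Edit distance with unit-cost insertions, deletions and substitutions,
// computed with a single rolling row of the dynamic-programming table.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // D[i-1][j-1] for j = 1
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // D[i-1][j]
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Lowercase and split into words, treating punctuation as a separator.
function toWords(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Accept if any run of words as long as the phrase is within
// `maxRatio` edits per character of the phrase (assumed threshold).
function fuzzyPhraseMatch(transcript, phrase, maxRatio = 0.25) {
  const words = toWords(transcript);
  const phraseWords = toWords(phrase);
  const n = phraseWords.length;
  if (n === 0 || words.length < n) return false;
  const target = phraseWords.join(" ");
  const budget = Math.floor(target.length * maxRatio);
  for (let i = 0; i + n <= words.length; i++) {
    const candidate = words.slice(i, i + n).join(" ");
    if (levenshtein(candidate, target) <= budget) return true;
  }
  return false;
}
```

With this threshold a one-letter mistranscription such as "whisber" still matches "whisper", while a longer word like "whispering" (three edits away) does not.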

Key changes

New files

  • src/helpers/wakeWordManager.js — Main process module: manages a dedicated WhisperServer instance, performs fuzzy phrase matching, and auto-downloads the base model
  • src/hooks/useWakeWordCapture.js — Renderer hook: always-on mic capture using stop-start MediaRecorder cycles, polls wake word status, sends chunks via IPC

Modified files

  • src/App.jsx — Calls useWakeWordCapture() (no args, runs continuously), notifies main process of recording state
  • src/helpers/audioManager.js — Removed finish-phrase timeslice code (~20 lines) — the always-on hook handles stop-word detection now
  • main.js — Initializes WakeWordManager, registers IPC handlers, auto-starts if enabled
  • preload.js — Exposes wake word IPC API (toggle, set phrases, check chunks, status)
  • src/helpers/ipcHandlers.js — Routes wake word IPC calls to manager
  • src/helpers/environment.js — Persists WAKE_WORD_ENABLED, WAKE_WORD_PHRASE, WAKE_WORD_FINISH_PHRASE
  • src/helpers/windowConfig.js — Disables backgroundThrottling so capture works when window is hidden
  • src/hooks/useAudioRecording.js — Strips finish phrase from transcribed text
  • src/components/SettingsPage.tsx — Wake word settings UI (enable/disable, set phrases, live detection log)
  • src/components/SettingsModal.tsx — Adds Wake Word nav item
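The finish-phrase stripping in useAudioRecording.js could look roughly like the hypothetical helper below. It assumes the phrase is removed only when it appears as whole words at the very end of the transcript, along with any punctuation Whisper appends; `stripFinishPhrase` is an illustrative name, not the hook's actual API.

```javascript
// Hypothetical helper: drop a trailing finish phrase so saying "... done"
// does not paste the word "done". Case-insensitive, whole words only.
function stripFinishPhrase(text, finishPhrase) {
  const phrase = finishPhrase.trim();
  if (!phrase) return text;
  // Escape regex metacharacters in the user-configured phrase.
  const escaped = phrase.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Allow any whitespace between the phrase's words.
  const body = escaped.split(/\s+/).join("\\s+");
  // Phrase must start the text or follow whitespace/punctuation (so "undone"
  // is left alone), and may be followed only by punctuation/whitespace.
  const re = new RegExp(`(?:^|[\\s,.;:!?])${body}[\\s.,;:!?]*$`, "i");
  return text
    .replace(re, "")
    .replace(/[\s,;:]+$/, ""); // tidy a dangling comma or space left behind
}
```

For example, `stripFinishPhrase("Send the report, done.", "done")` yields `"Send the report"`, while `"It was undone"` is returned unchanged.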

What this fixes

The previous approach had a WebM header bug: during dictation, audioManager called MediaRecorder.start(3000) with a 3-second timeslice to send chunks for stop-word detection. Only the first timeslice chunk carries the WebM header; every later chunk lacks it, so FFmpeg couldn't parse those chunks as standalone files. The new design avoids this entirely: useWakeWordCapture uses a stop-start cycle that always produces complete WebM files.
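A sketch of the stop-start cycle, written under stated assumptions rather than copied from useWakeWordCapture.js: `startStopStartCycle` and its parameters are hypothetical, and the recorder factory and timer functions are injected so the loop can be exercised without a microphone. In a browser, `createRecorder` would typically be `() => new MediaRecorder(stream, { mimeType: "audio/webm" })`. Because `start()` is called without a timeslice, each recorder delivers its data only when stopped, so every emitted Blob is a complete file with its own header.

```javascript
// Hypothetical sketch of a stop-start MediaRecorder loop (not the PR's code).
// Each cycle uses a fresh recorder, so each Blob is a self-contained WebM file.
function startStopStartCycle({
  createRecorder,          // () => MediaRecorder-like object
  onChunk,                 // (blob) => void, e.g. send to the main process over IPC
  chunkMs = 3000,
  setTimer = setTimeout,
  clearTimer = clearTimeout,
}) {
  let active = true;
  let timer = null;
  let current = null;

  function cycle() {
    if (!active) return;
    const recorder = createRecorder();
    const parts = [];
    current = recorder;

    recorder.ondataavailable = (event) => {
      if (event.data && event.data.size > 0) parts.push(event.data);
    };
    recorder.onstop = () => {
      // The data buffered since start() forms one finished file.
      if (parts.length > 0) {
        onChunk(new Blob(parts, { type: recorder.mimeType || "audio/webm" }));
      }
      cycle(); // begin the next independent recording right away
    };

    recorder.start(); // no timeslice: data is delivered once, when stopped
    timer = setTimer(() => {
      if (recorder.state === "recording") recorder.stop();
    }, chunkMs);
  }

  cycle();

  // Ends the loop and flushes the last (possibly shorter) chunk.
  return function stopCycle() {
    active = false;
    clearTimer(timer);
    if (current && current.state === "recording") current.stop();
  };
}
```

One trade-off of this design: a phrase spoken across a chunk boundary can be split between two files, and a short stretch of audio may be missed while the next recorder starts.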

Test plan

  • Enable wake word in settings, set wake phrase to "whisper"
  • Say "whisper" → recording starts, wake word capture keeps running (no gap in Listener Output)
  • Set finish phrase to "done" → say "done" during dictation → dictation stops, text pastes
  • During dictation, Listener Output shows [FINISH] mode (not [WAKE])
  • Hotkey still works independently for start/stop
  • Main transcription produces correct text (single complete blob, no timeslice artifacts)
  • Disable wake word → capture stops, no microphone indicator

Add wake word detection that runs continuously regardless of dictation
state. When idle it listens for the wake phrase to start dictation;
during dictation it listens for the finish phrase to stop. This replaces
the previous design where wake word capture stopped during dictation and
audioManager used timeslice chunks (which produced broken WebM files
without headers).

Key changes:
- wakeWordManager: new main-process module that runs a separate
  WhisperServer (base model) for lightweight phrase detection
- useWakeWordCapture: renderer hook using stop-start MediaRecorder
  cycles to produce complete WebM files every 3 seconds
- audioManager: removed finish-phrase timeslice code — the always-on
  capture hook now handles stop word detection
- Settings UI for enabling wake word, setting wake/finish phrases
- backgroundThrottling disabled so capture works when window is hidden

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
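The backgroundThrottling change amounts to a single window option. The fragment below is a hypothetical excerpt of what windowConfig.js might pass to Electron's `BrowserWindow`; `backgroundThrottling` is a real `webPreferences` flag that defaults to `true` and throttles timers and animations when the page is in the background, which would stretch or stall the 3-second chunk timer. The object name and `show: false` are assumptions.

```javascript
// Hypothetical excerpt of BrowserWindow options for the window running the capture hook.
const wakeWordWindowOptions = {
  show: false, // assumption: the capturing window may stay hidden
  webPreferences: {
    // Keep timers (and therefore the stop-start cycle) running at full rate
    // while the window is hidden or in the background.
    backgroundThrottling: false,
  },
};
// In the main process this would be used as: new BrowserWindow(wakeWordWindowOptions)
```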
kocendavid (Author) commented

Hey, I think this is cool. Take a look and see if you like it. I will also add different words for finishing and sending, or for cancelling the translation. Maybe some of this is already covered in the Intelligence tab.

Add two new configurable finish phrase actions alongside the existing
finish phrase:
- Cancel phrase: stops dictation and discards audio (no paste)
- Enter phrase: stops dictation, pastes text, then simulates Enter key

Also fixes false positive wake word matches by using whole-word matching
instead of substring includes, and suppresses silence entries from the
listener output log.
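The false positive from substring matching is easy to reproduce: `"whispering".includes("whisper")` is true. Below is a sketch of one whole-word alternative; the function names and the tokenization rule (splitting on anything that is not a letter or digit) are assumptions, not the commit's exact implementation.

```javascript
// Substring check: matches inside longer words, e.g. "whispering".
function substringMatch(transcript, phrase) {
  return transcript.toLowerCase().includes(phrase.toLowerCase());
}

// Whole-word check (sketch): the phrase's words must appear as a contiguous
// run of complete words in the transcript.
function wholeWordMatch(transcript, phrase) {
  const tokenize = (s) => s.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
  const words = tokenize(transcript);
  const target = tokenize(phrase);
  if (target.length === 0) return false;
  for (let i = 0; i + target.length <= words.length; i++) {
    if (target.every((w, k) => words[i + k] === w)) return true;
  }
  return false;
}
```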

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gabrielste1n self-requested a review February 23, 2026 01:05