feat: always-on wake word capture with unified audio pipeline #302
Open
kocendavid wants to merge 2 commits into OpenWhispr:main from
Add wake word detection that runs continuously regardless of dictation state. When idle, it listens for the wake phrase to start dictation; during dictation, it listens for the finish phrase to stop. This replaces the previous design, where wake word capture stopped during dictation and audioManager used timeslice chunks (which produced broken WebM files without headers).

Key changes:
- wakeWordManager: new main-process module that runs a separate WhisperServer (base model) for lightweight phrase detection
- useWakeWordCapture: renderer hook using stop-start MediaRecorder cycles to produce complete WebM files every 3 seconds
- audioManager: removed finish-phrase timeslice code; the always-on capture hook now handles stop word detection
- Settings UI for enabling wake word and setting wake/finish phrases
- backgroundThrottling disabled so capture works when the window is hidden

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
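The stop-start capture cycle described above can be sketched as follows. This is a minimal sketch, not the PR's `useWakeWordCapture` hook: `startStopStartCapture`, its options, and the `audio/webm` MIME type are assumptions, and the recorder constructor is a parameter only so the loop can be exercised outside a browser. Because no timeslice is passed to `start()`, each recorder delivers its data when it stops, so every emitted Blob is a self-contained WebM file.

```javascript
// Hypothetical sketch of a stop-start MediaRecorder loop (not the PR's code).
// Each cycle uses a fresh recorder, so each Blob carries its own WebM header,
// unlike timeslice chunks 2..n from a single long-running recorder.
function startStopStartCapture(stream, onChunk, {
  cycleMs = 3000,                           // the PR describes 3-second files
  mimeType = "audio/webm",                  // assumed container type
  RecorderCtor = globalThis.MediaRecorder,  // injectable for testing
} = {}) {
  let stopped = false;
  let current = null;
  let timer = null;

  function runCycle() {
    if (stopped) return;
    const parts = [];
    const recorder = new RecorderCtor(stream, { mimeType });
    current = recorder;
    recorder.ondataavailable = (e) => {
      if (e.data && e.data.size > 0) parts.push(e.data);
    };
    recorder.onstop = () => {
      // "dataavailable" fires before "stop", so parts holds the whole file.
      if (parts.length > 0) onChunk(new Blob(parts, { type: mimeType }));
      runCycle(); // begin the next self-contained file
    };
    recorder.start(); // no timeslice: data is delivered once, on stop
    timer = setTimeout(() => {
      if (recorder.state !== "inactive") recorder.stop();
    }, cycleMs);
  }

  runCycle();
  // Returned function ends the loop; the in-progress file is still emitted.
  return () => {
    stopped = true;
    clearTimeout(timer);
    if (current && current.state !== "inactive") current.stop();
  };
}
```

In an Electron renderer, `onChunk` is where each Blob would be forwarded to the main process for transcription; the PR text does not show its IPC channel names. One trade-off of restarting the recorder is a possible short gap between files, and a phrase spoken across a 3-second boundary can be split between two chunks.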
Comment from the author:
Hey, I think this is cool. Try it and see if you like it. I will also add different finish words, to send or to cancel translation. Maybe some of this is already included in the Intelligence tab.
Add two new configurable finish phrase actions alongside the existing finish phrase:
- Cancel phrase: stops dictation and discards audio (no paste)
- Enter phrase: stops dictation, pastes text, then simulates the Enter key

Also fixes false-positive wake word matches by using whole-word matching instead of substring includes, and suppresses silence entries from the listener output log.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
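The switch from substring includes to whole-word matching, together with the three finish actions, might look like the sketch below. `containsPhrase` and `classifyFinish` are hypothetical names, and the precedence (cancel, then enter, then plain finish) is an assumption; the commit message does not say how overlapping phrases are resolved.

```javascript
// Hypothetical sketch: whole-word phrase matching plus finish/cancel/enter dispatch.

function toWords(text) {
  // Lowercase, turn punctuation into spaces, split into words.
  return text.toLowerCase().replace(/[^\p{L}\p{N}\s]/gu, " ").split(/\s+/).filter(Boolean);
}

// True only if the phrase's words appear consecutively as whole words.
// A substring check would match "go" inside "going"; this does not.
function containsPhrase(text, phrase) {
  const words = toWords(text);
  const target = toWords(phrase);
  if (target.length === 0) return false;
  for (let i = 0; i + target.length <= words.length; i++) {
    if (target.every((w, j) => words[i + j] === w)) return true;
  }
  return false;
}

// Assumed precedence: cancel beats enter, enter beats plain finish.
function classifyFinish(text, { finishPhrase, cancelPhrase, enterPhrase }) {
  if (cancelPhrase && containsPhrase(text, cancelPhrase)) return "cancel"; // discard audio, no paste
  if (enterPhrase && containsPhrase(text, enterPhrase)) return "enter";    // paste, then press Enter
  if (finishPhrase && containsPhrase(text, finishPhrase)) return "finish"; // paste only
  return null; // keep dictating
}
```

For example, with a finish phrase of "done", a transcript of "undone work" no longer triggers a stop, while "I'm done" still does.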
Summary
`audioManager` used timeslice chunks (which produced broken WebM files without headers)
How it works
`useWakeWordCapture` sends 3-second audio chunks to the main process → `wakeWordManager` transcribes them with the base Whisper model → fuzzy-matches against the wake phrase → triggers dictation
`wakeWordRecordingState` IPC)
Architecture
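The flow above can be sketched as a main-process function that chooses which phrase to listen for based on recording state (which the renderer reports via `wakeWordRecordingState`) and fuzzy-matches it against each transcribed chunk. The PR does not say which fuzzy algorithm it uses. This sketch assumes normalized Levenshtein similarity over word windows with a hypothetical threshold of 0.8, and every function name is hypothetical.

```javascript
// Hypothetical sketch, not the code in wakeWordManager.js.

function words(text) {
  // Lowercase, replace punctuation with spaces, split on whitespace.
  return text.toLowerCase().replace(/[^\p{L}\p{N}\s]/gu, " ").split(/\s+/).filter(Boolean);
}

// Edit distance using a single rolling row of the DP table.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // d[i-1][j-1] for j = 1
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // d[i-1][j]
      row[j] = Math.min(
        above + 1,                              // deletion
        row[j - 1] + 1,                         // insertion
        diag + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
      diag = above;
    }
  }
  return row[b.length];
}

// Slide a window of the phrase's word count across the transcript and accept any
// window whose similarity (1 - distance / longer length) meets the threshold.
function fuzzyContains(transcript, phrase, threshold = 0.8) {
  const heard = words(transcript);
  const target = words(phrase);
  if (target.length === 0 || heard.length < target.length) return false;
  const targetText = target.join(" ");
  for (let i = 0; i + target.length <= heard.length; i++) {
    const candidate = heard.slice(i, i + target.length).join(" ");
    const dist = levenshtein(candidate, targetText);
    if (1 - dist / Math.max(candidate.length, targetText.length) >= threshold) return true;
  }
  return false;
}

// Idle: listen for the wake phrase. Recording: listen for the finish phrase.
function handleChunk(isRecording, transcript, { wakePhrase, finishPhrase }) {
  if (!isRecording && fuzzyContains(transcript, wakePhrase)) return "startDictation";
  if (isRecording && fuzzyContains(transcript, finishPhrase)) return "stopDictation";
  return "none";
}
```

Fuzzy matching tolerates small transcription variations, such as a base model writing "whisper" for a spoken "whispr". Gating on recording state keeps the wake phrase from restarting an active dictation.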
Key changes
New files
- `src/helpers/wakeWordManager.js`: main-process module that manages a dedicated WhisperServer instance, handles fuzzy phrase matching, and auto-downloads the base model
- `src/hooks/useWakeWordCapture.js`: renderer hook for always-on mic capture using stop-start MediaRecorder cycles; polls wake word status and sends chunks via IPC
Modified files
- `src/App.jsx`: calls `useWakeWordCapture()` (no args, runs continuously) and notifies the main process of recording state
- `src/helpers/audioManager.js`: removed finish-phrase timeslice code (~20 lines); the always-on hook now handles stop-word detection
- `main.js`: initializes `WakeWordManager`, registers IPC handlers, and auto-starts if enabled
- `preload.js`: exposes the wake word IPC API (toggle, set phrases, check chunks, status)
- `src/helpers/ipcHandlers.js`: routes wake word IPC calls to the manager
- `src/helpers/environment.js`: persists `WAKE_WORD_ENABLED`, `WAKE_WORD_PHRASE`, `WAKE_WORD_FINISH_PHRASE`
- `src/helpers/windowConfig.js`: disables `backgroundThrottling` so capture works when the window is hidden
- `src/hooks/useAudioRecording.js`: strips the finish phrase from transcribed text
- `src/components/SettingsPage.tsx`: wake word settings UI (enable/disable, set phrases, live detection log)
- `src/components/SettingsModal.tsx`: adds a Wake Word nav item
What this fixes
The previous approach had a WebM header bug: during dictation, `audioManager` used a `MediaRecorder.start(3000)` timeslice to send chunks for stop-word detection. Timeslice chunks after the first lack WebM headers, so FFmpeg couldn't parse them. The new design avoids this entirely: `useWakeWordCapture` uses a stop-start cycle that always produces complete WebM files.
Test plan
`[FINISH]` mode (not `[WAKE]`)