[Feat] Voice input via local whisper.cpp sidecar #41
Merged
Signed-off-by: samzong <samzong.lu@gmail.com>
Force-pushed from ad472da to 8847ab5
Summary
Add hold-Space-to-dictate voice input to the chat input, powered by a local whisper.cpp sidecar process. All transcription runs on-device; no cloud API is involved. Includes a first-use onboarding dialog, a mic button, an inline recording/transcribing overlay, and setup documentation.
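A hold-to-dictate hotkey has to tell a deliberate hold apart from an ordinary Space keystroke. The following is a minimal sketch of that idea, not the code from this PR: the names `classifyPress`, `isDictationHotkey`, `KeyLike`, and the 300 ms threshold are all assumptions chosen for illustration.

```typescript
// Hypothetical sketch: none of these names or values are taken from the PR.

type PressKind = "tap" | "hold";

// Assumed threshold; the PR does not state the actual value.
const HOLD_THRESHOLD_MS = 300;

// Classify a key press by how long the key stayed down.
function classifyPress(
  downAtMs: number,
  upAtMs: number,
  thresholdMs: number = HOLD_THRESHOLD_MS,
): PressKind {
  return upAtMs - downAtMs >= thresholdMs ? "hold" : "tap";
}

// Minimal subset of KeyboardEvent, so the sketch runs without a DOM.
interface KeyLike {
  code: string;
  repeat: boolean;
  ctrlKey: boolean;
  metaKey: boolean;
  altKey: boolean;
  isComposing: boolean;
}

// Guard: only a plain, non-repeated Space press outside IME composition
// is treated as a candidate for starting dictation.
function isDictationHotkey(e: KeyLike): boolean {
  return (
    e.code === "Space" &&
    !e.repeat &&
    !e.ctrlKey &&
    !e.metaKey &&
    !e.altKey &&
    !e.isComposing
  );
}
```

Under these assumptions, a press classified as a tap would fall through as a normal space character, while a hold would start a recording session. Ignoring `repeat` events matters because a held key fires repeated keydown events.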
Type of change
[Feat] new feature
Why is this needed?
Voice dictation significantly improves input speed for longer task descriptions. Running transcription locally via whisper.cpp keeps data private and avoids external service dependencies.
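Running transcription locally means the captured audio has to be handed to the whisper.cpp binary as a file it can read, and the change list for this PR mentions WAV encoding in whisper-stt.ts. The sketch below shows one common way to wrap mono Float32 PCM samples in a 16-bit PCM WAV container. It is an illustration, not the PR's implementation: the function name `encodeWav` and the mono, 16-bit layout are assumptions.

```typescript
// Hypothetical helper, not taken from the PR: encode mono Float32 PCM
// samples (range -1..1) as a 16-bit little-endian PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const channels = 1;
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  // RIFF header
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");

  // "fmt " chunk
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * channels * bytesPerSample, true); // byte rate
  view.setUint16(32, channels * bytesPerSample, true); // block align
  view.setUint16(34, 8 * bytesPerSample, true); // bits per sample

  // "data" chunk
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample to [-1, 1] and scale to the signed 16-bit range.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i] ?? 0));
    const scaled = s < 0 ? s * 0x8000 : s * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, Math.round(scaled), true);
  }

  return new Uint8Array(buffer);
}
```

A caller might write `encodeWav(pcm, 16000)` to a temporary file before invoking the binary. The 16 kHz rate here is an assumption about what the whisper.cpp CLI expects; audio captured at the AudioContext's native rate would need resampling first.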
What changed?
- voice-handlers.ts: IPC handlers for microphone permission, whisper.cpp binary/model detection (with module-level caching), and audio transcription via execFile
- getMicrophonePermission, requestMicrophonePermission, checkWhisper, transcribeAudio APIs
- useVoiceInput: state machine managing hold-to-record, permission flow, intro dialog gating, and session lifecycle
- whisper-stt.ts: MediaStream → AudioContext → ScriptProcessorNode PCM capture, WAV encoding, IPC transcription
- voice-input-utils.ts: transcript-at-caret insertion, hotkey guard, press duration classification
- VoiceSession, CreateVoiceSessionHandlers, VoicePermissionStatus, VoiceErrorCode into lib/voice/types.ts
- ui/dialog.tsx (Radix-based)
- NSMicrophoneUsageDescription in electron-builder config
- docs/voice-input-setup.md: installation guide for whisper-cpp and model download
- browser-stt.ts (unreachable in Electron), fixed VoiceErrorCode type collapse, cached filesystem probes, fixed releaseSession stop/destroy semantics
Linked issues
N/A
Validation
- pnpm typecheck
- pnpm test
- pnpm build
Commands, screenshots, or notes:
Screenshots or recordings
N/A
Release note
NONE.
Checklist