[Feat] Voice input via local whisper.cpp sidecar#41

Merged
samzong merged 1 commit into main from feat/voice-input-whisper
Mar 15, 2026

@samzong
Collaborator

samzong commented on Mar 15, 2026

Summary

Add hold-Space-to-dictate voice input to the chat input, powered by a local whisper.cpp sidecar process. All transcription runs on-device; no cloud API is involved. Includes a first-use onboarding dialog, a mic button, inline recording/transcribing overlays, and setup documentation.
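
For reviewers, here is a minimal sketch of how a hold-Space gesture can be told apart from an ordinary space keystroke. It is illustrative only: `classifyPress`, `isDictationHotkey`, `HotkeyEventLike`, and the 250 ms threshold are hypothetical names and values, not the actual contents of voice-input-utils.ts.

```typescript
// Hypothetical sketch: names and the threshold are assumptions, not the PR's code.
type PressKind = "tap" | "hold";

// Assumed cutoff between "typed a space" and "held Space to dictate".
const HOLD_THRESHOLD_MS = 250;

// Classify a completed key press by how long the key was held down.
function classifyPress(
  durationMs: number,
  thresholdMs: number = HOLD_THRESHOLD_MS,
): PressKind {
  return durationMs >= thresholdMs ? "hold" : "tap";
}

// The subset of KeyboardEvent fields the guard needs, so it can be tested without a DOM.
interface HotkeyEventLike {
  code: string;
  repeat: boolean;
  ctrlKey: boolean;
  metaKey: boolean;
  altKey: boolean;
  isComposing: boolean;
}

// Accept only a plain, first-time Space keydown: no modifiers, no auto-repeat,
// and not in the middle of IME composition.
function isDictationHotkey(e: HotkeyEventLike): boolean {
  return (
    e.code === "Space" &&
    !e.repeat &&
    !e.ctrlKey &&
    !e.metaKey &&
    !e.altKey &&
    !e.isComposing
  );
}
```

Under these assumptions, a short press would fall through as a normal space character, while a press at or above the threshold would be treated as a dictation gesture.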

Type of change

  • [Feat] new feature

Why is this needed?

Voice dictation can be considerably faster than typing for longer task descriptions. Running transcription locally via whisper.cpp keeps audio data on the machine and avoids external service dependencies.

What changed?

  • Main process: voice-handlers.ts — IPC handlers for microphone permission, whisper.cpp binary/model detection (with module-level caching), and audio transcription via execFile
  • Preload bridge: Exposed getMicrophonePermission, requestMicrophonePermission, checkWhisper, transcribeAudio APIs
  • Voice hook: useVoiceInput — state machine managing hold-to-record, permission flow, intro dialog gating, and session lifecycle
  • Audio capture: whisper-stt.ts — MediaStream → AudioContext → ScriptProcessorNode PCM capture, WAV encoding, IPC transcription
  • Pure utils: voice-input-utils.ts — transcript-at-caret insertion, hotkey guard, press duration classification
  • Voice types: Extracted VoiceSession, CreateVoiceSessionHandlers, VoicePermissionStatus, VoiceErrorCode into lib/voice/types.ts
  • ChatInput UI: Mic button with tooltip, inline recording (pulsing dot) and transcribing (spinner) overlays, error toasts, auto-refocus after transcription
  • VoiceIntroDialog: First-use onboarding explaining hold-Space mechanics
  • shadcn Dialog: Added ui/dialog.tsx (Radix-based)
  • i18n: en.json + zh.json voice input strings
  • macOS: NSMicrophoneUsageDescription in electron-builder config
  • Docs: docs/voice-input-setup.md — installation guide for whisper-cpp and model download
  • Tests: 15 tests covering voice-input-utils (8) and useVoiceInput hook (7)
  • Cleanup: Removed dead browser-stt.ts (unreachable in Electron), fixed VoiceErrorCode type collapse, cached filesystem probes, fixed releaseSession stop/destroy semantics
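
To make the WAV encoding step in the audio capture item above concrete, here is a rough sketch of packing captured Float32 PCM samples into a 16-bit mono WAV buffer. The function name `encodeWav` and its signature are assumptions for illustration and are not taken from whisper-stt.ts. It also assumes the samples are already mono and at the target sample rate (whisper.cpp's CLI has typically expected 16 kHz input); any resampling is out of scope here.

```typescript
// Hypothetical sketch: encodeWav is an illustrative name, not the PR's implementation.
// Packs mono Float32 samples in [-1, 1] into a 16-bit PCM WAV container.
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2; // 16-bit PCM
  const headerSize = 44; // canonical RIFF/WAVE header length
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(headerSize + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  // RIFF header
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");

  // "fmt " chunk
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, 1, true); // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample

  // "data" chunk
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample and scale it to the signed 16-bit range.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(headerSize + i * bytesPerSample, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In a flow like the one described, the resulting buffer would be handed across the preload bridge to the main process, which could write it to a temporary file before invoking the whisper.cpp binary.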

Linked issues

N/A

Validation

  • pnpm typecheck
  • pnpm test
  • pnpm build
  • Manual smoke test
  • Not run

Commands, screenshots, or notes:

pnpm typecheck  ✓
pnpm test       ✓ 38 tests passed (4 suites)
Manual test     ✓ hold-Space recording, release-to-transcribe, mic button toggle, continuous dictation, intro dialog

Screenshots or recordings

N/A

Release note

  • No user-facing change. Release note is NONE.
  • User-facing change. Release note is included below.
Voice input (Beta): Hold Space in the chat input to dictate via local whisper.cpp. Requires `brew install whisper-cpp` and a downloaded model. See docs/voice-input-setup.md.

Checklist

  • The PR title uses at least one approved prefix
  • The summary explains both what changed and why
  • Validation reflects the commands actually run for this PR
  • The release note block is accurate

@github-actions

Hi @samzong,
Thanks for your pull request!
If the PR is ready, use the /auto-cc command to assign a reviewer.
We will review it shortly.

Details

Instructions for interacting with me using comments are available here.
If you have questions or suggestions related to my behavior, please file an issue against the gh-ci-bot repository.

Signed-off-by: samzong <samzong.lu@gmail.com>
samzong force-pushed the feat/voice-input-whisper branch from ad472da to 8847ab5 on March 15, 2026 05:17
samzong merged commit 0076609 into main on Mar 15, 2026
4 checks passed