Full interactive Voice mode for Claude Code and Codex CLI on Apple Silicon. Talk to your AI, hear it talk back — all running locally on your Mac. Three voice input modes, Auto-Focus & easy setup. Open Source.
You use Claude Code or Codex CLI normally. After every response, the AI's answer is automatically spoken aloud through your Mac's speakers using a local TTS model. Three voice input modes: Press-to-Talk (press hotkey to start/stop), Hold-to-Talk (hold hotkey to record, release to transcribe), or Hands-Free (say "initiate" to start recording, 3s silence auto-transcribes, say "hold on" to interrupt TTS).
Everything runs on your Mac — no cloud APIs, no data leaves your machine.
Download OpenWhisperer-1.3.2.dmg — drag to Applications and launch.
On first launch, the app:
- Creates a Python environment with all dependencies
- Downloads MLX Whisper (~1.5GB) and Kokoro TTS (~300MB) models
- Starts the unified server automatically
The menubar icon gives you:
- Start/Stop/Restart server with configurable port
- Push-to-Talk — configurable hotkey (Ctrl, fn, Option, Cmd) to record
- Language selector — set STT language to avoid hallucinations (17 languages)
- Voice picker — choose from 11 Kokoro voices across 8 languages (no server restart needed)
- Voice detail — set VOICE tag verbosity: Brief (1 sentence), Natural (1-3), or Detailed (4-6)
- TTS Volume — Low, Medium (default), or High output volume
- Start on startup — optional login item to launch automatically when you log in
- Automation — Auto-Focus and Auto-Submit (requires Accessibility permission)
- Platform selector — switch between Claude Code and Codex CLI (auto-configures hooks and voice tags)
- Auto-Apply — one-click setup for hooks and voice tags (adapts to selected platform)
- Accessibility prompt — asks for permission on first launch with live granted/not-granted status
- Diagnostic checklist — shows hook, voice tag, and TTS status at a glance
- Transcription overlay — floating window showing live waveform and recent transcriptions
- Events log — diagnostic log for troubleshooting paste and transcription issues
- Unified server log (STT + TTS on single port, includes transcribed text)
After setup, the menubar buttons provide configuration instructions for each feature.
Three modes for speech-to-text, all using your local Whisper server. Transcribed text is typed directly into whatever app you have focused.
Hold-to-Talk:
- Hold Ctrl — recording starts immediately
- Speak your message
- Release Ctrl — audio is transcribed and inserted

Press-to-Talk:
- Press Ctrl — recording starts (red indicator)
- Speak your message
- Press Ctrl again — audio is sent to Whisper for transcription
- Text is inserted via Accessibility (native apps) or CGEvent Unicode typing (all others) — clipboard is never touched
Hands-Free:
No button press needed. Uses on-device keyword detection (Apple Speech framework).
- Say "initiate" — recording starts (cyan → red indicator)
- Speak your message
- After 3 seconds of silence, audio is auto-transcribed and inserted
- The app returns to listening for "initiate" again
- Say "hold on" during TTS playback — interrupts audio and starts recording
Tip: "Hold on" barge-in works best with headphones — without them the mic may pick up the TTS audio instead of your voice.
- Microphone permission — macOS will prompt on first use
- Accessibility permission — required for typing text into other apps. Grant in System Settings → Privacy & Security → Accessibility
Note: After rebuilding from source, you must remove and re-add the app in Accessibility settings (macOS caches the code signature).
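If the stale entry won't go away by removing it in System Settings, macOS's tccutil can reset it from the terminal. The bundle identifier below is a placeholder — read the real one from your build first:

```bash
# Look up the app's actual bundle identifier
mdls -name kMDItemCFBundleIdentifier "/Applications/Open Whisperer.app"

# Reset the cached Accessibility grant for that identifier
# (com.example.OpenWhisperer is a placeholder — substitute yours)
tccutil reset Accessibility com.example.OpenWhisperer
```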
Both features are in the Automation section of the menubar and require Accessibility permission (macOS will prompt you on first use).
Enable Auto-Focus to automatically bring a specific app to the front when you finish speaking. Pick from 15 apps (VS Code, Cursor, Windsurf, Zed, Xcode, Sublime Text, Nova, Fleet, Claude, Terminal, iTerm2, Warp, Alacritty, Ghostty) or select CUSTOM to type any app name. Uses native NSRunningApplication.activate() — no System Events permission needed.
Enable Auto-Submit to automatically submit after every transcription — no trigger word needed. The transcribed text is typed and Enter is pressed.
Barge-in: Any currently playing TTS audio is automatically interrupted when you start recording (press Ctrl) or when Auto-Submit triggers, so you can speak without waiting for the AI to finish talking.
If you prefer not to grant Accessibility permission, press fn twice to use built-in macOS dictation. Less accurate for technical terms, but works instantly with zero setup.
Claude adds a [VOICE: ...] tag at the end of every response:
```
Here's the full code with detailed explanation...

[VOICE: I added the login endpoint. It validates the email and returns a JWT token.]
```
- Screen: You see the full detailed response
- Speakers: You hear only the short spoken summary
- The hook script extracts the tag, sends it to Kokoro TTS, and plays the audio
- If there's no [VOICE:] tag, the hook falls back to stripping markdown and reading the raw text (truncated to ~600 chars)
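A simplified sketch of that extraction-and-fallback flow in shell, using the bundled scripts/speak.sh for playback — the shipped tts-hook.sh does more (markdown stripping, volume handling, talking to the server directly):

```bash
#!/bin/bash
# Sketch: pull the [VOICE: ...] summary out of a response and speak it.
response="$1"

# Grab the tag's contents; grep finds the bracketed span, sed trims the wrapper
summary=$(printf '%s' "$response" | grep -o '\[VOICE:[^]]*\]' | sed 's/^\[VOICE: *//; s/\]$//')

# No tag? Fall back to the raw text, truncated to roughly 600 characters
[ -z "$summary" ] && summary=$(printf '%.600s' "$response")

# Hand the result to the bundled TTS utility
printf '%s' "$summary" | ./scripts/speak.sh
```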
Choose how verbose the spoken summary is (set in the menubar under Detail):
| Level | Sentences | Description |
|---|---|---|
| Brief | 1 | Just the key outcome |
| Natural | 1-3 | Conversational summary (default) |
| Detailed | 4-6 | Thorough explanation of what changed, why, and what to do next |
Environment variables:

| Variable | Default | Used by | Description |
|---|---|---|---|
| `TTS_URL` | `http://localhost:8000/v1/audio/speech` | tts-hook.sh | Unified server TTS endpoint |
| `TTS_VOICE` | `af_heart` | tts-hook.sh | Kokoro voice name |
| `TTS_VOLUME` | `1` | tts-hook.sh | Playback volume (0.3=Low, 1=Medium, 4=High) |
| `TTS_MODEL` | `prince-canuma/Kokoro-82M` | tts-hook.sh | TTS model |
| `SERVER_PORT` | `8000` | unified_server.py | Server port |
| `WHISPER_MODEL` | `mlx-community/whisper-large-v3-turbo` | unified_server.py | Whisper model |
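For example, to run the server on a different port, you can export the variables before launching — a sketch assuming start-servers.sh passes its environment through to unified_server.py (child processes inherit exported variables), and that `TTS_URL` is set wherever the hook actually runs:

```bash
# Move the unified server to a different port
export SERVER_PORT=8010
./servers/start-servers.sh

# Point the hook at the new port (set this in the shell that launches Claude Code)
export TTS_URL="http://localhost:8010/v1/audio/speech"
```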
Tip: Setting a specific language (e.g. English) instead of auto-detect prevents Whisper from hallucinating text in other languages during silence or background noise.
No audio after response:
- Check the TTS server is running: `curl http://localhost:8000/models`
- Test TTS directly: `echo "hello" | ./scripts/speak.sh`
- Check the hook path in `settings.json` is correct and absolute
Push-to-talk not typing text:
- Check Accessibility permission is granted in System Settings
- If rebuilt from source, remove and re-add the app in Accessibility settings
- Check Events Log in the menubar for diagnostic details
422 error from TTS:
- Make sure the `model` field is included in requests
- Run `./setup.sh` again to reinstall the spaCy model
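To test the request shape by hand, a curl like this should return audio rather than a 422 — the field names follow the OpenAI audio API shape the unified server emulates, so treat it as a sketch:

```bash
# A request that includes the required "model" field; writes the audio to a
# temp file and plays it with macOS's built-in afplay
curl -s http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "prince-canuma/Kokoro-82M", "voice": "af_heart", "input": "hello"}' \
  -o /tmp/openwhisperer-test.wav && afplay /tmp/openwhisperer-test.wav
```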
Tip: You can ask your AI assistant (Claude, ChatGPT, etc.) to run these steps for you. Just paste the section below into your AI chat.
- Mac with Apple Silicon (M1/M2/M3/M4)
- Claude Code (CLI or VS Code extension)
- uv (`curl -LsSf https://astral.sh/uv/install.sh | sh`)
- jq — install with one of:

```bash
# Option A: Direct download (no package manager needed)
curl -L -o /usr/local/bin/jq https://github.com/jqlang/jq/releases/download/jq-1.7.1/jq-macos-arm64 && chmod +x /usr/local/bin/jq

# Option B: Homebrew (if you have it)
brew install jq
```
```bash
git clone https://github.com/PerIPan/OpenWhisperer.git
cd OpenWhisperer
chmod +x setup.sh && ./setup.sh
```

This creates a Python venv at `~/mlx-openai-whisper` and installs everything (MLX Audio, Whisper, Kokoro TTS, spaCy).
```bash
./servers/start-servers.sh
```

One unified server starts on:
- `localhost:8000` — Whisper STT + Kokoro TTS (both on one port)

Keep this terminal open while using Claude.
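To sanity-check the server before wiring up the hook, you can use the same commands the troubleshooting section relies on:

```bash
# Confirms the unified server is responding
curl http://localhost:8000/models

# Pipe a test phrase through the bundled TTS utility
echo "server is up" | ./scripts/speak.sh
```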
Copy the CLAUDE.md file into any project where you want voice mode:

```bash
cp CLAUDE.md ~/my-project/
```

This tells Claude to add a `[VOICE: ...]` tag to every response with a short spoken summary.
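Roughly paraphrased, the instruction it carries amounts to something like this (illustrative only — the shipped CLAUDE.md is the authoritative wording):

```
# Illustrative paraphrase — not the actual file contents
At the end of every response, append a [VOICE: ...] tag containing a short
conversational summary of what you did (1-3 sentences at the default
Natural detail level).
```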
Add this to your ~/.claude/settings.json (or your project's .claude/settings.json):
```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "/absolute/path/to/OpenWhisperer/hooks/tts-hook.sh",
            "timeout": 60
          }
        ]
      }
    ]
  }
}
```

Replace `/absolute/path/to/OpenWhisperer` with where you cloned the repo (e.g. `/Users/yourname/OpenWhisperer`).
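Since jq is already a prerequisite, you can also merge the hook into an existing settings file in one step — a sketch that assumes the file already exists and that overwriting any existing Stop hooks is acceptable:

```bash
# Back up, then write the Stop hook into the existing settings file
cp ~/.claude/settings.json ~/.claude/settings.json.bak
jq '.hooks.Stop = [{"hooks": [{"type": "command", "command": "/absolute/path/to/OpenWhisperer/hooks/tts-hook.sh", "timeout": 60}]}]' \
  ~/.claude/settings.json.bak > ~/.claude/settings.json
```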
```bash
cd app
chmod +x build-dmg.sh
./build-dmg.sh
```

Requires Xcode Command Line Tools. Produces `Open Whisperer.app` and `OpenWhisperer-1.3.2.dmg` in `app/.build/`.
```
OpenWhisperer/
├── CLAUDE.md              # Copy to your project (tells Claude to add VOICE tags)
├── setup.sh               # One-click installer
├── hooks/
│   └── tts-hook.sh        # Claude Code hook — speaks responses via TTS
├── servers/
│   ├── unified_server.py  # Unified STT+TTS server (single port, auto-submit)
│   └── start-servers.sh   # Launches the server
├── scripts/
│   └── speak.sh           # Standalone TTS utility (pipe text to hear it)
└── app/                   # macOS menubar app source (Swift)
    ├── Package.swift
    ├── Sources/
    ├── Resources/
    └── build-dmg.sh       # Build the .dmg yourself
```
Contributions are welcome! Feel free to open issues or submit pull requests. Whether it's bug fixes, new features, documentation improvements, or voice model suggestions — all contributions are appreciated.
- MLX Audio — TTS and STT on Apple Silicon
- Kokoro — TTS model
- Claude Code — Anthropic's CLI
- Codex CLI — OpenAI's CLI agent
MIT

