Open Whisperer

Full interactive voice mode for Claude Code and Codex CLI on Apple Silicon. Talk to your AI and hear it talk back, all running locally on your Mac. Three voice input modes, Auto-Focus, and easy setup. Open source.

[Screenshot: Open Whisperer menubar app]

What It Does

You use Claude Code or Codex CLI normally. After every response, the AI's answer is automatically spoken aloud through your Mac's speakers using a local TTS model. Three voice input modes: Press-to-Talk (press hotkey to start/stop), Hold-to-Talk (hold hotkey to record, release to transcribe), or Hands-Free (say "initiate" to start recording, 3s silence auto-transcribes, say "hold on" to interrupt TTS).

Everything runs on your Mac — no cloud APIs, no data leaves your machine.

Install

Download OpenWhisperer-1.3.2.dmg — drag to Applications and launch.

On first launch, the app:

  • Creates a Python environment with all dependencies
  • Downloads MLX Whisper (~1.5GB) and Kokoro TTS (~300MB) models
  • Starts the unified server automatically

The menubar icon gives you:

  • Start/Stop/Restart server with configurable port
  • Push-to-Talk — configurable hotkey (Ctrl, fn, Option, Cmd) to record
  • Language selector — set STT language to avoid hallucinations (17 languages)
  • Voice picker — choose from 11 Kokoro voices across 8 languages (no server restart needed)
  • Voice detail — set VOICE tag verbosity: Brief (1 sentence), Natural (1-3), or Detailed (4-6)
  • TTS Volume — Low, Medium (default), or High output volume
  • Start on startup — optional login item to launch automatically when you log in
  • Automation — Auto-Focus and Auto-Submit (requires Accessibility permission)
  • Platform selector — switch between Claude Code and Codex CLI (auto-configures hooks and voice tags)
  • Auto-Apply — one-click setup for hooks and voice tags (adapts to selected platform)
  • Accessibility prompt — asks for permission on first launch with live granted/not-granted status
  • Diagnostic checklist — shows hook, voice tag, and TTS status at a glance
  • Transcription overlay — floating window showing live waveform and recent transcriptions
  • Events log — diagnostic log for troubleshooting paste and transcription issues
  • Unified server log (STT + TTS on single port, includes transcribed text)

After setup, use the menubar buttons for configuration instructions.

Voice Input Modes

Three modes for speech-to-text, all using your local Whisper server. Transcribed text is typed directly into whatever app you have focused.

Hold-to-Talk (default)

  1. Hold Ctrl — recording starts immediately
  2. Speak your message
  3. Release Ctrl — audio is transcribed and inserted

Press-to-Talk

  1. Press Ctrl — recording starts (red indicator)
  2. Speak your message
  3. Press Ctrl again — audio is sent to Whisper for transcription
  4. Text is inserted via Accessibility (native apps) or CGEvent Unicode typing (all others) — clipboard is never touched

Hands-Free

No button press needed. Uses on-device keyword detection (Apple Speech framework).

  1. Say "initiate" — recording starts (cyan → red indicator)
  2. Speak your message
  3. After 3 seconds of silence, the audio is auto-transcribed and inserted
  4. The app returns to listening for "initiate"
  5. Say "hold on" during TTS playback to interrupt the audio and start recording

Tip: "Hold on" barge-in works best with headphones — without them the mic may pick up the TTS audio instead of your voice.

[Screenshot: Open Whisperer transcription overlay]

Requirements

  • Microphone permission — macOS will prompt on first use
  • Accessibility permission — required for typing text into other apps. Grant in System Settings → Privacy & Security → Accessibility

Note: After rebuilding from source, you must remove and re-add the app in Accessibility settings (macOS caches the code signature).

Automation

Both features are in the Automation section of the menubar and require Accessibility permission (macOS will prompt you on first use).

Auto-Focus

Enable Auto-Focus to automatically bring a specific app to the front when you finish speaking. Pick from 15 apps (VS Code, Cursor, Windsurf, Zed, Xcode, Sublime Text, Nova, Fleet, Claude, Terminal, iTerm2, Warp, Alacritty, Ghostty) or select CUSTOM to type any app name. Uses native NSRunningApplication.activate() — no System Events permission needed.

Auto-Submit

Enable Auto-Submit to automatically submit after every transcription — no trigger word needed. The transcribed text is typed and Enter is pressed.

Barge-in: Any currently playing TTS audio is automatically interrupted when you start recording (press Ctrl) or when Auto-Submit triggers, so you can speak without waiting for the AI to finish talking.

Fallback: macOS Dictation

If you prefer not to grant Accessibility permission, press fn twice (fn fn) to use built-in macOS Dictation. It's less accurate for technical terms, but works instantly with zero setup.

How the VOICE Tag Works

Claude adds a [VOICE: ...] tag at the end of every response:

Here's the full code with detailed explanation...

[VOICE: I added the login endpoint. It validates the email and returns a JWT token.]
  • Screen: You see the full detailed response
  • Speakers: You hear only the short spoken summary
  • The hook script extracts the tag, sends it to Kokoro TTS, and plays the audio
  • If there's no [VOICE:] tag, the hook falls back to stripping markdown and reading the raw text (truncated to ~600 chars)
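The extraction step can be sketched in shell. This is a simplified illustration, not the actual tts-hook.sh logic — the real hook's parsing and fallback handling may differ:

```shell
# Simplified sketch of VOICE-tag extraction (not the real tts-hook.sh).
response='Here is the full code with detailed explanation...

[VOICE: I added the login endpoint. It validates the email and returns a JWT token.]'

# Pull the text inside the last [VOICE: ...] tag, if any.
summary=$(printf '%s\n' "$response" | sed -n 's/.*\[VOICE: \(.*\)\].*/\1/p' | tail -n 1)

if [ -n "$summary" ]; then
  printf '%s\n' "$summary"            # this is what would be sent to Kokoro TTS
else
  # Fallback: strip simple markdown and truncate to ~600 chars
  printf '%s' "$response" | sed 's/[*_`#]//g' | cut -c1-600
fi
```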

Voice Detail Levels

Choose how verbose the spoken summary is (set in the menubar under Detail):

  Level     Sentences  Description
  Brief     1          Just the key outcome
  Natural   1-3        Conversational summary (default)
  Detailed  4-6        Thorough explanation of what changed, why, and what to do next

Configuration

  Variable       Default                                Used by            Description
  TTS_URL        http://localhost:8000/v1/audio/speech  tts-hook.sh        Unified server TTS endpoint
  TTS_VOICE      af_heart                               tts-hook.sh        Kokoro voice name
  TTS_VOLUME     1                                      tts-hook.sh        Playback volume (0.3=Low, 1=Medium, 4=High)
  TTS_MODEL      prince-canuma/Kokoro-82M               tts-hook.sh        TTS model
  SERVER_PORT    8000                                   unified_server.py  Server port
  WHISPER_MODEL  mlx-community/whisper-large-v3-turbo   unified_server.py  Whisper model

Tip: Setting a specific language (e.g. English) instead of auto-detect prevents Whisper from hallucinating text in other languages during silence or background noise.
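For example, overrides can be exported in your shell profile before launching the hook or server (variable names from the table above; the values here are illustrative):

```shell
# Persistent overrides in ~/.zshrc — names from the configuration table above
export TTS_VOICE=af_heart     # Kokoro voice used by tts-hook.sh
export TTS_VOLUME=0.3         # Low volume
export SERVER_PORT=8000       # keep TTS_URL's port in sync if you change this
```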

Troubleshooting

No audio after response:

  1. Check TTS server is running: curl http://localhost:8000/models
  2. Test TTS directly: echo "hello" | ./scripts/speak.sh
  3. Check the hook path in settings.json is correct and absolute

Push-to-talk not typing text:

  1. Check Accessibility permission is granted in System Settings
  2. If rebuilt from source, remove and re-add the app in Accessibility settings
  3. Check Events Log in the menubar for diagnostic details

422 error from TTS:

  • Make sure model field is included in requests
  • Run ./setup.sh again to reinstall spaCy model
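A minimal well-formed request body looks like this (field names inferred from the configuration table above; the endpoint follows the OpenAI audio/speech shape):

```shell
# Write a TTS request that includes the required "model" field;
# omitting that field is what produces the 422.
cat > /tmp/tts-request.json <<'EOF'
{
  "model": "prince-canuma/Kokoro-82M",
  "voice": "af_heart",
  "input": "hello from Open Whisperer"
}
EOF

# With the unified server running, send it and save the audio:
#   curl -s http://localhost:8000/v1/audio/speech \
#        -H 'Content-Type: application/json' \
#        -d @/tmp/tts-request.json -o /tmp/out.wav

grep -c '"model"' /tmp/tts-request.json   # 1 — the field is present
```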

Manual Setup (from source)

Tip: You can ask your AI assistant (Claude, ChatGPT, etc.) to run these steps for you. Just paste the section below into your AI chat.

Prerequisites

  • Mac with Apple Silicon (M1/M2/M3/M4)
  • Claude Code (CLI or VS Code extension)
  • uv (curl -LsSf https://astral.sh/uv/install.sh | sh)
  • jq — install with one of:
    # Option A: Direct download (no package manager needed)
    curl -L -o /usr/local/bin/jq https://github.com/jqlang/jq/releases/download/jq-1.7.1/jq-macos-arm64 && chmod +x /usr/local/bin/jq
    
    # Option B: Homebrew (if you have it)
    brew install jq

Step 1: Install

git clone https://github.com/PerIPan/OpenWhisperer.git
cd OpenWhisperer
chmod +x setup.sh && ./setup.sh

This creates a Python venv at ~/mlx-openai-whisper and installs everything (MLX Audio, Whisper, Kokoro TTS, spaCy).

Step 2: Start the servers

./servers/start-servers.sh

One unified server starts on:

  • localhost:8000 — Whisper STT + Kokoro TTS (both on one port)

Keep this terminal open while using Claude.

Step 3: Tell Claude to speak

Copy the CLAUDE.md file into any project where you want voice mode:

cp CLAUDE.md ~/my-project/

This tells Claude to add a [VOICE: ...] tag to every response with a short spoken summary.

Step 4: Add the TTS hook

Add this to your ~/.claude/settings.json (or your project's .claude/settings.json):

{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "/absolute/path/to/OpenWhisperer/hooks/tts-hook.sh",
            "timeout": 60
          }
        ]
      }
    ]
  }
}

Replace /absolute/path/to/OpenWhisperer with where you cloned the repo (e.g. /Users/yourname/OpenWhisperer).
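Since jq is already a prerequisite, the hook can also be merged into an existing settings.json without hand-editing. This sketch runs against a scratch file; for real use, point SETTINGS at ~/.claude/settings.json and set HOOK to your actual clone path:

```shell
# Demo on a scratch file so nothing real is overwritten.
SETTINGS=/tmp/demo-settings.json
HOOK="/absolute/path/to/OpenWhisperer/hooks/tts-hook.sh"   # replace with your clone path

echo '{"existingKey": "kept"}' > "$SETTINGS"               # stand-in for current settings

# Add (or replace) the Stop hook while preserving all other keys.
jq --arg cmd "$HOOK" \
   '.hooks.Stop = [{"hooks": [{"type": "command", "command": $cmd, "timeout": 60}]}]' \
   "$SETTINGS" > "$SETTINGS.tmp" && mv "$SETTINGS.tmp" "$SETTINGS"

jq -r '.hooks.Stop[0].hooks[0].command' "$SETTINGS"
```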

Building the App from Source

cd app
chmod +x build-dmg.sh
./build-dmg.sh

Requires Xcode Command Line Tools. Produces Open Whisperer.app and OpenWhisperer-1.3.2.dmg in app/.build/.

File Structure

OpenWhisperer/
├── CLAUDE.md              # Copy to your project (tells Claude to add VOICE tags)
├── setup.sh               # One-click installer
├── hooks/
│   └── tts-hook.sh       # Claude Code hook — speaks responses via TTS
├── servers/
│   ├── unified_server.py # Unified STT+TTS server (single port, auto-submit)
│   └── start-servers.sh  # Launches the server
├── scripts/
│   └── speak.sh          # Standalone TTS utility (pipe text to hear it)
└── app/                   # macOS menubar app source (Swift)
    ├── Package.swift
    ├── Sources/
    ├── Resources/
    └── build-dmg.sh       # Build the .dmg yourself

Contributing

Contributions are welcome! Feel free to open issues or submit pull requests. Whether it's bug fixes, new features, documentation improvements, or voice model suggestions — all contributions are appreciated.

License

MIT
