A lightweight background helper for Windows, macOS, and Linux that:
- Starts/stops mic capture with a single global hotkey (toggle mode) or hold-to-talk mode.
- NEW: Hold Win+Alt (or any custom combo) to record, release to transcribe!
- NEW: Hold mouse scroll wheel button to record, release to transcribe!
- NEW: Visual indicator shows when recording is active.
- NEW: Voice Activity Detection (VAD) prevents sending silent/empty recordings to the server!
- NEW: Microphone test on startup ensures your mic is working properly!
- Streams raw PCM16 audio to the Brainwave server (wss://f.gpty.ai/api/v1/ws).
- Copies the transcript to the clipboard and (optionally) auto-pastes it at the current cursor position.
- Can run headless and auto-start at login.
Toggle mode: press the hotkey once to start recording and press it again to stop. The transcript is copied/pasted when you stop.
Hold-to-talk mode: hold down Win+Alt (or your custom combo) to record, then release to stop and get the transcript immediately. Perfect for quick voice commands!
Mouse button mode: hold down your mouse scroll wheel button (or another configured mouse button) to record, then release to transcribe. Great for gaming or when your hand is on the mouse!
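The three modes differ only in which input event starts and stops a recording. The sketch below models that logic as a small state machine that does not depend on any hotkey library. The class and method names (`RecorderState`, `on_toggle`, `on_hold_press`, `on_hold_release`) are hypothetical and are not taken from hotmic.py.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecorderState:
    """Hypothetical model of HotMic's trigger modes (not the code in hotmic.py)."""

    recording: bool = False
    events: List[str] = field(default_factory=list)  # log of start/stop actions

    def _start(self) -> None:
        if not self.recording:  # ignore repeated presses while already recording
            self.recording = True
            self.events.append("start")

    def _stop(self) -> None:
        if self.recording:
            self.recording = False
            self.events.append("stop")  # HotMic copies/pastes the transcript at this point

    def on_toggle(self) -> None:
        """Toggle mode: one press starts recording, the next press stops it."""
        if self.recording:
            self._stop()
        else:
            self._start()

    def on_hold_press(self) -> None:
        """Hold-to-talk or mouse button: pressing starts recording."""
        self._start()

    def on_hold_release(self) -> None:
        """Hold-to-talk or mouse button: releasing stops recording."""
        self._stop()


state = RecorderState()
state.on_toggle()        # start
state.on_toggle()        # stop
state.on_hold_press()    # start
state.on_hold_press()    # auto-repeat while held: ignored
state.on_hold_release()  # stop
print(state.events)      # ['start', 'stop', 'start', 'stop']
```

The guard in `_start` matters most for hold mode, because keyboards typically send auto-repeat press events for as long as a key is held down.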
Requirements (Windows):
- Windows 10/11
- Python 3.9+
- Microphone access permission
Requirements (macOS):
- macOS 12+
- Python 3.9+
- Microphone access permission for the terminal/app running the script
- Accessibility permission for auto-paste (if enabled)
Requirements (Linux):
- Python 3.9+
- ALSA/PulseAudio for audio
- X11 or Wayland for hotkeys
Windows setup:
- Install Python (if not already installed):
  - Download it from python.org
  - During installation, check "Add Python to PATH"
- Run the installation script:
  scripts\install_windows.bat
  This will:
  - Create a virtual environment
  - Install all required dependencies
- First run:
  scripts\run_hotmic.bat
  Or manually:
  .venv\Scripts\activate.bat
  python hotmic.py
- Toggle mode: Ctrl+U (press once to start, again to stop)
- Hold-to-talk mode: Win+Alt (hold to record, release to transcribe)
- Mouse button: middle mouse button / scroll wheel (hold to record, release to transcribe)
When recording stops, the transcript is automatically copied to the clipboard and—if enabled—auto-pasted with Ctrl+V.
A red recording indicator will appear at the top of your screen when recording is active (if show_visual_indicator is enabled).
To have HotMic start automatically when you log in:
- Complete the setup and first run (to ensure everything works).
- Run the auto-start installation (requires PowerShell):
  powershell -ExecutionPolicy Bypass -File scripts\install_autostart.ps1
- Grant permissions (if prompted):
  - Allow microphone access when Windows prompts you
  - If auto-paste doesn't work, check Windows accessibility settings
- View logs:
  - Output: %LOCALAPPDATA%\HotMic\Logs\hotmic.log
  - Errors: %LOCALAPPDATA%\HotMic\Logs\hotmic_error.log
- Remove auto-start:
  powershell -ExecutionPolicy Bypass -File scripts\uninstall_autostart.ps1
macOS setup:
cd path/to/your/cloned/repo
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
macOS first run:
python hotmic.py
- Toggle mode: Cmd+U (press once to start, again to stop)
- Hold-to-talk mode: Cmd+Alt (Option) (hold to record, release to transcribe)
- Mouse button: middle mouse button / scroll wheel (hold to record, release to transcribe)
When recording stops, the transcript is automatically copied to the clipboard and—if enabled—auto-pasted with Cmd+V.
A red recording indicator will appear at the top of your screen when recording is active (if show_visual_indicator is enabled).
- Microphone: run hotmic.py once; macOS prompts for microphone access.
- Accessibility (for auto-paste): System Settings → Privacy & Security → Accessibility → allow Terminal/iTerm/your app.
To have HotMic start automatically at login on macOS, use the LaunchAgent scripts:
- Install: scripts/install_launchagent.sh
- Uninstall: scripts/uninstall_launchagent.sh
- Logs: ~/Library/Logs/com.hotmic.out.log and ~/Library/Logs/com.hotmic.err.log
Permissions when autostarting:
- System Settings → Privacy & Security:
- Accessibility: find the existing "python" entry and toggle it ON (needed for auto‑paste)
- Input Monitoring: find the existing "python" entry and toggle it ON (needed for hotkeys)
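A LaunchAgent is a property list in ~/Library/LaunchAgents that launchd loads at login. The sketch below shows roughly what such a file could contain, generated with Python's standard plistlib. It is not the content of scripts/install_launchagent.sh: the label com.hotmic, the plist file name, and the interpreter and script paths are assumptions; only the log file names come from this README.

```python
import plistlib
from pathlib import Path

LABEL = "com.hotmic"  # assumption, inferred from the log file names above


def build_launchagent(python_exe: str, script: str, home: Path) -> bytes:
    """Return LaunchAgent plist XML that starts HotMic at login.

    launchd does not expand "~", so all paths are absolute.
    """
    logs = home / "Library" / "Logs"
    agent = {
        "Label": LABEL,
        "ProgramArguments": [python_exe, script],
        "RunAtLoad": True,  # run as soon as the agent is loaded (i.e. at login)
        "StandardOutPath": str(logs / f"{LABEL}.out.log"),
        "StandardErrorPath": str(logs / f"{LABEL}.err.log"),
    }
    return plistlib.dumps(agent)


if __name__ == "__main__":
    home = Path.home()
    repo = home / "hotmic"  # hypothetical clone location; adjust to yours
    xml = build_launchagent(str(repo / ".venv" / "bin" / "python"), str(repo / "hotmic.py"), home)
    print(xml.decode("utf-8"))
```

Saved as, for example, ~/Library/LaunchAgents/com.hotmic.plist, a file like this can be loaded with launchctl. Because launchd starts the Python interpreter directly, the permission entries are presumably listed under "python" rather than under Terminal, which matches the permission steps above.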
Edit config.json to customize settings. Standard JSON does not allow comments, so keep the file free of // annotations; each key is explained below.
{
  "endpoint": "wss://f.gpty.ai/api/v1/ws",
  "hotkey": "<ctrl>+u",
  "autopaste": true,
  "samplerate": 48000,
  "channels": 1,
  "block_samples": 24000,
  "input_device": null,
  "connect_timeout": 8.0,
  "stop_flush_wait": 0.5,
  "enable_hold_mode": true,
  "hold_hotkey": "<cmd>+<alt>",
  "enable_mouse_button": true,
  "mouse_button": "middle",
  "show_visual_indicator": true
}
- endpoint: WebSocket server URL for the speech-to-text service
- hotkey: Global hotkey to toggle recording (press once to start, again to stop)
  - Windows/Linux: use <ctrl>, <alt>, <shift>, <cmd> (Windows/Super key)
  - macOS: use <cmd>, <alt>, <ctrl>
  - Examples: "<ctrl>+u", "<alt>+<shift>+r"
- autopaste: Automatically paste the transcript after recording (true/false)
- samplerate: Audio sample rate (48000 recommended)
- channels: Number of audio channels (1 for mono)
- block_samples: Samples per audio block; larger blocks add latency (24000 samples at 48000 Hz is 0.5 s of audio per block)
- input_device: Audio input device ID (null for the default device)
- connect_timeout: WebSocket connection timeout in seconds
- stop_flush_wait: Time in seconds to wait for the final audio chunks after stopping
- enable_hold_mode: Enable hold-to-talk mode (true/false)
- When enabled, hold down the specified key combo to record, release to transcribe
- hold_hotkey: Key combination for hold-to-talk mode
  - Windows example: "<cmd>+<alt>" (Windows key + Alt)
  - macOS example: "<cmd>+<alt>" (Command + Option)
  - On Windows, <cmd> refers to the Windows key
  - You can use any combination of modifier keys
- enable_mouse_button: Enable mouse button recording (true/false)
- When enabled, hold down the specified mouse button to record, release to transcribe
- mouse_button: Which mouse button to use
  - Options: "left", "middle", "right"
  - "middle" is the scroll wheel button (recommended to avoid conflicts)
- show_visual_indicator: Show a visual indicator window when recording (true/false)
- When enabled, a red "🎤 Recording..." banner appears at the top of your screen
- The banner automatically disappears when you stop recording
- enable_vad: Enable/disable voice activity detection (true/false)
- When enabled, the system detects if actual speech was present in the recording
- Prevents sending silent or empty recordings to the server
- Saves bandwidth and reduces false transcriptions
- vad_energy_threshold: Audio energy threshold for speech detection (default: 500.0)
- Higher values = less sensitive (requires louder audio to be considered speech)
- Lower values = more sensitive (may pick up background noise)
- Adjust based on your microphone sensitivity and environment
- vad_min_speech_duration: Minimum speech duration in seconds (default: 0.3)
- Minimum amount of speech required to send to server
- Prevents very short clicks or noise from being sent
- silence_timeout: Reserved for future automatic silence detection (default: 2.0)
- test_mic_on_startup: Test microphone on startup (true/false)
- When enabled, records 2 seconds of audio on startup to verify mic is working
- Displays audio energy levels and warns if microphone appears to be muted
- Set to false to skip the test and start faster
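One way to handle a config.json that sets only some of these keys is to overlay it on a table of defaults. The sketch below does this with plain json, which rejects // comments. The DEFAULTS values come from the example config and the defaults listed above, except that enable_vad and test_mic_on_startup default to true here as an assumption, since this README does not state their defaults. Whether hotmic.py merges settings this way is not documented; `merge_config` and `load_config` are illustrative names.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Example values from config.json plus the documented defaults above.
# The enable_vad and test_mic_on_startup defaults are assumptions.
DEFAULTS: Dict[str, Any] = {
    "endpoint": "wss://f.gpty.ai/api/v1/ws",
    "hotkey": "<ctrl>+u",
    "autopaste": True,
    "samplerate": 48000,
    "channels": 1,
    "block_samples": 24000,
    "input_device": None,
    "connect_timeout": 8.0,
    "stop_flush_wait": 0.5,
    "enable_hold_mode": True,
    "hold_hotkey": "<cmd>+<alt>",
    "enable_mouse_button": True,
    "mouse_button": "middle",
    "show_visual_indicator": True,
    "enable_vad": True,            # assumed default
    "vad_energy_threshold": 500.0,
    "vad_min_speech_duration": 0.3,
    "silence_timeout": 2.0,
    "test_mic_on_startup": True,   # assumed default
}

VALID_MOUSE_BUTTONS = ("left", "middle", "right")


def merge_config(user: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay user settings on DEFAULTS and check a few values."""
    config = {**DEFAULTS, **user}
    if config["mouse_button"] not in VALID_MOUSE_BUTTONS:
        raise ValueError(f"mouse_button must be one of {VALID_MOUSE_BUTTONS}")
    if config["block_samples"] <= 0 or config["samplerate"] <= 0:
        raise ValueError("samplerate and block_samples must be positive")
    return config


def load_config(path: Path) -> Dict[str, Any]:
    """Read config.json (strict JSON: // comments raise json.JSONDecodeError)."""
    if not path.exists():
        return merge_config({})
    with path.open(encoding="utf-8") as f:
        return merge_config(json.load(f))
```

With this approach, a config.json containing only {"autopaste": false} keeps every other setting at its default value.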
Windows notes:
- Default hotkey uses Ctrl instead of Cmd
- Auto-paste uses Ctrl+V
- Auto-start uses Windows Task Scheduler
- Clipboard handled by pyperclip
macOS notes:
- Default hotkey uses Cmd
- Auto-paste uses Cmd+V
- Auto-start uses a LaunchAgent
- Requires Accessibility and Input Monitoring permissions
Linux notes:
- Default hotkey uses Ctrl
- Auto-paste uses Ctrl+V
- Hotkeys may require additional setup on Wayland
- Audio requires ALSA or PulseAudio
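The platform differences above come down to which modifier the paste shortcut uses. The sketch below uses pynput and assumes the transcript is already on the clipboard. `paste_modifier_name` and `send_paste` are hypothetical helpers, not functions from hotmic.py.

```python
import sys


def paste_modifier_name(platform: str = sys.platform) -> str:
    """'cmd' on macOS (sys.platform == 'darwin'), 'ctrl' on Windows and Linux."""
    return "cmd" if platform == "darwin" else "ctrl"


def send_paste() -> None:
    """Press Cmd+V or Ctrl+V at the current cursor position.

    Needs a desktop session; on macOS the Accessibility permission is required.
    """
    # Imported here so paste_modifier_name() works on machines without a display.
    from pynput.keyboard import Controller, Key

    keyboard = Controller()
    modifier = getattr(Key, paste_modifier_name())  # Key.cmd or Key.ctrl
    with keyboard.pressed(modifier):  # hold the modifier while tapping "v"
        keyboard.press("v")
        keyboard.release("v")
```

A caller would first place the text on the clipboard with pyperclip.copy(transcript) and then call send_paste().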
Windows troubleshooting:
- Hotkey not working: Run as administrator or check whether another program is using the hotkey
- Hold-to-talk not working: Make sure enable_hold_mode is true and the key combination is not used by other programs
- Mouse button not working: Ensure enable_mouse_button is true and try running as administrator
- Visual indicator not showing: Check that show_visual_indicator is true in config.json
- Auto-paste not working: Grant microphone and accessibility permissions
- Audio issues: Check Windows audio settings and ensure the microphone is not muted
macOS troubleshooting:
- Hotkey not working: Grant Input Monitoring permission in System Settings
- Hold-to-talk not working: Grant Accessibility and Input Monitoring permissions
- Mouse button not working: Grant Accessibility permission
- Auto-paste not working: Grant Accessibility permission in System Settings
- Log shows "not trusted" error: Add python to Accessibility and Input Monitoring
General troubleshooting:
- No transcript received: Check your internet connection and the endpoint URL
- Choppy audio: Increase block_samples in config.json
- Connection timeout: Increase connect_timeout in config.json
- Microphone test fails: Check microphone permissions and ensure the mic is not muted
- Too many recordings skipped as silent: Lower vad_energy_threshold in config.json
- Background noise sent to the server as speech: Increase vad_energy_threshold in config.json
- VAD not working: Ensure enable_vad is true and numpy is installed (pip install numpy)
The code is designed to be cross-platform:
- Clipboard operations use pyperclip (cross-platform)
- Paste hotkey automatically detects the OS (Cmd on macOS, Ctrl elsewhere)
- Audio capture uses sounddevice (cross-platform)
- Global hotkeys use pynput (cross-platform)
- Audio processing uses numpy for energy calculation and voice activity detection
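"Raw PCM16" means each audio block is sent as signed 16-bit integers. The standard-library sketch below shows the arithmetic behind samplerate, channels and block_samples, along with a float-to-PCM16 conversion. It assumes little-endian byte order and that block_samples counts frames per channel; the function names are hypothetical. sounddevice can also deliver int16 samples directly (dtype="int16"), in which case no conversion is needed.

```python
import struct
from typing import Sequence

BYTES_PER_SAMPLE = 2  # PCM16: one signed 16-bit integer per sample


def block_duration_s(block_samples: int, samplerate: int) -> float:
    """Seconds of audio carried by one block."""
    return block_samples / samplerate


def block_size_bytes(block_samples: int, channels: int) -> int:
    """Payload size of one PCM16 block (block_samples = frames per channel, assumed)."""
    return block_samples * channels * BYTES_PER_SAMPLE


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Clip floats to [-1.0, 1.0], scale to int16, pack little-endian (assumed order)."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


print(block_duration_s(24000, 48000))  # 0.5
print(block_size_bytes(24000, 1))      # 48000
```

With the example config, each block holds half a second of mono audio (48,000 bytes). A smaller block_samples value lowers latency but produces more, smaller messages. This is the tradeoff behind the "Choppy audio" troubleshooting tip.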
The VAD system uses RMS (Root Mean Square) energy calculation to detect speech:
- Calculates audio energy for each chunk in real-time
- Compares it against a configurable threshold (vad_energy_threshold)
- Tracks speech duration to filter out brief noise
- Prevents sending silent recordings to save bandwidth
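The steps above can be sketched as follows. This is a standard-library approximation, not hotmic.py's numpy code. It assumes energy is the RMS of int16 sample values (presumably the scale at which a threshold such as 500.0 makes sense, since the RMS of float samples in [-1, 1] never exceeds 1). It also assumes mono chunks and that only chunks above the threshold count toward vad_min_speech_duration.

```python
import math
from typing import Iterable, Sequence


def rms_energy(chunk: Sequence[int]) -> float:
    """Root-mean-square of int16 sample values; 0.0 for an empty chunk."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def contains_speech(
    chunks: Iterable[Sequence[int]],
    samplerate: int,
    energy_threshold: float = 500.0,   # vad_energy_threshold
    min_speech_duration: float = 0.3,  # vad_min_speech_duration, in seconds
) -> bool:
    """True if chunks louder than the threshold add up to enough seconds of audio."""
    speech_samples = 0
    for chunk in chunks:
        if rms_energy(chunk) >= energy_threshold:
            speech_samples += len(chunk)
    return speech_samples / samplerate >= min_speech_duration


rate = 16000
silence = [0] * 1600          # 0.1 s at 16 kHz
loud = [1000, -1000] * 800    # 0.1 s with RMS 1000
print(contains_speech([silence] * 10, rate))              # False
print(contains_speech([loud] * 2 + [silence] * 8, rate))  # False: only 0.2 s of speech
print(contains_speech([loud] * 5, rate))                  # True: 0.5 s of speech
```

If contains_speech returns False, the recording would be discarded instead of being sent to the server.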
On startup, the system can test the microphone by:
- Recording 2 seconds of audio
- Calculating average energy levels
- Warning if levels are too low (mic muted or not working)
- Providing feedback to help troubleshoot issues
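The startup check can be sketched with sounddevice as shown below. The sketch assumes int16 capture and uses a hypothetical warning threshold of 50.0 RMS, because this README does not document the cut-off hotmic.py uses. `check_mic_level` and `record_mic_test` are illustrative names.

```python
import math
from typing import Optional, Sequence, Tuple

MUTED_RMS_THRESHOLD = 50.0  # hypothetical cut-off; not documented in this README


def check_mic_level(
    samples: Sequence[int], threshold: float = MUTED_RMS_THRESHOLD
) -> Tuple[float, Optional[str]]:
    """Return (RMS energy of the clip, warning text or None)."""
    energy = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0
    if energy < threshold:
        return energy, "Microphone level is very low: it may be muted or not working."
    return energy, None


def record_mic_test(seconds: float = 2.0, samplerate: int = 48000, device=None):
    """Record a short mono int16 clip and check its level (needs a real microphone)."""
    import sounddevice as sd  # lazy import: check_mic_level() needs no audio stack

    frames = int(seconds * samplerate)
    clip = sd.rec(frames, samplerate=samplerate, channels=1, dtype="int16", device=device)
    sd.wait()  # block until the recording has finished
    return check_mic_level(clip[:, 0].tolist())  # column 0 = the single mono channel
```

Keeping the level check separate from the recording makes it easy to test without a microphone and to reuse it on audio captured elsewhere.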
If you prefer to self-host the speech-to-text service:
- Run the server from github.com/grapeot/brainwave
- Update the endpoint in config.json to your server URL
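Switching to a self-hosted server from github.com/grapeot/brainwave only changes the endpoint key. The small sketch below rewrites that key in config.json and rejects URLs that are not WebSocket URLs. `is_websocket_url` and `set_endpoint` are hypothetical helpers, and the localhost URL is a placeholder, not the server's documented address or port.

```python
import json
from pathlib import Path
from urllib.parse import urlparse


def is_websocket_url(url: str) -> bool:
    """True for ws:// or wss:// URLs that include a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("ws", "wss") and bool(parsed.netloc)


def set_endpoint(config_path: Path, url: str) -> None:
    """Rewrite the endpoint key in config.json, keeping every other setting."""
    if not is_websocket_url(url):
        raise ValueError(f"expected a ws:// or wss:// URL, got {url!r}")
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["endpoint"] = url
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")


# Placeholder URL: replace it with your own server's host, port and path.
# set_endpoint(Path("config.json"), "ws://localhost:8000/api/v1/ws")
```

Local servers without TLS use ws://, while servers behind TLS, like the public service, use wss://.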
- Uses the public Brainwave service at f.gpty.ai
- Audio streaming via sounddevice
- Global hotkeys via pynput
- Clipboard handling via pyperclip