HotMic: Cross-Platform Global Hotkey Push-to-Talk for Speech-to-Text

A lightweight background helper for Windows, macOS, and Linux that:

Starts/stops mic capture with a single global hotkey (toggle mode) or hold-to-talk mode.
NEW: Hold Win+Alt (or any custom combo) to record, release to transcribe!
NEW: Hold mouse scroll wheel button to record, release to transcribe!
NEW: Visual indicator shows when recording is active.
NEW: Voice Activity Detection (VAD) prevents sending silent/empty recordings to the server!
NEW: Microphone test on startup ensures your mic is working properly!
Streams raw PCM16 audio to the Brainwave server (wss://f.gpty.ai/api/v1/ws).
Copies the transcript to clipboard and (optionally) auto-pastes at the current cursor.
Can run headless and auto-start at login.
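
For readers unfamiliar with the term, PCM16 means each audio sample is a signed 16-bit integer. The sketch below shows one common way to turn floating-point samples in the range -1.0 to 1.0 into PCM16 bytes using only the standard library. The function name `float_to_pcm16` is hypothetical, and little-endian byte order is an assumption; hotmic.py may well capture 16-bit samples directly from the audio device instead.

```python
import struct

def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes."""
    ints = []
    for x in samples:
        x = max(-1.0, min(1.0, x))          # clamp out-of-range values
        ints.append(int(round(x * 32767)))  # scale to the int16 range
    return struct.pack("<%dh" % len(ints), *ints)

pcm = float_to_pcm16([0.0, 0.25, -1.0, 2.0])
print(len(pcm))  # two bytes per sample
```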

Recording Modes

Toggle Mode (Original)

Press the hotkey once to start recording, press again to stop. The transcript is copied/pasted when you stop.

Hold-to-Talk Mode (NEW!)

Hold down Win+Alt (or your custom combo) to record. Release to stop and get the transcript immediately. Perfect for quick voice commands!

Mouse Button Mode (NEW!)

Hold down your mouse scroll wheel (or any button) to record. Release to transcribe. Great for gaming or when your hands are on the mouse!
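
The difference between the three modes comes down to which input events flip the recording state. The following is a minimal sketch of that logic, not the code in hotmic.py; the class `Recorder` and its method names are invented for illustration, and the `events` list stands in for actually starting and stopping the microphone.

```python
class Recorder:
    """Tracks whether recording is active under toggle or hold semantics."""

    def __init__(self):
        self.recording = False
        self.events = []  # log of "start"/"stop" transitions, for illustration

    def _set(self, active):
        # Only record a transition when the state actually changes.
        if active != self.recording:
            self.recording = active
            self.events.append("start" if active else "stop")

    def toggle(self):
        """Toggle mode: every hotkey press flips the state."""
        self._set(not self.recording)

    def hold_press(self):
        """Hold-to-talk or mouse mode: pressing starts recording."""
        self._set(True)

    def hold_release(self):
        """Releasing stops recording; the transcript would be requested here."""
        self._set(False)

rec = Recorder()
rec.toggle()        # start
rec.toggle()        # stop
rec.hold_press()    # start
rec.hold_press()    # ignored: already recording (e.g. key auto-repeat)
rec.hold_release()  # stop
print(rec.events)
```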

Requirements

Windows

Windows 10/11
Python 3.9+
Microphone access permission

macOS

macOS 12+
Python 3.9+
Microphone access permission for your terminal/app running the script
Accessibility permission for auto-paste (if enabled)

Linux

Python 3.9+
ALSA/PulseAudio for audio
X11 or Wayland for hotkeys

Setup (Windows)

Quick Start

Install Python (if not already installed):
- Download from python.org
- During installation, check "Add Python to PATH"

Run the installation script:
```
scripts\install_windows.bat
```
This will:
- Create a virtual environment
- Install all required dependencies

First Run:

scripts\run_hotmic.bat

Or manually:

.venv\Scripts\activate.bat
python hotmic.py

Default Hotkeys (Windows)

Toggle mode: Ctrl+U (press once to start, again to stop)
Hold-to-talk mode: Win+Alt (hold to record, release to transcribe)
Mouse button: Middle mouse button / scroll wheel (hold to record, release to transcribe)

When recording stops, the transcript is automatically copied to the clipboard and—if enabled—auto-pasted with Ctrl+V.

A red recording indicator will appear at the top of your screen when recording is active (if show_visual_indicator is enabled).

Auto-Start at Login (Windows)

To have HotMic start automatically when you log in:

Complete the Setup and First Run (to ensure everything works)

Run the auto-start installation (requires PowerShell):

powershell -ExecutionPolicy Bypass -File scripts\install_autostart.ps1

Grant Permissions (if prompted):
- Allow microphone access when Windows prompts you
- If auto-paste doesn't work, check Windows accessibility settings

View Logs:
- Output: %LOCALAPPDATA%\HotMic\Logs\hotmic.log
- Errors: %LOCALAPPDATA%\HotMic\Logs\hotmic_error.log

Uninstall Auto-Start (Windows)

powershell -ExecutionPolicy Bypass -File scripts\uninstall_autostart.ps1

Setup (macOS)

Quick Start

cd path/to/your/cloned/repo
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run

python hotmic.py

Default Hotkeys (macOS)

Toggle mode: Cmd+U (press once to start, again to stop)
Hold-to-talk mode: Cmd+Option, written as <cmd>+<alt> in config.json (hold to record, release to transcribe)
Mouse button: Middle mouse button / scroll wheel (hold to record, release to transcribe)

When recording stops, the transcript is automatically copied to the clipboard and—if enabled—auto-pasted with Cmd+V.

A red recording indicator will appear at the top of your screen when recording is active (if show_visual_indicator is enabled).

Permissions (macOS)

Microphone: Run hotmic.py once; macOS prompts for mic access.
Accessibility (for auto-paste): System Settings → Privacy & Security → Accessibility → allow Terminal, iTerm, or whichever app runs the script.

Auto-Start at Login (macOS)

Use the LaunchAgent scripts:

Install:

scripts/install_launchagent.sh

Uninstall:

scripts/uninstall_launchagent.sh

Logs:

~/Library/Logs/com.hotmic.out.log
~/Library/Logs/com.hotmic.err.log

Permissions when autostarting:

System Settings → Privacy & Security:
- Accessibility: find the existing "python" entry and toggle it ON (needed for auto-paste)
- Input Monitoring: find the existing "python" entry and toggle it ON (needed for hotkeys)

Configuration

Edit config.json to customize settings:

{
  "endpoint": "wss://f.gpty.ai/api/v1/ws",
  "hotkey": "<ctrl>+u",
  "autopaste": true,
  "samplerate": 48000,
  "channels": 1,
  "block_samples": 24000,
  "input_device": null,
  "connect_timeout": 8.0,
  "stop_flush_wait": 0.5,
  "enable_hold_mode": true,
  "hold_hotkey": "<cmd>+<alt>",
  "enable_mouse_button": true,
  "mouse_button": "middle",
  "show_visual_indicator": true
}
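
As a rough illustration of how such a file could be consumed, the sketch below overlays user settings on a table of defaults. The name `load_config` and the `DEFAULTS` table are assumptions made for this example and are not taken from hotmic.py. Note that Python's `json` module rejects `//` comments, so the text passed in must be plain JSON.

```python
import json

# Defaults mirror the sample config.json values in this README.
DEFAULTS = {
    "endpoint": "wss://f.gpty.ai/api/v1/ws",
    "hotkey": "<ctrl>+u",
    "autopaste": True,
    "samplerate": 48000,
    "channels": 1,
    "block_samples": 24000,
    "input_device": None,
    "connect_timeout": 8.0,
    "stop_flush_wait": 0.5,
    "enable_hold_mode": True,
    "hold_hotkey": "<cmd>+<alt>",
    "enable_mouse_button": True,
    "mouse_button": "middle",
    "show_visual_indicator": True,
}

def load_config(text):
    """Parse JSON text and overlay it on DEFAULTS (hypothetical helper)."""
    user = json.loads(text)
    if not isinstance(user, dict):
        raise ValueError("config must be a JSON object")
    return {**DEFAULTS, **user}

cfg = load_config('{"hotkey": "<alt>+<shift>+r", "autopaste": false}')
print(cfg["hotkey"], cfg["autopaste"], cfg["samplerate"])
```

Keys missing from the user's file fall back to the defaults, so a config.json only needs to list the settings that differ.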

Configuration Options

Basic Options

endpoint: WebSocket server URL for speech-to-text service
hotkey: Global hotkey to toggle recording (press once to start, again to stop)
- Windows/Linux: Use <ctrl>, <alt>, <shift>, <cmd> (Windows key)
- macOS: Use <cmd>, <alt>, <ctrl>
- Examples: "<ctrl>+u", "<alt>+<shift>+r"
autopaste: Automatically paste the transcript after recording (true/false)
samplerate: Audio sample rate (48000 recommended)
channels: Number of audio channels (1 for mono)
block_samples: Samples per audio block; at 48000 Hz, the default of 24000 samples is 0.5 seconds of audio per block, so smaller values reduce latency
input_device: Audio input device ID (null for default)
connect_timeout: WebSocket connection timeout in seconds
stop_flush_wait: Time to wait for final audio chunks after stopping
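
To see how samplerate, channels, and block_samples interact, the duration and size of one block can be worked out directly. This sketch assumes block_samples counts samples per channel and that each sample is 2 bytes (PCM16); the function name `block_stats` is made up for the example.

```python
def block_stats(samplerate, channels, block_samples, bytes_per_sample=2):
    """Return (seconds of audio per block, bytes per block)."""
    seconds = block_samples / samplerate
    size = block_samples * channels * bytes_per_sample
    return seconds, size

print(block_stats(48000, 1, 24000))  # (0.5, 48000)
```

With the default settings each block carries half a second of audio in 48000 bytes, which is why lowering block_samples makes the stream more responsive at the cost of more, smaller messages.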

Hold-to-Talk Mode (NEW!)

enable_hold_mode: Enable hold-to-talk mode (true/false)
- When enabled, hold down the specified key combo to record, release to transcribe
hold_hotkey: Key combination for hold-to-talk mode
- Windows example: "<cmd>+<alt>" (Windows key + Alt)
- macOS example: "<cmd>+<alt>" (Command + Option)
- On Windows, <cmd> refers to the Windows key
- You can use any combination of modifier keys
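
The hold-to-talk behavior described above can be sketched as a small tracker that starts when every key in the combo is down and stops as soon as one of them is released. This is an illustration only: `parse_combo` and `HoldTracker` are hypothetical names, keys are represented as plain strings, and pynput is deliberately not imported so the example runs without a display. pynput has its own hotkey parsing, which the real script may use instead.

```python
def parse_combo(spec):
    """Split a spec such as "<cmd>+<alt>" into a frozenset like {"cmd", "alt"}."""
    keys = set()
    for part in spec.split("+"):
        part = part.strip().lower()
        if not part:
            raise ValueError("empty key in combo: %r" % spec)
        # "<ctrl>" becomes "ctrl"; plain characters such as "u" are kept as-is.
        if part.startswith("<") and part.endswith(">"):
            part = part[1:-1]
        keys.add(part)
    return frozenset(keys)

class HoldTracker:
    """Emits "start" when the whole combo is held and "stop" on release."""

    def __init__(self, spec):
        self.combo = parse_combo(spec)
        self.pressed = set()
        self.active = False

    def on_press(self, key):
        self.pressed.add(key)
        if not self.active and self.combo <= self.pressed:
            self.active = True
            return "start"
        return None

    def on_release(self, key):
        self.pressed.discard(key)
        if self.active and key in self.combo:
            self.active = False
            return "stop"
        return None

tracker = HoldTracker("<cmd>+<alt>")
steps = [tracker.on_press("cmd"), tracker.on_press("alt"),
         tracker.on_release("cmd"), tracker.on_release("alt")]
print(steps)
```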

Mouse Button Recording (NEW!)

enable_mouse_button: Enable mouse button recording (true/false)
- When enabled, hold down the specified mouse button to record, release to transcribe
mouse_button: Which mouse button to use
- Options: "left", "middle", "right"
- "middle" is the scroll wheel button (recommended to avoid conflicts)

Visual Feedback (NEW!)

show_visual_indicator: Show a visual indicator window when recording (true/false)
- When enabled, a red "🎤 Recording..." banner appears at the top of your screen
- The banner automatically disappears when you stop recording

Voice Activity Detection (NEW!)

enable_vad: Enable/disable voice activity detection (true/false)
- When enabled, the system detects if actual speech was present in the recording
- Prevents sending silent or empty recordings to the server
- Saves bandwidth and reduces false transcriptions
vad_energy_threshold: Audio energy threshold for speech detection (default: 500.0)
- Higher values = less sensitive (requires louder audio to be considered speech)
- Lower values = more sensitive (may pick up background noise)
- Adjust based on your microphone sensitivity and environment
vad_min_speech_duration: Minimum speech duration in seconds (default: 0.3)
- Minimum amount of speech required to send to server
- Prevents very short clicks or noise from being sent
silence_timeout: Reserved for future automatic silence detection (default: 2.0)

Microphone Testing (NEW!)

test_mic_on_startup: Test microphone on startup (true/false)
- When enabled, records 2 seconds of audio on startup to verify mic is working
- Displays audio energy levels and warns if microphone appears to be muted
- Set to false to skip the test and start faster

Platform-Specific Notes

Windows

Default hotkey uses Ctrl instead of Cmd
Auto-paste uses Ctrl+V
Auto-start uses Windows Task Scheduler
Clipboard handled by pyperclip

macOS

Default hotkey uses Cmd
Auto-paste uses Cmd+V
Auto-start uses LaunchAgent
Requires Accessibility and Input Monitoring permissions

Linux

Default hotkey uses Ctrl
Auto-paste uses Ctrl+V
May require additional setup for hotkeys on Wayland
Audio requires ALSA or PulseAudio

Troubleshooting

Windows

Hotkey not working: Run as administrator or check if another program is using the hotkey
Hold-to-talk not working: Make sure enable_hold_mode is true and the key combination is not used by other programs
Mouse button not working: Ensure enable_mouse_button is true and try running as administrator
Visual indicator not showing: Check if show_visual_indicator is true in config.json
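Auto-paste not working: Confirm autopaste is true in config.json and check Windows accessibility settings; the transcript is still copied to the clipboard, so you can paste it manually with Ctrl+V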
Audio issues: Check Windows audio settings and ensure microphone is not muted

macOS

Hotkey not working: Grant Input Monitoring permission in System Settings
Hold-to-talk not working: Grant Accessibility and Input Monitoring permissions
Mouse button not working: Grant Accessibility permission
Auto-paste not working: Grant Accessibility permission in System Settings
Log shows "not trusted" error: Add python to Accessibility and Input Monitoring

All Platforms

No transcript received: Check your internet connection and endpoint URL
Choppy audio: Increase block_samples in config.json
Connection timeout: Increase connect_timeout in config.json
Microphone test fails: Check microphone permissions and ensure mic is not muted
Too many silent recordings skipped: Lower vad_energy_threshold in config.json
Background noise triggers recording: Increase vad_energy_threshold in config.json
VAD not working: Ensure enable_vad is true and numpy is installed (pip install numpy)

Development

The code is designed to be cross-platform:

Clipboard operations use pyperclip (cross-platform)
Paste hotkey automatically detects OS (Cmd on macOS, Ctrl elsewhere)
Audio capture uses sounddevice (cross-platform)
Global hotkeys use pynput (cross-platform)
Audio processing uses numpy for energy calculation and voice activity detection
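
The OS check for the paste hotkey can be done with the standard library alone. The sketch below is one way to do it, assuming the same `<modifier>+key` string format used in config.json; the function name `paste_combo` is hypothetical. `platform.system()` returns "Darwin" on macOS.

```python
import platform

def paste_combo(system=None):
    """Return the paste shortcut for the given OS name (defaults to this machine)."""
    system = system or platform.system()
    modifier = "cmd" if system == "Darwin" else "ctrl"
    return "<%s>+v" % modifier

print(paste_combo("Darwin"))   # <cmd>+v
print(paste_combo("Windows"))  # <ctrl>+v
```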

New Features

Voice Activity Detection (VAD)

The VAD system uses RMS (Root Mean Square) energy calculation to detect speech:

Calculates audio energy for each chunk in real-time
Compares against configurable threshold (vad_energy_threshold)
Tracks speech duration to filter out brief noise
Prevents sending silent recordings to save bandwidth
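
The steps above can be sketched as follows. This is an approximation under stated assumptions rather than the project's implementation: it uses the standard library instead of numpy so it runs without dependencies, it assumes mono PCM16 chunks in native byte order, and the names `rms` and `has_speech` are invented here. The default arguments reuse the documented defaults for vad_energy_threshold and vad_min_speech_duration.

```python
import array
import math

def rms(chunk):
    """RMS energy of a chunk of PCM16 bytes (16-bit samples, native byte order)."""
    samples = array.array("h")
    samples.frombytes(chunk)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def has_speech(chunks, samplerate, threshold=500.0, min_speech=0.3):
    """True if chunks at or above the threshold add up to min_speech seconds."""
    speech_seconds = 0.0
    for chunk in chunks:
        if rms(chunk) >= threshold:
            speech_seconds += (len(chunk) // 2) / samplerate  # 2 bytes per sample
    return speech_seconds >= min_speech

# Synthetic data: 0.1 s chunks at 48000 Hz, one loud and one nearly silent.
loud = array.array("h", [1000] * 4800).tobytes()
quiet = array.array("h", [10] * 4800).tobytes()
print(has_speech([loud] * 4 + [quiet], 48000))  # enough loud audio
print(has_speech([loud] * 2 + [quiet] * 5, 48000))  # too little loud audio
```

A recording that fails this check would simply be dropped instead of being streamed to the server.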

Microphone Testing

On startup, the system can test the microphone by:

Recording 2 seconds of audio
Calculating average energy levels
Warning if levels are too low (mic muted or not working)
Providing feedback to help troubleshoot issues
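
A startup check along these lines might summarize per-block energy readings like this. The function `mic_report` and the `warn_below` cutoff of 50.0 are purely illustrative assumptions; the README does not state what level the real script treats as too low.

```python
def mic_report(block_energies, warn_below=50.0):
    """Summarize energy readings from a short test recording."""
    if not block_energies:
        return {"ok": False, "average": 0.0, "peak": 0.0,
                "message": "No audio captured; check the input device."}
    average = sum(block_energies) / len(block_energies)
    peak = max(block_energies)
    ok = average >= warn_below
    message = ("Microphone looks fine." if ok
               else "Levels are very low; the microphone may be muted.")
    return {"ok": ok, "average": average, "peak": peak, "message": message}

print(mic_report([600.0, 800.0])["message"])
print(mic_report([1.0, 3.0])["message"])
```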

Self-Hosting

If you prefer to self-host the speech-to-text service:

Run the server from github.com/grapeot/brainwave
Update the endpoint in config.json to your server URL
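
Before pointing HotMic at your own server, it can help to sanity-check the URL you put in config.json. The helper below is a hypothetical sketch that only verifies the scheme and host. The localhost address in the usage line is made up, and the path a self-hosted Brainwave server actually exposes may differ.

```python
from urllib.parse import urlparse

def check_endpoint(url):
    """Raise ValueError unless url looks like a WebSocket endpoint."""
    parts = urlparse(url)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError("endpoint must start with ws:// or wss://, got %r" % url)
    if not parts.hostname:
        raise ValueError("endpoint has no host: %r" % url)
    return url

print(check_endpoint("wss://f.gpty.ai/api/v1/ws"))
print(check_endpoint("ws://localhost:3005/api/v1/ws"))  # hypothetical local server
```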

Credits

Uses the public Brainwave service at f.gpty.ai
Audio streaming via sounddevice
Global hotkeys via pynput
Clipboard handling via pyperclip

shanjiaming/speech2text
