
shanjiaming/speech2text


HotMic: Cross-Platform Global Hotkey Push-to-Talk for Speech-to-Text

A lightweight background helper for Windows, macOS, and Linux that:

  • Starts/stops mic capture with a single global hotkey (toggle mode) or hold-to-talk mode.
  • NEW: Hold Win+Alt (Cmd+Option on macOS, or any custom combo) to record, release to transcribe!
  • NEW: Hold the mouse scroll-wheel (middle) button to record, release to transcribe!
  • NEW: Visual indicator shows when recording is active.
  • NEW: Voice Activity Detection (VAD) prevents sending silent/empty recordings to the server!
  • NEW: Microphone test on startup ensures your mic is working properly!
  • Streams raw PCM16 audio to the Brainwave server (wss://f.gpty.ai/api/v1/ws).
  • Copies the transcript to clipboard and (optionally) auto-pastes at the current cursor.
  • Can run headless and auto-start at login.
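The wire format is described only as "raw PCM16". Below is a minimal sketch of producing such bytes. It assumes mono float samples in [-1.0, 1.0] and little-endian byte order, and neither assumption is stated in this README. The helper name float_to_pcm16 is hypothetical, and the sketch does not show how hotmic.py frames messages for the Brainwave server. sounddevice can also capture int16 directly (dtype="int16"), which makes this conversion unnecessary.

```python
import struct


def float_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes."""
    # Clip first so loud peaks cannot overflow the int16 range.
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    # "<" = little-endian, "h" = signed 16-bit integer, one per sample.
    return struct.pack(f"<{len(ints)}h", *ints)


# One 24000-sample mono block (block_samples in the example config) is 48000 bytes.
block = float_to_pcm16([0.0] * 24000)
print(len(block))
```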

Recording Modes

Toggle Mode (Original)

Press the hotkey once to start recording, press again to stop. The transcript is copied/pasted when you stop.
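Here is a rough sketch of toggle-mode state. It assumes pynput's GlobalHotKeys for the global hotkey. The class and function names are hypothetical and not taken from hotmic.py. Each press flips between recording and stopped, and the stop callback is where the transcript would be copied or pasted.

```python
class ToggleRecorder:
    """Hypothetical toggle-mode state: each hotkey press flips recording on or off."""

    def __init__(self, on_start, on_stop):
        self.recording = False
        self._on_start = on_start
        self._on_stop = on_stop

    def on_hotkey(self):
        if self.recording:
            self.recording = False
            self._on_stop()   # e.g. flush audio, wait for the transcript, copy/paste
        else:
            self.recording = True
            self._on_start()  # e.g. open the mic stream and the WebSocket


def run_with_pynput(hotkey="<ctrl>+u"):
    # pynput is imported lazily so the state logic above works without it.
    from pynput import keyboard

    rec = ToggleRecorder(lambda: print("recording"), lambda: print("stopped"))
    with keyboard.GlobalHotKeys({hotkey: rec.on_hotkey}) as listener:
        listener.join()
```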

Hold-to-Talk Mode (NEW!)

Hold down Win+Alt (Cmd+Option on macOS, or your custom combo) to record. Release to stop and get the transcript immediately. Perfect for quick voice commands!

Mouse Button Mode (NEW!)

Hold down the mouse scroll wheel (or another configured button) to record. Release to transcribe. Great for gaming or when your hands are on the mouse!

Requirements

Windows

  • Windows 10/11
  • Python 3.9+
  • Microphone access permission

macOS

  • macOS 12+
  • Python 3.9+
  • Microphone access permission for your terminal/app running the script
  • Accessibility permission for auto-paste (if enabled)

Linux

  • Python 3.9+
  • ALSA/PulseAudio for audio
  • X11 or Wayland for hotkeys

Setup (Windows)

Quick Start

  1. Install Python (if not already installed):

    • Download from python.org
    • During installation, check "Add Python to PATH"
  2. Run the installation script:

    scripts\install_windows.bat

    This will:

    • Create a virtual environment
    • Install all required dependencies
  3. First Run:

    scripts\run_hotmic.bat

    Or manually:

    .venv\Scripts\activate.bat
    python hotmic.py

Default Hotkeys (Windows)

  • Toggle mode: Ctrl+U (press once to start, again to stop)
  • Hold-to-talk mode: Win+Alt (hold to record, release to transcribe)
  • Mouse button: Middle mouse button / scroll wheel (hold to record, release to transcribe)

When recording stops, the transcript is automatically copied to the clipboard and, if autopaste is enabled, pasted with Ctrl+V.

A red recording indicator will appear at the top of your screen when recording is active (if show_visual_indicator is enabled).

Auto-Start at Login (Windows)

To have HotMic start automatically when you log in:

  1. Complete the Setup and First Run (to ensure everything works)

  2. Run the auto-start installation (requires PowerShell):

    powershell -ExecutionPolicy Bypass -File scripts\install_autostart.ps1
  3. Grant Permissions (if prompted):

    • Allow microphone access when Windows prompts you
    • If auto-paste doesn't work, check Windows accessibility settings
  4. View Logs:

    • Output: %LOCALAPPDATA%\HotMic\Logs\hotmic.log
    • Errors: %LOCALAPPDATA%\HotMic\Logs\hotmic_error.log

Uninstall Auto-Start (Windows)

powershell -ExecutionPolicy Bypass -File scripts\uninstall_autostart.ps1

Setup (macOS)

Quick Start

cd path/to/your/cloned/repo
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run

python hotmic.py

Default Hotkeys (macOS)

  • Toggle mode: Cmd+U (press once to start, again to stop)
  • Hold-to-talk mode: Cmd+Option (<cmd>+<alt> in config.json; hold to record, release to transcribe)
  • Mouse button: Middle mouse button / scroll wheel (hold to record, release to transcribe)

When recording stops, the transcript is automatically copied to the clipboard and, if autopaste is enabled, pasted with Cmd+V.

A red recording indicator will appear at the top of your screen when recording is active (if show_visual_indicator is enabled).

Permissions (macOS)

  • Microphone: Run hotmic.py once; macOS prompts for mic access.
  • Accessibility (for autopaste): System Settings → Privacy & Security → Accessibility → allow Terminal, iTerm, or whichever app runs the script.

Auto-Start at Login (macOS)

Use the LaunchAgent scripts:

Install:

scripts/install_launchagent.sh

Uninstall:

scripts/uninstall_launchagent.sh

Logs:

  • ~/Library/Logs/com.hotmic.out.log
  • ~/Library/Logs/com.hotmic.err.log

Permissions when autostarting:

  • System Settings → Privacy & Security:
    • Accessibility: find the existing "python" entry and toggle it ON (needed for auto‑paste)
    • Input Monitoring: find the existing "python" entry and toggle it ON (needed for hotkeys)
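For reference, a LaunchAgent is a property list stored in ~/Library/LaunchAgents. The sketch below builds one with Python's plistlib. The label com.hotmic is a guess based on the log file names above. The interpreter and script paths are placeholders, so the bundled install script may generate something different.

```python
import plistlib
from pathlib import Path


def launch_agent_plist(python_exe, script, label="com.hotmic"):
    """Build a hypothetical LaunchAgent plist that runs the script at login."""
    logs = Path.home() / "Library" / "Logs"
    return plistlib.dumps({
        "Label": label,
        "ProgramArguments": [python_exe, script],
        "RunAtLoad": True,  # start when the agent is loaded, i.e. at login
        "StandardOutPath": str(logs / f"{label}.out.log"),
        "StandardErrorPath": str(logs / f"{label}.err.log"),
    })
```

Saving the bytes to ~/Library/LaunchAgents/com.hotmic.plist and loading it with launchctl would start the script at each login.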

Configuration

Edit config.json to customize settings:

{
  "endpoint": "wss://f.gpty.ai/api/v1/ws",
  "hotkey": "<ctrl>+u",
  "autopaste": true,
  "samplerate": 48000,
  "channels": 1,
  "block_samples": 24000,
  "input_device": null,
  "connect_timeout": 8.0,
  "stop_flush_wait": 0.5,
  "enable_hold_mode": true,
  "hold_hotkey": "<cmd>+<alt>",
  "enable_mouse_button": true,
  "mouse_button": "middle",
  "show_visual_indicator": true
}

Standard JSON does not allow comments, so keep // notes out of config.json. Each key is explained under Configuration Options below.
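If you load the file from your own tooling, one way to handle missing keys is to overlay config.json on a set of defaults. This is a sketch that assumes this behavior. DEFAULTS and load_config are hypothetical names. The values come from the example above plus the VAD defaults documented below.

```python
import json
from pathlib import Path

# Values from the example config.json plus the documented VAD defaults.
DEFAULTS = {
    "endpoint": "wss://f.gpty.ai/api/v1/ws",
    "hotkey": "<ctrl>+u",
    "autopaste": True,
    "samplerate": 48000,
    "channels": 1,
    "block_samples": 24000,
    "input_device": None,
    "connect_timeout": 8.0,
    "stop_flush_wait": 0.5,
    "enable_hold_mode": True,
    "hold_hotkey": "<cmd>+<alt>",
    "enable_mouse_button": True,
    "mouse_button": "middle",
    "show_visual_indicator": True,
    "vad_energy_threshold": 500.0,
    "vad_min_speech_duration": 0.3,
    "silence_timeout": 2.0,
}


def load_config(path="config.json"):
    """Return DEFAULTS overlaid with whatever keys config.json provides."""
    cfg = dict(DEFAULTS)
    p = Path(path)
    if p.exists():
        # json.loads rejects // comments, so the file must be plain JSON.
        cfg.update(json.loads(p.read_text(encoding="utf-8")))
    return cfg
```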

Configuration Options

Basic Options

  • endpoint: WebSocket server URL for speech-to-text service
  • hotkey: Global hotkey to toggle recording (press once to start, again to stop)
    • Windows/Linux: Use <ctrl>, <alt>, <shift>, <cmd> (the Windows/Super key)
    • macOS: Use <cmd>, <alt>, <ctrl>
    • Examples: "<ctrl>+u", "<alt>+<shift>+r"
  • autopaste: Automatically paste the transcript after recording (true/false)
  • samplerate: Audio sample rate in Hz (48000 recommended)
  • channels: Number of audio channels (1 for mono)
  • block_samples: Samples per audio block sent to the server; larger blocks add latency (24000 samples at 48000 Hz is 0.5 s per block)
  • input_device: Audio input device ID (null for the system default)
  • connect_timeout: WebSocket connection timeout, in seconds
  • stop_flush_wait: Seconds to wait for the final audio chunks after stopping
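The audio options are related by simple arithmetic. The helper below assumes that block_samples counts frames per channel (as sounddevice's blocksize does) and that PCM16 uses 2 bytes per sample. block_stats is a hypothetical helper that returns the block duration, the block size, and the stream's byte rate.

```python
def block_stats(samplerate, channels, block_samples, bytes_per_sample=2):
    """Return (seconds per block, PCM16 bytes per block, bytes per second)."""
    seconds = block_samples / samplerate
    block_bytes = block_samples * channels * bytes_per_sample
    bytes_per_second = samplerate * channels * bytes_per_sample
    return seconds, block_bytes, bytes_per_second
```

With the example config, each block covers 0.5 s and carries 48,000 bytes, which is about 96,000 bytes per second of audio. Smaller blocks shorten the wait before audio reaches the server but produce more messages.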

Hold-to-Talk Mode (NEW!)

  • enable_hold_mode: Enable hold-to-talk mode (true/false)
    • When enabled, hold down the specified key combo to record, release to transcribe
  • hold_hotkey: Key combination for hold-to-talk mode
    • Windows example: "<cmd>+<alt>" (Windows key + Alt)
    • macOS example: "<cmd>+<alt>" (Command + Option)
    • On Windows, <cmd> refers to the Windows key
    • You can use any combination of modifier keys
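Below is a minimal sketch of hold-to-talk detection, assuming pynput keyboard events. HoldToTalk, parse_combo, and key_name are hypothetical helpers, not hotmic.py's code. Recording starts once every key in hold_hotkey is down and stops when any of them is released. pynput reports left and right modifiers separately (for example Key.alt_l), so key_name folds those variants together.

```python
def parse_combo(combo):
    """Split a pynput-style combo such as "<cmd>+<alt>" into key names."""
    return frozenset(part.strip("<>").lower() for part in combo.split("+"))


def key_name(key):
    """Normalize a pynput key: Key.alt_l -> "alt", KeyCode('u') -> "u"."""
    name = getattr(key, "name", None) or getattr(key, "char", None) or ""
    return name.lower().removesuffix("_l").removesuffix("_r")


class HoldToTalk:
    """Hypothetical hold logic: record while every key of the combo is held."""

    def __init__(self, combo, on_start, on_stop):
        self.combo = parse_combo(combo)
        self.down = set()
        self.recording = False
        self._on_start, self._on_stop = on_start, on_stop

    def key_down(self, name):
        self.down.add(name)  # key auto-repeat just re-adds the same name
        if not self.recording and self.combo <= self.down:
            self.recording = True
            self._on_start()

    def key_up(self, name):
        self.down.discard(name)
        # Releasing any key of the combo ends the recording.
        if self.recording and name in self.combo:
            self.recording = False
            self._on_stop()


def run_with_pynput(combo="<cmd>+<alt>"):
    from pynput import keyboard  # deferred so the logic above runs without pynput

    hold = HoldToTalk(combo, lambda: print("recording"), lambda: print("transcribing"))
    with keyboard.Listener(on_press=lambda k: hold.key_down(key_name(k)),
                           on_release=lambda k: hold.key_up(key_name(k))) as listener:
        listener.join()
```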

Mouse Button Recording (NEW!)

  • enable_mouse_button: Enable mouse button recording (true/false)
    • When enabled, hold down the specified mouse button to record, release to transcribe
  • mouse_button: Which mouse button to use
    • Options: "left", "middle", "right"
    • "middle" is the scroll wheel button (recommended to avoid conflicts)
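Mouse mode follows the same press/release pattern. This sketch assumes pynput's mouse.Listener, whose on_click callback receives (x, y, button, pressed). MouseHold is a hypothetical name. Clicks from other buttons are ignored, so the scroll-wheel button does not clash with normal left and right clicks.

```python
class MouseHold:
    """Hypothetical mouse mode: record while the configured button is held down."""

    def __init__(self, button, on_start, on_stop):
        if button not in ("left", "middle", "right"):
            raise ValueError(f"unsupported mouse_button: {button!r}")
        self.button = button
        self.recording = False
        self._on_start, self._on_stop = on_start, on_stop

    def on_click(self, button_name, pressed):
        if button_name != self.button:
            return  # other buttons keep their normal behavior
        if pressed and not self.recording:
            self.recording = True
            self._on_start()
        elif not pressed and self.recording:
            self.recording = False
            self._on_stop()


def run_with_pynput(button="middle"):
    from pynput import mouse  # deferred so MouseHold works without pynput

    hold = MouseHold(button, lambda: print("recording"), lambda: print("transcribing"))

    def on_click(x, y, btn, pressed):
        hold.on_click(btn.name, pressed)  # mouse.Button.middle.name == "middle"

    with mouse.Listener(on_click=on_click) as listener:
        listener.join()
```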

Visual Feedback (NEW!)

  • show_visual_indicator: Show a visual indicator window when recording (true/false)
    • When enabled, a red "🎤 Recording..." banner appears at the top of your screen
    • The banner automatically disappears when you stop recording

Voice Activity Detection (NEW!)

  • enable_vad: Enable/disable voice activity detection (true/false)
    • When enabled, the system detects if actual speech was present in the recording
    • Prevents sending silent or empty recordings to the server
    • Saves bandwidth and reduces false transcriptions
  • vad_energy_threshold: Audio energy threshold for speech detection (default: 500.0)
    • Higher values = less sensitive (requires louder audio to be considered speech)
    • Lower values = more sensitive (may pick up background noise)
    • Adjust based on your microphone sensitivity and environment
  • vad_min_speech_duration: Minimum speech duration in seconds (default: 0.3)
    • Minimum amount of speech required to send to server
    • Prevents very short clicks or noise from being sent
  • silence_timeout: Reserved for future automatic silence detection (default: 2.0)

Microphone Testing (NEW!)

  • test_mic_on_startup: Test microphone on startup (true/false)
    • When enabled, records 2 seconds of audio on startup to verify mic is working
    • Displays audio energy levels and warns if microphone appears to be muted
    • Set to false to skip the test and start faster

Platform-Specific Notes

Windows

  • Default hotkey uses Ctrl instead of Cmd
  • Auto-paste uses Ctrl+V
  • Auto-start uses Windows Task Scheduler
  • Clipboard handled by pyperclip

macOS

  • Default hotkey uses Cmd
  • Auto-paste uses Cmd+V
  • Auto-start uses LaunchAgent
  • Requires Accessibility and Input Monitoring permissions

Linux

  • Default hotkey uses Ctrl
  • Auto-paste uses Ctrl+V
  • May require additional setup for hotkeys on Wayland
  • Audio requires ALSA or PulseAudio

Troubleshooting

Windows

  • Hotkey not working: Run as administrator or check if another program is using the hotkey
  • Hold-to-talk not working: Make sure enable_hold_mode is true and the key combination is not used by other programs
  • Mouse button not working: Ensure enable_mouse_button is true and try running as administrator
  • Visual indicator not showing: Check if show_visual_indicator is true in config.json
  • Auto-paste not working: Check Windows accessibility settings; the transcript is still copied to the clipboard, so you can paste manually with Ctrl+V
  • Audio issues: Check Windows audio settings and ensure microphone is not muted

macOS

  • Hotkey not working: Grant Input Monitoring permission in System Settings
  • Hold-to-talk not working: Grant Accessibility and Input Monitoring permissions
  • Mouse button not working: Grant Accessibility permission
  • Auto-paste not working: Grant Accessibility permission in System Settings
  • Log shows "not trusted" error: Add python to Accessibility and Input Monitoring

All Platforms

  • No transcript received: Check your internet connection and endpoint URL
  • Choppy audio: Increase block_samples in config.json
  • Connection timeout: Increase connect_timeout in config.json
  • Microphone test fails: Check microphone permissions and ensure mic is not muted
  • Too many silent recordings skipped: Lower vad_energy_threshold in config.json
  • Background noise triggers recording: Increase vad_energy_threshold in config.json
  • VAD not working: Ensure enable_vad is true and numpy is installed (pip install numpy)

Development

The code is designed to be cross-platform:

  • Clipboard operations use pyperclip (cross-platform)
  • Paste hotkey automatically detects OS (Cmd on macOS, Ctrl elsewhere)
  • Audio capture uses sounddevice (cross-platform)
  • Global hotkeys use pynput (cross-platform)
  • Audio processing uses numpy for energy calculation and voice activity detection
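The OS-dependent paste step might look like the following sketch, which uses pyperclip.copy and pynput's keyboard Controller. paste_modifier and copy_and_paste are hypothetical names. The third-party imports are deferred so the platform check also works on its own.

```python
import sys


def paste_modifier(platform=sys.platform):
    """Modifier for the paste shortcut: Cmd on macOS, Ctrl elsewhere."""
    return "cmd" if platform == "darwin" else "ctrl"


def copy_and_paste(text, autopaste=True):
    # Deferred imports: pyperclip and pynput are only needed here.
    import pyperclip
    from pynput.keyboard import Controller, Key

    pyperclip.copy(text)
    if autopaste:
        kb = Controller()
        modifier = Key.cmd if paste_modifier() == "cmd" else Key.ctrl
        with kb.pressed(modifier):  # hold Cmd/Ctrl while tapping V
            kb.press("v")
            kb.release("v")
```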

New Features

Voice Activity Detection (VAD)

The VAD system uses RMS (Root Mean Square) energy calculation to detect speech:

  • Calculates audio energy for each chunk in real-time
  • Compares against configurable threshold (vad_energy_threshold)
  • Tracks speech duration to filter out brief noise
  • Prevents sending silent recordings to save bandwidth
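Below is a minimal sketch of this gate. It assumes mono int16 chunks and that vad_energy_threshold is compared against RMS on the int16 scale; the README does not state the scale. SpeechGate is a hypothetical name, and plain Python stands in for numpy. With numpy, the RMS would be np.sqrt(np.mean(chunk.astype(np.float64) ** 2)). The cast matters because squaring int16 values without it overflows.

```python
import math


def rms(chunk):
    """Root-mean-square amplitude of a sequence of int16 samples."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


class SpeechGate:
    """Hypothetical VAD gate: accumulate time spent in chunks above the threshold."""

    def __init__(self, samplerate, threshold=500.0, min_speech=0.3):
        self.samplerate = samplerate
        self.threshold = threshold    # vad_energy_threshold
        self.min_speech = min_speech  # vad_min_speech_duration, in seconds
        self.speech_seconds = 0.0

    def feed(self, chunk):
        # A chunk whose RMS clears the threshold counts entirely as speech.
        if rms(chunk) > self.threshold:
            self.speech_seconds += len(chunk) / self.samplerate

    def should_send(self):
        return self.speech_seconds >= self.min_speech
```

With the example block size of 24000 samples at 48000 Hz, a single loud block already covers 0.5 s, which exceeds the 0.3 s default minimum.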

Microphone Testing

On startup, the system can test the microphone by:

  • Recording 2 seconds of audio
  • Calculating average energy levels
  • Warning if levels are too low (mic muted or not working)
  • Providing feedback to help troubleshoot issues
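Below is a sketch of such a startup check, assuming int16 mono capture via sounddevice's rec and wait. mic_check, record_test_clip, and the min_rms cutoff of 100.0 are hypothetical; the README does not give the threshold it uses. Here the "average energy" is taken as the RMS over the whole clip.

```python
import math


def mic_check(samples, min_rms=100.0):
    """Return (RMS level, warning or None) for a test recording of int16 samples."""
    if len(samples) == 0:
        return 0.0, "No audio captured; check that a microphone is connected."
    level = math.sqrt(sum(s * s for s in samples) / len(samples))
    if level < min_rms:
        return level, "Very low input level; the microphone may be muted."
    return level, None


def record_test_clip(seconds=2.0, samplerate=48000):
    import sounddevice as sd  # deferred so mic_check works without it

    clip = sd.rec(int(seconds * samplerate), samplerate=samplerate,
                  channels=1, dtype="int16")
    sd.wait()  # block until the recording finishes
    return clip[:, 0].tolist()  # first (only) channel as Python ints
```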

Self-Hosting

If you prefer to self-host the speech-to-text service:

  1. Run the server from github.com/grapeot/brainwave
  2. Update the endpoint in config.json to your server URL

Credits

  • Uses the public Brainwave service at f.gpty.ai
  • Audio streaming via sounddevice
  • Global hotkeys via pynput
  • Clipboard handling via pyperclip

About

HotMic macOS autostart speech-to-text helper
