A free, open source, and extensible speech-to-text application that works completely offline.
Handy is a cross-platform desktop application built with Tauri (Rust + React/TypeScript) that provides simple, privacy-focused speech transcription. Press a shortcut, speak, and have your words appear in any text field—all without sending your voice to the cloud.
Handy was created to fill the gap for a truly open source, extensible speech-to-text tool. As stated on handy.computer:
- Free: Accessibility tooling belongs in everyone's hands, not behind a paywall
- Open Source: Together we can build further. Extend Handy for yourself and contribute to something bigger
- Private: Your voice stays on your computer. Get transcriptions without sending audio to the cloud
- Simple: One tool, one job. Transcribe what you say and put it into a text box
Handy isn't trying to be the best speech-to-text app—it's trying to be the most forkable one.
- Press a configurable keyboard shortcut to start/stop recording (or use push-to-talk mode)
- Speak your words while the shortcut is active
- Release and Handy processes your speech using Whisper
- Get your transcribed text pasted directly into whatever app you're using
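As a rough illustration of the two shortcut modes described above, the following Rust sketch models them as a small state machine. It is not taken from Handy's source: `Mode`, `KeyEvent`, `Action`, and `Recorder` are hypothetical names, and the sketch assumes that toggle mode reacts only to key presses while push-to-talk records for as long as the key is held.

```rust
/// How the shortcut controls recording (hypothetical names, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Toggle,
    PushToTalk,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KeyEvent {
    Pressed,
    Released,
}

/// What the application should do in response to a key event.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    Start,
    StopAndTranscribe,
}

#[derive(Debug)]
struct Recorder {
    mode: Mode,
    recording: bool,
}

impl Recorder {
    fn new(mode: Mode) -> Self {
        Recorder { mode, recording: false }
    }

    /// Update the recording state and report the action to take, if any.
    fn on_key(&mut self, event: KeyEvent) -> Option<Action> {
        match (self.mode, event, self.recording) {
            // In both modes, a press while idle starts recording.
            (_, KeyEvent::Pressed, false) => {
                self.recording = true;
                Some(Action::Start)
            }
            // Toggle stops on the next press; push-to-talk stops on release.
            (Mode::Toggle, KeyEvent::Pressed, true)
            | (Mode::PushToTalk, KeyEvent::Released, true) => {
                self.recording = false;
                Some(Action::StopAndTranscribe)
            }
            // Everything else (key repeat, stray releases) is ignored.
            _ => None,
        }
    }
}
```

Keeping the mode logic in one `match` makes it easy to see which events are ignored, such as key-repeat presses while a push-to-talk key is held.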
The process is entirely local:
- Silence is filtered using VAD (Voice Activity Detection) with Silero
- Transcription uses the Whisper model you select in the app, with GPU acceleration when available
- Works on Windows, macOS, and Linux
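To give a feel for what "filtering silence" before transcription means, here is a minimal Rust sketch. It is deliberately not Silero: Silero is a neural VAD, while this stand-in uses a plain RMS energy gate. The function names, the frame length, and the threshold are all assumptions chosen for illustration.

```rust
/// Root-mean-square energy of one frame of samples in [-1.0, 1.0].
fn frame_rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

/// Keep only the frames whose energy is at or above `threshold`.
/// This is a simple energy gate standing in for a real VAD; a neural VAD
/// such as Silero scores each frame with a model instead.
fn drop_silence(samples: &[f32], frame_len: usize, threshold: f32) -> Vec<f32> {
    assert!(frame_len > 0, "frame_len must be positive");
    samples
        .chunks(frame_len)
        .filter(|frame| frame_rms(frame) >= threshold)
        .flatten()
        .copied()
        .collect()
}
```

Only the frames that survive the gate would be handed to the recogniser, which reduces the amount of audio to transcribe.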
- Download the latest release from the releases page or the website
- Install the application following platform-specific instructions
- Launch Handy and grant necessary system permissions (microphone, accessibility)
- Configure your preferred keyboard shortcuts in Settings
- Start transcribing!
Prerequisites:
- Rust (latest stable)
- Bun package manager
- Platform-specific requirements:
- macOS: Xcode Command Line Tools
- Windows: Microsoft C++ Build Tools
- Linux: Build essentials, ALSA development libraries
Getting Started:
# Clone the repository
git clone git@github.com:cjpais/Handy.git
cd Handy
# Install dependencies
bun install
# Run in development mode
bun run tauri dev
# If it fails with a CMake error on macOS, try
CMAKE_POLICY_VERSION_MINIMUM=3.5 bun run tauri dev
# Build for production
bun run tauri build
Model Files Setup:
For development, you need to download the required model files:
- Create the models directory inside the resources folder:
mkdir -p src-tauri/resources/models
- Download the required VAD model for development:
# Download Silero VAD model (required for voice activity detection)
curl -o src-tauri/resources/models/silero_vad_v4.onnx https://blob.handy.computer/silero_vad_v4.onnx
Note: Whisper models are no longer bundled with the app. Users will download their preferred model (Small, Medium, Turbo, or Large) from within the app on first run.
Whisper Models:
The app now supports dynamic model downloading and switching:
- Small: Fast, good for most use cases
- Medium: Better accuracy, balanced performance
- Turbo: Optimized large model with improved speed
- Large: Highest accuracy, slower processing
Users can download and switch between models directly from the app's settings interface. No models are bundled with the app, reducing the initial download size.
Qwen3-ASR (Apple Silicon only):
Handy also supports Qwen3-ASR-0.6B via mlx-audio for native Apple Silicon inference. This requires uv (brew install uv) and is set up automatically from the model selector. See the Qwen3-ASR Setup Guide for details and troubleshooting.
Language Support:
Handy supports multilingual transcription with special handling for Chinese users:
- Auto: Automatically detects the spoken language
- Auto (Prefer Trad. Chinese): Auto-detects English/Chinese, outputs Chinese as Traditional characters
- Chinese Traditional: Forces Traditional Chinese output (繁體中文)
- Chinese Simplified: Forces Simplified Chinese output (简体中文)
- 50+ other languages: Including Spanish, French, Japanese, Korean, and more
The "Auto (Prefer Trad. Chinese)" option is ideal for bilingual English/Chinese speakers who want automatic language detection while ensuring Chinese text always appears in Traditional characters.
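One way to think about these options is as two separate decisions: which language hint (if any) goes to the recogniser, and whether Chinese output is forced into a particular script afterwards. The Rust sketch below models that split. It is an assumption about how such a setting could be structured, not Handy's actual implementation; `LanguageSetting`, `ChineseScript`, `language_hint`, and `forced_script` are hypothetical names. The only external fact relied on is that Whisper's language code for Chinese is `zh`.

```rust
/// Script to enforce on Chinese output (hypothetical type, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ChineseScript {
    Traditional,
    Simplified,
}

/// The language options listed above, modelled as an enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LanguageSetting {
    Auto,
    AutoPreferTraditional,
    ChineseTraditional,
    ChineseSimplified,
    /// Any other language, identified by its language code (e.g. "es", "ja").
    Other(&'static str),
}

impl LanguageSetting {
    /// Language hint for the recogniser; `None` means "auto-detect".
    fn language_hint(self) -> Option<&'static str> {
        match self {
            Self::Auto | Self::AutoPreferTraditional => None,
            Self::ChineseTraditional | Self::ChineseSimplified => Some("zh"),
            Self::Other(code) => Some(code),
        }
    }

    /// Script to convert Chinese output into, if the setting demands one.
    fn forced_script(self) -> Option<ChineseScript> {
        match self {
            Self::AutoPreferTraditional | Self::ChineseTraditional => {
                Some(ChineseScript::Traditional)
            }
            Self::ChineseSimplified => Some(ChineseScript::Simplified),
            Self::Auto | Self::Other(_) => None,
        }
    }
}
```

Under this model, "Auto (Prefer Trad. Chinese)" is simply auto-detection plus a post-processing step, which is why it suits bilingual speakers.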
Handy is built as a Tauri application combining:
- Frontend: React + TypeScript with Tailwind CSS for the settings UI
- Backend: Rust for system integration, audio processing, and ML inference
- Core Libraries:
- whisper-rs: Local speech recognition with Whisper models
- cpal: Cross-platform audio I/O
- vad-rs: Voice Activity Detection
- rdev: Global keyboard shortcuts and system events
- rubato: Audio resampling
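Audio resampling is in this list because microphones commonly capture at 44.1 or 48 kHz, while Whisper models expect 16 kHz input. The Rust sketch below shows the idea using plain linear interpolation. It does not use rubato's API and is not how Handy resamples; `resample_linear` is a hypothetical helper, and a dedicated resampler will give better quality because it also filters out aliasing.

```rust
/// Resample mono audio from `from_hz` to `to_hz` by linear interpolation.
/// Illustrative only: there is no anti-aliasing filter, so quality is
/// lower than what a dedicated resampling library provides.
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    assert!(from_hz > 0 && to_hz > 0, "sample rates must be positive");
    if input.is_empty() {
        return Vec::new();
    }
    // Number of output samples covering the same duration as the input.
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    // Distance between output samples, measured in input samples.
    let step = from_hz as f64 / to_hz as f64;
    let last = input.len() - 1;

    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            // Clamp indices so the final samples reuse the last input value.
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}
```

For example, `resample_linear(&mic_samples, 48_000, 16_000)` would reduce a 48 kHz capture to one third of its length, where `mic_samples` is whatever buffer the audio backend delivered.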
This project is actively being developed and has some known issues. We believe in transparency about the current state:
- Apple Silicon Macs
- x64 Windows
- x64 Linux
Known Issues:
- Paste functionality occasionally produces just 'v' instead of full text on macOS
- VAD filter sometimes includes trailing "thank you" in transcriptions
- The end of a transcription is sometimes cut off, possibly due to threading issues
- Microphone remains active for optimal latency (design choice under discussion)
We're actively seeking contributors! Priority areas include:
- Cross-platform support - Windows and Linux compatibility
- Code quality improvements - Better error handling, architecture refinements
- Bug fixes - Address the known issues listed above
- Performance optimization - Reduce latency, improve resource usage
- Configurable microphone selection
- Multiple STT model options (beyond Whisper Small)
- Modifier-only key bindings
- Enhanced VAD configuration
- Check existing issues at github.com/cjpais/Handy/issues
- Fork the repository and create a feature branch
- Test thoroughly on your target platform
- Submit a pull request with a clear description of changes
- Join the discussion - reach out at contact@handy.computer
The goal is to create both a useful tool and a foundation for others to build upon—a well-patterned, simple codebase that serves the community.
- Handy CLI - The original Python command-line version
- handy.computer - Project website with demos and documentation
MIT License - see LICENSE file for details.
- Whisper by OpenAI for the speech recognition model
- whisper.cpp and ggml for amazing cross-platform whisper inference/acceleration
- Silero for great lightweight VAD
- Tauri team for the excellent Rust-based app framework
- Community contributors helping make Handy better
"Your search for the right speech-to-text tool can end here—not because Handy is perfect, but because you can make it perfect for you."