Local AI transcription that runs 100% offline. Code at the speed of thought.
Private. Open Source. No Cloud Required.
Install • Features • Usage • Architecture • Contribute
SpeakEasy is an open-source, privacy-focused voice-to-text and speech recognition application built for developers, writers, and privacy-conscious users. Unlike cloud-based transcription services like Otter.ai, Rev.ai, or Google Speech-to-Text, SpeakEasy runs entirely offline on your local machine using open-source AI models including OpenAI Whisper, NVIDIA NeMo, and Mistral Voxtral.
- Real-time transcription with near-zero latency
- 100% offline - no internet required, no data leaves your device
- GPU accelerated - CUDA support for NVIDIA graphics cards
- Cross-platform - Windows, macOS, and Linux support
- Vibe Coding - Stay in flow, dictate code naturally
- Developer-first - IDE integration, hotkeys, CLI support
| Best For | Why |
|---|---|
| Developers | Code faster with voice. Global hotkeys work in any IDE (VS Code, Cursor, JetBrains) |
| Privacy Advocates | Zero cloud calls. Your voice stays on your machine |
| Writers | Dictate articles, emails, notes without typing fatigue |
| Accessibility | Voice control for users with RSI, disabilities, or typing limitations |
| Security-Conscious | Air-gapped environments, no data exfiltration risk |
| Feature | Description |
|---|---|
| Global Hotkey | Press and hold to transcribe into any active window |
| Universal Compatibility | Works with any application (IDEs, editors, browsers, chat apps) |
| Smart Formatting | Automatic punctuation, capitalization, and code formatting |
| Multi-Model Support | Choose between Whisper, NeMo, or Voxtral based on your needs |
| Audio File Processing | Batch transcribe MP3, WAV, M4A, and more |
| Real-time Preview | See transcription as you speak |
- ✅ 100% Offline - Zero network calls for transcription
- ✅ Local Processing - All models run on your hardware
- ✅ No Signup - No account, email, or API keys required
- ✅ No Telemetry - No usage tracking or data collection
- ✅ Open Source - Full transparency, audit the code
- Batch Transcription: Process multiple audio files in a queue with real-time progress tracking
- Transcription History: Searchable SQLite database of all your transcriptions
- History Import/Export: Backup and restore your history with merge or replace options
- Export Formats: JSON, TXT, SRT, VTT, CSV, DOCX for different use cases
- Model Download Progress: Real-time download tracking with speed and ETA
- Model Caching: Download and cache models for faster startup times
- Custom Hotkeys: Configure global shortcuts to your preference
- System Tray: Quick access without cluttering your dock
- CLI Support: Command-line transcription for automation
- Plugin System: Custom post-processing scripts (WIP)
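Because the history lives in a local SQLite database with full-text search, it can also be queried directly with standard tooling. A minimal sketch using an in-memory stand-in database; the table and column names here are illustrative, not SpeakEasy's actual schema:

```python
import sqlite3

# In-memory stand-in for the history database.
# Table/column names are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, created_at)")
conn.executemany(
    "INSERT INTO transcripts (text, created_at) VALUES (?, ?)",
    [
        ("refactor the audio buffer class", "2024-05-01"),
        ("draft the release notes email", "2024-05-02"),
    ],
)

# Full-text search: every transcription mentioning "audio".
rows = conn.execute(
    "SELECT text FROM transcripts WHERE transcripts MATCH ?", ("audio",)
).fetchall()
print(rows)  # [('refactor the audio buffer class',)]
```

The same `MATCH` query works against a file-backed database, so history stays scriptable without going through the GUI.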
- Python 3.10 - 3.12 (Python 3.13+ not yet supported)
- Node.js 18+ (LTS recommended)
- FFmpeg (must be in system PATH)
- UV package manager (`pip install uv`)
- Windows: Visual C++ Build Tools
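A quick way to verify the prerequisites above before installing; this script is not part of the repo, just a local sanity check:

```python
import shutil
import sys

def check_prereqs():
    """Report which SpeakEasy prerequisites are satisfied on this machine."""
    return {
        # Python 3.10-3.12 per the requirements above (3.13+ unsupported)
        "python": (3, 10) <= sys.version_info[:2] <= (3, 12),
        # FFmpeg must be discoverable on the system PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # UV package manager
        "uv": shutil.which("uv") is not None,
    }

print(check_prereqs())
```

Anything reported `False` needs to be installed (and, for FFmpeg, added to PATH) before the setup steps below will work.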
Windows (Recommended):

```bash
git clone https://github.com/bitgineer/speakeasy.git
cd speakeasy
start.bat
```

macOS/Linux:

```bash
git clone https://github.com/bitgineer/speakeasy.git
cd speakeasy
./start.sh
```

Manual setup:

```bash
# Clone repository
git clone https://github.com/bitgineer/speakeasy.git
cd speakeasy

# Setup backend
cd backend
uv venv --python 3.12
source .venv/bin/activate  # Windows: .venv\Scripts\activate
uv pip install -e ".[cuda]"  # Without CUDA: uv pip install -e .

# Run tests
uv run pytest tests/ -v

# Setup frontend
cd ../gui
npm install
npm run dev
```

The easiest way to use SpeakEasy is through the Electron GUI:
```bash
# Quick start with default settings
npm run dev    # Development mode
npm run build  # Production build
npm run start  # Run built app
```

Features:
- Visual transcription history
- Model switching (Whisper/NeMo/Voxtral)
- Settings management
- Audio file import
Use SpeakEasy from the terminal for automation and scripting:
```bash
# Transcribe with default settings
python -m speakeasy transcribe

# Transcribe an audio file
python -m speakeasy transcribe --file recording.mp3 --output transcript.txt

# List available models
python -m speakeasy models

# Use a specific model
python -m speakeasy transcribe --model whisper-large-v3

# Batch process a directory
python -m speakeasy transcribe --batch ./audio_files/ --output ./transcripts/

# Get help
python -m speakeasy --help
python -m speakeasy transcribe --help
```

Set up a global hotkey to transcribe into any active window:
1. Start the backend:

   ```bash
   cd backend
   source .venv/bin/activate
   python -m speakeasy.server
   ```

2. Configure the hotkey in the GUI (default: `Ctrl+Shift+Space`)

3. Use it anywhere: hold the hotkey, speak, release, and the text appears in the focused window
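For reference, a hotkey configuration might look something like the following. This is a hypothetical example; the actual keys and file location come from SpeakEasy's settings UI, not this snippet:

```json
{
  "hotkey": "Ctrl+Shift+Space",
  "model": "whisper-base",
  "language": "en"
}
```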
Stream transcription in real-time:
```bash
# Real-time transcription to stdout
python -m speakeasy live

# Real-time with a specific model
python -m speakeasy live --model nemo --language en

# Save to a file while transcribing
python -m speakeasy live --output live_transcript.txt
```

```
┌───────────────────────────────────────────────────────────────┐
│                        USER INTERFACE                         │
├───────────────────────────────────────────────────────────────┤
│  ┌───────────────┐   ┌───────────────┐   ┌─────────────────┐  │
│  │ Electron GUI  │   │   CLI Tool    │   │  Global Hotkey  │  │
│  │    (React)    │   │   (Python)    │   │   (Listener)    │  │
│  └───────┬───────┘   └───────┬───────┘   └────────┬────────┘  │
└──────────┼───────────────────┼────────────────────┼───────────┘
           │                   │                    │
           └───────────────────┼────────────────────┘
                               │ HTTP API
                               ▼
┌───────────────────────────────────────────────────────────────┐
│                      SPEAKEASY BACKEND                        │
├───────────────────────────────────────────────────────────────┤
│   ┌──────────────────┐        ┌──────────────────┐            │
│   │  FastAPI Server  │        │ Audio Processor  │            │
│   │     (Python)     │        │ (FFmpeg/Buffer)  │            │
│   └────────┬─────────┘        └────────┬─────────┘            │
│            │                           │                      │
│            └────────────┬──────────────┘                      │
│                         │ Load & Run                          │
│            ┌────────────▼─────────────┐                       │
│            │     AI Model Engine      │                       │
│            │    (CTranslate2/ONNX)    │                       │
│            └────────────┬─────────────┘                       │
│                         │                                     │
│            ┌────────────▼─────────────┐                       │
│            │ Whisper │ NeMo │ Voxtral │                       │
│            └──────────────────────────┘                       │
└─────────────────────────┬─────────────────────────────────────┘
                          │
                          ▼
┌───────────────────────────────────────────────────────────────┐
│                         DATA STORAGE                          │
├───────────────────────────────────────────────────────────────┤
│   ┌──────────────────┐        ┌──────────────────┐            │
│   │    SQLite DB     │        │   Model Cache    │            │
│   │ (History/Config) │        │  (~2-10GB each)  │            │
│   └──────────────────┘        └──────────────────┘            │
└───────────────────────────────────────────────────────────────┘
```
Tech Stack:
- Frontend: Electron + React + Tailwind CSS + TypeScript
- Backend: Python + FastAPI + WebSocket
- AI Engine: PyTorch, CTranslate2, ONNX Runtime
- Audio: FFmpeg, PyAudio, SoundDevice
- Database: SQLite with full-text search
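To give a feel for the audio side of the stack, here is a minimal sketch of a fixed-size ring buffer of the kind a streaming transcriber typically keeps between microphone callbacks. This illustrates the general technique only; it is not SpeakEasy's actual Audio Processor:

```python
from collections import deque

class AudioRingBuffer:
    """Keep only the most recent window of audio samples."""

    def __init__(self, seconds: float = 5.0, sample_rate: int = 16000):
        # deque with maxlen silently drops the oldest samples on overflow
        self.samples = deque(maxlen=int(seconds * sample_rate))

    def push(self, chunk):
        # Called from the audio callback with each new block of samples.
        self.samples.extend(chunk)

    def snapshot(self):
        # Copy out the current window for the model without
        # mutating the buffer the callback is writing into.
        return list(self.samples)

# Tiny window (4 samples) so the drop-oldest behavior is visible:
buf = AudioRingBuffer(seconds=0.001, sample_rate=4000)
buf.push([1, 2, 3])
buf.push([4, 5])
print(buf.snapshot())  # [2, 3, 4, 5] - only the newest 4 samples survive
```

Bounding the buffer like this is what keeps a real-time pipeline's memory use constant no matter how long a dictation session runs.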
| Model | Size | Speed | Accuracy | Best For | Hardware |
|---|---|---|---|---|---|
| Whisper Tiny | 39MB | ⚡⚡⚡⚡⚡ | ⭐⭐⭐ | Quick tests, low-resource | CPU |
| Whisper Base | 74MB | ⚡⚡⚡⚡ | ⭐⭐⭐⭐ | Balanced speed/accuracy | CPU |
| Whisper Small | 244MB | ⚡⚡⚡ | ⭐⭐⭐⭐ | Good general use | CPU/GPU |
| Whisper Medium | 769MB | ⚡⚡ | ⭐⭐⭐⭐⭐ | High accuracy | GPU recommended |
| Whisper Large-v3 | 1.5GB | ⚡ | ⭐⭐⭐⭐⭐ | Best accuracy | GPU required |
| NeMo FastConformer | 110MB | ⚡⚡⚡⚡⚡ | ⭐⭐⭐⭐ | Real-time streaming | GPU recommended |
| Voxtral Mini | 3B params | ⚡ | ⭐⭐⭐⭐⭐ | Complex dictation | GPU required |
| Voxtral Large | 7B params | ⚡ | ⭐⭐⭐⭐⭐ | Maximum accuracy | High-end GPU |
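The trade-offs in the table above can be collapsed into a simple selection rule. A sketch; the thresholds are illustrative guesses, not official hardware recommendations:

```python
def pick_model(vram_gb: float, realtime: bool) -> str:
    """Suggest a model from the table above for the available hardware."""
    if realtime and vram_gb >= 4:
        return "nemo-fastconformer"   # fastest streaming option
    if vram_gb >= 8:
        return "whisper-large-v3"     # best accuracy, GPU required
    if vram_gb >= 4:
        return "whisper-medium"       # high accuracy on a mid-range GPU
    return "whisper-base"             # solid CPU-only default

print(pick_model(vram_gb=0, realtime=False))  # whisper-base
print(pick_model(vram_gb=8, realtime=False))  # whisper-large-v3
```

The same idea extends to the Voxtral models, which would slot in above `whisper-large-v3` for machines with substantially more VRAM.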
| Feature | SpeakEasy | Otter.ai | Whisper API | Dragon | Apple Dictation |
|---|---|---|---|---|---|
| Privacy | ✅ 100% offline | ❌ Cloud only | ❌ Cloud only | ❌ Cloud required | |
| Cost | Free | 💰 $10-20/mo | 💰 $0.006/min | 💰 $500+ | Free |
| Open Source | ✅ Yes | ❌ No | ⚠️ Yes (API only) | ❌ No | ❌ No |
| Offline | ✅ Yes | ❌ No | ❌ No | | |
| Cross-Platform | ✅ Win/Mac/Linux | ✅ Yes | N/A | ❌ Windows only | ❌ Apple only |
| Custom Models | ✅ Yes | ❌ No | ❌ No | ❌ No | ❌ No |
| Latency | 🟢 <100ms | 🟡 ~1s | 🟡 ~500ms | 🟢 <200ms | 🟡 ~300ms |
- Local Whisper transcription
- Electron GUI
- Global hotkeys
- CLI interface
- Multi-model support (Whisper, NeMo, Voxtral)
- Audio file processing
- Batch transcription
- History import/export
- Model download progress tracking
- Real-time WebSocket updates
- Advanced export formats (SRT, VTT, CSV, DOCX)
- VS Code extension
- Custom wake words
- Voice commands (beyond transcription)
- Plugin system
- Docker deployment
- Mobile companion app
- Web interface
- Enterprise features (SSO, audit logs)
- Real-time collaboration
See GitHub Issues for the detailed backlog.
We welcome contributions! See CONTRIBUTING.md for:
- Reporting bugs and requesting features
- Setting up your development environment
- Code style and submission process
- Review and approval workflow
MIT License - see LICENSE for details.
- OpenAI Whisper - Speech recognition model
- NVIDIA NeMo - Speech AI toolkit
- Mistral AI - Voxtral models
- Faster Whisper - Optimized inference
- CTranslate2 - Fast inference engine
⭐ Star this repo if you find it useful!

Report Bug • Request Feature • Discussions

Made with ❤️ for privacy-conscious developers everywhere