SpeakEasy

Privacy-First Voice-to-Text for Developers

Local AI transcription that runs 100% offline. Code at the speed of thought.
Private. Open Source. No Cloud Required.

πŸš€ Install β€’ ✨ Features β€’ πŸ’» Usage β€’ πŸ—οΈ Architecture β€’ 🀝 Contribute



πŸ“– Overview

SpeakEasy is an open-source, privacy-focused voice-to-text and speech recognition application built for developers, writers, and privacy-conscious users. Unlike cloud-based transcription services like Otter.ai, Rev.ai, or Google Speech-to-Text, SpeakEasy runs entirely offline on your local machine using open-source AI models including OpenAI Whisper, NVIDIA NeMo, and Mistral Voxtral.

  • πŸŽ™οΈ Real-time transcription with near-zero latency
  • πŸ”’ 100% offline - no internet required, no data leaves your device
  • ⚑ GPU accelerated - CUDA support for NVIDIA graphics cards
  • πŸ’» Cross-platform - Windows, macOS, and Linux support
  • πŸš€ Vibe Coding - Stay in flow, dictate code naturally
  • 🎯 Developer-first - IDE integration, hotkeys, CLI support

Why Choose SpeakEasy?

πŸ† Best For πŸ’‘ Why
Developers Code faster with voice. Global hotkeys work in any IDE (VS Code, Cursor, JetBrains)
Privacy Advocates Zero cloud calls. Your voice stays on your machine
Writers Dictate articles, emails, notes without typing fatigue
Accessibility Voice control for users with RSI, disabilities, or typing limitations
Security-Conscious Air-gapped environments, no data exfiltration risk

✨ Features

πŸŽ™οΈ Core Transcription

| Feature | Description |
| --- | --- |
| Global Hotkey | Press and hold to transcribe into any active window |
| Universal Compatibility | Works with any application (IDEs, editors, browsers, chat apps) |
| Smart Formatting | Automatic punctuation, capitalization, and code formatting |
| Multi-Model Support | Choose between Whisper, NeMo, or Voxtral based on your needs |
| Audio File Processing | Batch transcribe MP3, WAV, M4A, and more |
| Real-time Preview | See transcription as you speak |

πŸ” Privacy & Security

  • βœ… 100% Offline - Zero network calls for transcription
  • βœ… Local Processing - All models run on your hardware
  • βœ… No Signup - No account, email, or API keys required
  • βœ… No Telemetry - No usage tracking or data collection
  • βœ… Open Source - Full transparency, audit the code

⚑ Power Features

  • Batch Transcription: Process multiple audio files in a queue with real-time progress tracking
  • Transcription History: Searchable SQLite database of all your transcriptions
  • History Import/Export: Backup and restore your history with merge or replace options
  • Export Formats: JSON, TXT, SRT, VTT, CSV, DOCX for different use cases
  • Model Download Progress: Real-time download tracking with speed and ETA
  • Model Caching: Download and cache models for faster startup times
  • Custom Hotkeys: Configure global shortcuts to your preference
  • System Tray: Quick access without cluttering your dock
  • CLI Support: Command-line transcription for automation
  • Plugin System: Custom post-processing scripts (WIP)

πŸš€ Quick Start

Prerequisites

  • Python 3.10 - 3.12 (Python 3.13+ not yet supported)
  • Node.js 18+ (LTS recommended)
  • FFmpeg (must be in system PATH)
  • UV package manager (pip install uv)
  • Windows: Visual C++ Build Tools

⚑ One-Command Install

Windows (Recommended):

git clone https://github.com/bitgineer/speakeasy.git
cd speakeasy
start.bat

macOS/Linux:

git clone https://github.com/bitgineer/speakeasy.git
cd speakeasy
./start.sh

πŸ› οΈ Manual Setup

# Clone repository
git clone https://github.com/bitgineer/speakeasy.git
cd speakeasy

# Setup backend
cd backend
uv venv --python 3.12
source .venv/bin/activate  # Windows: .venv\Scripts\activate
uv pip install -e ".[cuda]"  # Without CUDA: uv pip install -e .

# Run tests
uv run pytest tests/ -v

# Setup frontend
cd ../gui
npm install
npm run dev

πŸ’» Usage

πŸ–±οΈ GUI Mode (Desktop App)

The easiest way to use SpeakEasy is through the Electron GUI:

# Quick start with default settings
npm run dev        # Development mode
npm run build      # Production build
npm run start      # Run built app

Features:

  • Visual transcription history
  • Model switching (Whisper/NeMo/Voxtral)
  • Settings management
  • Audio file import

⌨️ CLI Mode (Command Line)

Use SpeakEasy from the terminal for automation and scripting:

# Transcribe with default settings
python -m speakeasy transcribe

# Transcribe an audio file
python -m speakeasy transcribe --file recording.mp3 --output transcript.txt

# List available models
python -m speakeasy models

# Use specific model
python -m speakeasy transcribe --model whisper-large-v3

# Batch process directory
python -m speakeasy transcribe --batch ./audio_files/ --output ./transcripts/

# Get help
python -m speakeasy --help
python -m speakeasy transcribe --help
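The CLI composes well with scripts. A sketch that walks a directory and shells out to the `transcribe` subcommand, using only the flags shown above (`transcribe_dir` and `build_cmd` are illustrative helpers, not part of SpeakEasy):

```python
import subprocess
from pathlib import Path

def build_cmd(audio: Path, target: Path) -> list[str]:
    """Build the CLI invocation shown in the examples above."""
    return ["python", "-m", "speakeasy", "transcribe",
            "--file", str(audio), "--output", str(target)]

def transcribe_dir(audio_dir: str, out_dir: str) -> list[Path]:
    """Transcribe every supported audio file in audio_dir, one .txt each."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for audio in sorted(Path(audio_dir).iterdir()):
        if audio.suffix.lower() not in {".mp3", ".wav", ".m4a"}:
            continue  # skip non-audio files
        target = out / f"{audio.stem}.txt"
        subprocess.run(build_cmd(audio, target), check=True)
        written.append(target)
    return written
```

For one-off batches, `--batch` (shown above) does the same thing without a wrapper script.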

πŸ”₯ Global Hotkey Mode

Set up a global hotkey to transcribe into any active window:

  1. Start the backend:

    cd backend
    source .venv/bin/activate
    python -m speakeasy.server
  2. Configure hotkey in the GUI (default: Ctrl+Shift+Space)

  3. Use anywhere:

    • Hold hotkey β†’ Speak β†’ Release β†’ Text appears in focused window

πŸŽ™οΈ Live Mode

Stream transcription in real-time:

# Real-time transcription to stdout
python -m speakeasy live

# Real-time with specific model
python -m speakeasy live --model nemo --language en

# Save to file while transcribing
python -m speakeasy live --output live_transcript.txt

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        USER INTERFACE                        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚  Electron GUI β”‚  β”‚    CLI Tool   β”‚  β”‚  Global Hotkey  β”‚   β”‚
β”‚  β”‚    (React)    β”‚  β”‚   (Python)    β”‚  β”‚   (Listener)    β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β”‚                  β”‚                   β”‚
           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚ HTTP API
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      SPEAKEASY BACKEND                       β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                  β”‚
β”‚  β”‚  FastAPI Server  β”‚  β”‚  Audio Processor β”‚                  β”‚
β”‚  β”‚  (Python)        β”‚  β”‚  (FFmpeg/Buffer) β”‚                  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                  β”‚
β”‚           β”‚                     β”‚                            β”‚
β”‚           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                            β”‚
β”‚                       β”‚ Load & Run                           β”‚
β”‚           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                          β”‚
β”‚           β”‚    AI Model Engine    β”‚                          β”‚
β”‚           β”‚  (CTranslate2/ONNX)   β”‚                          β”‚
β”‚           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                          β”‚
β”‚                       β”‚                                      β”‚
β”‚           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                          β”‚
β”‚           β”‚ β”Œβ”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β” β”‚                          β”‚
β”‚           β”‚ β”‚Whis-β”‚ β”‚NeMoβ”‚ β”‚Vox-β”‚ β”‚                          β”‚
β”‚           β”‚ β”‚per  β”‚ β”‚    β”‚ β”‚tralβ”‚ β”‚                          β”‚
β”‚           β”‚ β””β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”˜ β”‚                          β”‚
β”‚           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
                               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         DATA STORAGE                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                  β”‚
β”‚  β”‚  SQLite DB       β”‚  β”‚  Model Cache     β”‚                  β”‚
β”‚  β”‚ (History/Config) β”‚  β”‚ (~2-10GB each)   β”‚                  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Tech Stack:

  • Frontend: Electron + React + Tailwind CSS + TypeScript
  • Backend: Python + FastAPI + WebSocket
  • AI Engine: PyTorch, CTranslate2, ONNX Runtime
  • Audio: FFmpeg, PyAudio, SoundDevice
  • Database: SQLite with full-text search

πŸ€– Supported Models

| Model | Size | Speed | Accuracy | Best For | Hardware |
| --- | --- | --- | --- | --- | --- |
| Whisper Tiny | 39MB | ⚑⚑⚑⚑⚑ | ⭐⭐⭐ | Quick tests, low-resource | CPU |
| Whisper Base | 74MB | ⚑⚑⚑⚑ | ⭐⭐⭐⭐ | Balanced speed/accuracy | CPU |
| Whisper Small | 244MB | ⚑⚑⚑ | ⭐⭐⭐⭐ | Good general use | CPU/GPU |
| Whisper Medium | 769MB | ⚑⚑ | ⭐⭐⭐⭐⭐ | High accuracy | GPU recommended |
| Whisper Large-v3 | 1.5GB | ⚑ | ⭐⭐⭐⭐⭐ | Best accuracy | GPU required |
| NeMo FastConformer | 110MB | ⚑⚑⚑⚑⚑ | ⭐⭐⭐⭐ | Real-time streaming | GPU recommended |
| Voxtral Mini | 3B | ⚑ | ⭐⭐⭐⭐⭐ | Complex dictation | GPU required |
| Voxtral Large | 7B | ⚑ | ⭐⭐⭐⭐⭐ | Maximum accuracy | High-end GPU |

πŸ†š Alternatives Comparison

| Feature | SpeakEasy | Otter.ai | Whisper API | Dragon | Apple Dictation |
| --- | --- | --- | --- | --- | --- |
| Privacy | βœ… 100% offline | ❌ Cloud only | ❌ Cloud only | ❌ Cloud required | ⚠️ Cloud optional |
| Cost | πŸ†“ Free | πŸ’° $10-20/mo | πŸ’° $0.006/min | πŸ’° $500+ | πŸ†“ Free |
| Open Source | βœ… Yes | ❌ No | βœ… Yes (API only) | ❌ No | ❌ No |
| Offline | βœ… Yes | ❌ No | ❌ No | ⚠️ Limited | ⚠️ Limited |
| Cross-Platform | βœ… Win/Mac/Linux | βœ… Yes | N/A | ❌ Windows only | ❌ Apple only |
| Custom Models | βœ… Yes | ❌ No | ❌ No | ❌ No | ❌ No |
| Latency | 🟒 <100ms | 🟑 ~1s | 🟑 ~500ms | 🟒 <200ms | 🟑 ~300ms |

πŸ—ΊοΈ Roadmap

Current (v0.1.0)

  • Local Whisper transcription
  • Electron GUI
  • Global hotkeys
  • CLI interface
  • Multi-model support (Whisper, NeMo, Voxtral)
  • Audio file processing
  • Batch transcription
  • History import/export
  • Model download progress tracking
  • Real-time WebSocket updates
  • Advanced export formats (SRT, VTT, CSV, DOCX)

Near-term (v0.2.0)

  • VS Code extension
  • Custom wake words
  • Voice commands (beyond transcription)
  • Plugin system
  • Docker deployment

Future (v1.0.0)

  • Mobile companion app
  • Web interface
  • Enterprise features (SSO, audit logs)
  • Real-time collaboration

See GitHub Issues for detailed backlog.

🀝 Contributing

We welcome contributions! See CONTRIBUTING.md for:

  • πŸ› Reporting bugs and requesting features
  • πŸ› οΈ Setting up your development environment
  • πŸ“ Code style and submission process
  • πŸ‘€ Review and approval workflow

πŸ“„ License

MIT License - see LICENSE for details.

πŸ™ Acknowledgments


⭐ Star this repo if you find it useful!

πŸ› Report Bug β€’ πŸ’‘ Request Feature β€’ πŸ’¬ Discussions

Made with ❀️ for privacy-conscious developers everywhere