Local AI transcription that runs 100% offline. Code at the speed of thought.
Private. Open Source. No Cloud Required.
Install • Features • Usage • Architecture • Contribute
SpeakEasy is an open-source, privacy-focused voice-to-text and speech recognition application built for developers, writers, and privacy-conscious users. Unlike cloud-based transcription services like Otter.ai, Rev.ai, or Google Speech-to-Text, SpeakEasy runs entirely offline on your local machine using open-source AI models including OpenAI Whisper, NVIDIA NeMo, and Mistral Voxtral.
- Real-time transcription with near-zero latency
- 100% offline - no internet required, no data leaves your device
- GPU accelerated - CUDA support for NVIDIA graphics cards
- Cross-platform - Windows, macOS, and Linux support
- Vibe Coding - Stay in flow, dictate code naturally
- Developer-first - IDE integration, hotkeys, CLI support
| Best For | Why |
|---|---|
| Developers | Code faster with voice. Global hotkeys work in any IDE (VS Code, Cursor, JetBrains) |
| Privacy Advocates | Zero cloud calls. Your voice stays on your machine |
| Writers | Dictate articles, emails, notes without typing fatigue |
| Accessibility | Voice control for users with RSI, disabilities, or typing limitations |
| Security-Conscious | Air-gapped environments, no data exfiltration risk |
| Feature | Description |
|---|---|
| Global Hotkey | Press and hold to transcribe into any active window |
| Universal Compatibility | Works with any application (IDEs, editors, browsers, chat apps) |
| Smart Formatting | Automatic punctuation, capitalization, and code formatting |
| Multi-Model Support | Choose between Whisper, NeMo, or Voxtral based on your needs |
| Audio File Processing | Batch transcribe MP3, WAV, M4A, and more |
| Real-time Preview | See transcription as you speak |
- ✅ 100% Offline - Zero network calls for transcription
- ✅ Local Processing - All models run on your hardware
- ✅ No Signup - No account, email, or API keys required
- ✅ No Telemetry - No usage tracking or data collection
- ✅ Open Source - Full transparency, audit the code
- Batch Transcription: Process multiple audio files in a queue with real-time progress tracking
- Transcription History: Searchable SQLite database of all your transcriptions
- History Import/Export: Backup and restore your history with merge or replace options
- Export Formats: JSON, TXT, SRT, VTT, CSV, DOCX for different use cases
- Model Download Progress: Real-time download tracking with speed and ETA
- Model Caching: Download and cache models for faster startup times
- Custom Hotkeys: Configure global shortcuts to your preference
- System Tray: Quick access without cluttering your dock
- CLI Support: Command-line transcription for automation
- Plugin System: Custom post-processing scripts (WIP)
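Because the history lives in a local SQLite database with full-text search, it can also be queried directly with standard tooling. A minimal sketch using an in-memory stand-in database; the table and column names here are illustrative, not SpeakEasy's actual schema:

```python
import sqlite3

# In-memory stand-in for the history database.
# Table/column names are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, created_at)")
conn.executemany(
    "INSERT INTO transcripts (text, created_at) VALUES (?, ?)",
    [
        ("refactor the audio buffer class", "2024-05-01"),
        ("draft the release notes email", "2024-05-02"),
    ],
)

# Full-text search: every transcription mentioning "audio".
rows = conn.execute(
    "SELECT text FROM transcripts WHERE transcripts MATCH ?", ("audio",)
).fetchall()
print(rows)  # [('refactor the audio buffer class',)]
```

The same `MATCH` query works against a file-backed database, so history stays scriptable without going through the GUI.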
- Python 3.10 - 3.12 (Python 3.13+ not yet supported)
- Node.js 18+ (LTS recommended)
- FFmpeg (must be in system PATH)
- UV package manager (`pip install uv`)
- Windows: Visual C++ Build Tools
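A quick way to verify the prerequisites above before installing; this script is not part of the repo, just a local sanity check:

```python
import shutil
import sys

def check_prereqs():
    """Report which SpeakEasy prerequisites are satisfied on this machine."""
    return {
        # Python 3.10-3.12 per the requirements above (3.13+ unsupported)
        "python": (3, 10) <= sys.version_info[:2] <= (3, 12),
        # FFmpeg must be discoverable on the system PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # UV package manager
        "uv": shutil.which("uv") is not None,
    }

print(check_prereqs())
```

Anything reported `False` needs to be installed (and, for FFmpeg, added to PATH) before the setup steps below will work.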
Windows (Recommended):

```bash
git clone https://github.com/bitgineer/speakeasy.git
cd speakeasy
start.bat
```

macOS/Linux:

```bash
git clone https://github.com/bitgineer/speakeasy.git
cd speakeasy
./start.sh
```

Manual setup:

```bash
# Clone repository
git clone https://github.com/bitgineer/speakeasy.git
cd speakeasy

# Setup backend
cd backend
uv venv --python 3.12
source .venv/bin/activate  # Windows: .venv\Scripts\activate
uv pip install -e ".[cuda]"  # Without CUDA: uv pip install -e .

# Run tests
uv run pytest tests/ -v

# Setup frontend
cd ../gui
npm install
npm run dev
```

The easiest way to use SpeakEasy is through the Electron GUI:
```bash
# Quick start with default settings
npm run dev    # Development mode
npm run build  # Production build
npm run start  # Run built app
```

Features:
- Visual transcription history
- Model switching (Whisper/NeMo/Voxtral)
- Settings management
- Audio file import
Use SpeakEasy from the terminal for automation and scripting:
```bash
# Transcribe with default settings
python -m speakeasy transcribe

# Transcribe an audio file
python -m speakeasy transcribe --file recording.mp3 --output transcript.txt

# List available models
python -m speakeasy models

# Use a specific model
python -m speakeasy transcribe --model whisper-large-v3

# Batch process a directory
python -m speakeasy transcribe --batch ./audio_files/ --output ./transcripts/

# Get help
python -m speakeasy --help
python -m speakeasy transcribe --help
```

Set up a global hotkey to transcribe into any active window:
1. Start the backend:

   ```bash
   cd backend
   source .venv/bin/activate
   python -m speakeasy.server
   ```

2. Configure the hotkey in the GUI (default: `Ctrl+Shift+Space`)

3. Use it anywhere: hold the hotkey, speak, release, and the text appears in the focused window
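For reference, a hotkey configuration might look something like the following. This is a hypothetical example; the actual keys and file location come from SpeakEasy's settings UI, not this snippet:

```json
{
  "hotkey": "Ctrl+Shift+Space",
  "model": "whisper-base",
  "language": "en"
}
```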
Stream transcription in real-time:
```bash
# Real-time transcription to stdout
python -m speakeasy live

# Real-time with a specific model
python -m speakeasy live --model nemo --language en

# Save to a file while transcribing
python -m speakeasy live --output live_transcript.txt
```

```
┌───────────────────────────────────────────────────────────────┐
│                        USER INTERFACE                         │
├───────────────────────────────────────────────────────────────┤
│  ┌───────────────┐   ┌───────────────┐   ┌─────────────────┐  │
│  │ Electron GUI  │   │   CLI Tool    │   │  Global Hotkey  │  │
│  │    (React)    │   │   (Python)    │   │   (Listener)    │  │
│  └───────┬───────┘   └───────┬───────┘   └────────┬────────┘  │
└──────────┼───────────────────┼────────────────────┼───────────┘
           │                   │                    │
           └───────────────────┼────────────────────┘
                               │ HTTP API
                               ▼
┌───────────────────────────────────────────────────────────────┐
│                      SPEAKEASY BACKEND                        │
├───────────────────────────────────────────────────────────────┤
│   ┌──────────────────┐        ┌──────────────────┐            │
│   │  FastAPI Server  │        │ Audio Processor  │            │
│   │     (Python)     │        │ (FFmpeg/Buffer)  │            │
│   └────────┬─────────┘        └────────┬─────────┘            │
│            │                           │                      │
│            └────────────┬──────────────┘                      │
│                         │ Load & Run                          │
│            ┌────────────▼─────────────┐                       │
│            │     AI Model Engine      │                       │
│            │    (CTranslate2/ONNX)    │                       │
│            └────────────┬─────────────┘                       │
│                         │                                     │
│            ┌────────────▼─────────────┐                       │
│            │ Whisper │ NeMo │ Voxtral │                       │
│            └──────────────────────────┘                       │
└─────────────────────────┬─────────────────────────────────────┘
                          │
                          ▼
┌───────────────────────────────────────────────────────────────┐
│                         DATA STORAGE                          │
├───────────────────────────────────────────────────────────────┤
│   ┌──────────────────┐        ┌──────────────────┐            │
│   │    SQLite DB     │        │   Model Cache    │            │
│   │ (History/Config) │        │  (~2-10GB each)  │            │
│   └──────────────────┘        └──────────────────┘            │
└───────────────────────────────────────────────────────────────┘
```
Tech Stack:
- Frontend: Electron + React + Tailwind CSS + TypeScript
- Backend: Python + FastAPI + WebSocket
- AI Engine: PyTorch, CTranslate2, ONNX Runtime
- Audio: FFmpeg, PyAudio, SoundDevice
- Database: SQLite with full-text search
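To give a feel for the audio side of the stack, here is a minimal sketch of a fixed-size ring buffer of the kind a streaming transcriber typically keeps between microphone callbacks. This illustrates the general technique only; it is not SpeakEasy's actual Audio Processor:

```python
from collections import deque

class AudioRingBuffer:
    """Keep only the most recent window of audio samples."""

    def __init__(self, seconds: float = 5.0, sample_rate: int = 16000):
        # deque with maxlen silently drops the oldest samples on overflow
        self.samples = deque(maxlen=int(seconds * sample_rate))

    def push(self, chunk):
        # Called from the audio callback with each new block of samples.
        self.samples.extend(chunk)

    def snapshot(self):
        # Copy out the current window for the model without
        # mutating the buffer the callback is writing into.
        return list(self.samples)

# Tiny window (4 samples) so the drop-oldest behavior is visible:
buf = AudioRingBuffer(seconds=0.001, sample_rate=4000)
buf.push([1, 2, 3])
buf.push([4, 5])
print(buf.snapshot())  # [2, 3, 4, 5] - only the newest 4 samples survive
```

Bounding the buffer like this is what keeps a real-time pipeline's memory use constant no matter how long a dictation session runs.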
| Model | Size | Speed | Accuracy | Best For | Hardware |
|---|---|---|---|---|---|
| Whisper Tiny | 39MB | ⚡⚡⚡⚡⚡ | ⭐⭐⭐ | Quick tests, low-resource | CPU |
| Whisper Base | 74MB | ⚡⚡⚡⚡ | ⭐⭐⭐⭐ | Balanced speed/accuracy | CPU |
| Whisper Small | 244MB | ⚡⚡⚡ | ⭐⭐⭐⭐ | Good general use | CPU/GPU |
| Whisper Medium | 769MB | ⚡⚡ | ⭐⭐⭐⭐⭐ | High accuracy | GPU recommended |
| Whisper Large-v3 | 1.5GB | ⚡ | ⭐⭐⭐⭐⭐ | Best accuracy | GPU required |
| NeMo FastConformer | 110MB | ⚡⚡⚡⚡⚡ | ⭐⭐⭐⭐ | Real-time streaming | GPU recommended |
| Voxtral Mini | 3B params | ⚡ | ⭐⭐⭐⭐⭐ | Complex dictation | GPU required |
| Voxtral Large | 7B params | ⚡ | ⭐⭐⭐⭐⭐ | Maximum accuracy | High-end GPU |
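The trade-offs in the table above can be collapsed into a simple selection rule. A sketch; the thresholds are illustrative guesses, not official hardware recommendations:

```python
def pick_model(vram_gb: float, realtime: bool) -> str:
    """Suggest a model from the table above for the available hardware."""
    if realtime and vram_gb >= 4:
        return "nemo-fastconformer"   # fastest streaming option
    if vram_gb >= 8:
        return "whisper-large-v3"     # best accuracy, GPU required
    if vram_gb >= 4:
        return "whisper-medium"       # high accuracy on a mid-range GPU
    return "whisper-base"             # solid CPU-only default

print(pick_model(vram_gb=0, realtime=False))  # whisper-base
print(pick_model(vram_gb=8, realtime=False))  # whisper-large-v3
```

The same idea extends to the Voxtral models, which would slot in above `whisper-large-v3` for machines with substantially more VRAM.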
| Feature | SpeakEasy | Otter.ai | Whisper API | Dragon | Apple Dictation |
|---|---|---|---|---|---|
| Privacy | ✅ 100% offline | ❌ Cloud only | ❌ Cloud only | ❌ Cloud required | |
| Cost | Free | 💰 $10-20/mo | 💰 $0.006/min | 💰 $500+ | Free |
| Open Source | ✅ Yes | ❌ No | ⚠️ Yes (API only) | ❌ No | ❌ No |
| Offline | ✅ Yes | ❌ No | ❌ No | | |
| Cross-Platform | ✅ Win/Mac/Linux | ✅ Yes | N/A | ❌ Windows only | ❌ Apple only |
| Custom Models | ✅ Yes | ❌ No | ❌ No | ❌ No | ❌ No |
| Latency | 🟢 <100ms | 🟡 ~1s | 🟡 ~500ms | 🟢 <200ms | 🟡 ~300ms |
- Local Whisper transcription
- Electron GUI
- Global hotkeys
- CLI interface
- Multi-model support (Whisper, NeMo, Voxtral)
- Audio file processing
- Batch transcription
- History import/export
- Model download progress tracking
- Real-time WebSocket updates
- Advanced export formats (SRT, VTT, CSV, DOCX)
- VS Code extension
- Custom wake words
- Voice commands (beyond transcription)
- Plugin system
- Docker deployment
- Mobile companion app
- Web interface
- Enterprise features (SSO, audit logs)
- Real-time collaboration
See GitHub Issues for the detailed backlog.
We welcome contributions! See CONTRIBUTING.md for:
- Reporting bugs and requesting features
- Setting up your development environment
- Code style and submission process
- Review and approval workflow
MIT License - see LICENSE for details.
- OpenAI Whisper - Speech recognition model
- NVIDIA NeMo - Speech AI toolkit
- Mistral AI - Voxtral models
- Faster Whisper - Optimized inference
- CTranslate2 - Fast inference engine
⭐ Star this repo if you find it useful!

Report Bug • Request Feature • Discussions

Made with ❤️ for privacy-conscious developers everywhere