Voice-to-text dictation for Linux/Wayland
Press hotkey → speak → release → text appears in your active window. Fast, accurate, 100% local.
Talk to your computer. Your words appear wherever your cursor is: terminal, browser, editor, anywhere.
Transcription runs locally on your machine. No internet required, no cloud APIs, and your voice never leaves your computer. Custom dictionary corrections are optional: apply them locally with a simple algorithm, or use Groq for fast LLM-based correction of the dictated text.
Status: Work in progress, but daily-driven. Core functionality is stable.
Push-to-Talk (Standard)
- Press and hold hotkey → speak → release
- Text appears in active window
- Perfect for quick dictation bursts
Long-Running Dictation (Chat Mode)
- Press hotkey once to start continuous dictation
- Tap Enter to submit current text → pastes → presses Enter → immediately starts next dictation
- Ideal for long conversations (Discord, Slack, messaging)
- Keep dictating without re-pressing hotkey
Quick-Submit
- Press hotkey once to start
- Tap Windows/Super key to submit → pastes → presses Enter → ends dictation
- Fast workflow for single messages
Cancel Anytime
- Press Escape to cancel current dictation
Local Mode (Default)
- 100% offline transcription
- Fast, private, no API calls
- Good accuracy out of the box
LLM Correction Mode
- Transcription + grammar/formatting cleanup via Groq API
- Fixes punctuation, capitalization, common errors
- Requires API key (see config)
Tool Calling Mode (experimental)
- Voice commands that trigger actions
- "Send this email", "Create reminder", etc.
- Requires tool definitions (see docs)
1. Install dependencies:
# Arch Linux
sudo pacman -S ffmpeg wl-clipboard ydotool
# Ubuntu/Debian
sudo apt install ffmpeg wl-clipboard ydotool
# Enable ydotool (required for auto-paste)
sudo systemctl enable --now ydotool
2. Run installer:
git clone https://github.com/sebkouba/omarchy-stt
cd omarchy-stt
./install.sh
The installer will:
- Build the app (~3 min)
- Download AI model (~400MB)
- Help you pick your microphone
- Show you what hotkey to set
3. Start it:
# Start background services
systemctl --user start recording-daemon transcribe-daemon hotkey-daemon
# Test: Press your hotkey, say "Hello world", release
That's it. You're dictating.
┌─────────────────────────────────────────────────────────────┐
│  Press Hotkey    →    Speak         →    Release Hotkey     │
│       ↓                 ↓                      ↓            │
│ Start Recording   Recording...         Stop & Transcribe    │
│                                                ↓            │
│                               Text appears in active window │
└─────────────────────────────────────────────────────────────┘
Three background daemons:
- recording-daemon - Always recording to RAM buffer (zero-latency start)
- transcribe-daemon - AI model loaded and ready (fast transcription)
- hotkey-daemon - Listens for your hotkey via XDG Desktop Portal
Speed:
- Recording start: 0ms (already buffering in RAM)
- Recording stop: ~5ms (extract audio from buffer)
- Transcription: ~150ms for 3 seconds of speech (on modern CPU)
- ⚡ Fast - Zero-latency recording, near-instant transcription
- 🔒 Private - 100% local processing, no cloud
- 🎯 Accurate - Powered by Parakeet (NVIDIA NeMo)
- 📋 Auto-paste - Pastes text directly into the active window
- 🖥️ Smart - Detects terminals, uses Ctrl+Shift+V vs Ctrl+V
- 🤖 LLM cleanup - Grammar/formatting via Groq API
- 🔧 Custom corrections - Fix common mistakes ("C plus plus" → "C++")
- 🛠️ Tool calling - Voice commands that run scripts
See FEATURES.md for details.
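As an illustration of the custom corrections feature, here is a minimal sketch of a purely local correction pass built on sed. It is not the project's actual implementation: the function name `apply_corrections` and the second rule are hypothetical, and only the "C plus plus" → "C++" mapping comes from this README.

```shell
#!/usr/bin/env bash
# Hypothetical local correction pass: reads text on stdin, writes fixed text to stdout.
# Each rule only matches at the start of the text or after a non-alphanumeric
# character, so "abc plus plus" is left alone.
apply_corrections() {
    sed -E \
        -e 's/(^|[^[:alnum:]])[Cc] plus plus/\1C++/g' \
        -e 's/(^|[^[:alnum:]])[Hh]yper land/\1Hyprland/g'
}

# Usage:
#   echo "I write C plus plus on hyper land" | apply_corrections
```

More phrases can be added as extra `-e` rules. A fixed rule list like this is fast and fully offline, but it cannot fix errors it has no rule for, which is the gap the LLM cleanup mode is meant to cover.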
Operating System:
- Linux with Wayland (tested on Arch + Hyprland)
- Other compositors should work but are untested
Hardware:
- ~2GB RAM for AI model
- ~500MB disk space
- Any microphone
System Packages:
- ffmpeg - Audio capture
- wl-clipboard (wl-copy) - Clipboard management
- ydotool - Keyboard simulation (needs root: sudo systemctl enable --now ydotool)
Install with:
# Arch
sudo pacman -S ffmpeg wl-clipboard ydotool
# Debian/Ubuntu
sudo apt install ffmpeg wl-clipboard ydotool
Clipboard History Protection:
This app uses the --sensitive flag when copying dictated text to the clipboard. If you're running a clipboard manager like cliphist (v2.0+), dictations are automatically excluded from your clipboard history.
Why this matters:
- Dictations might contain passwords, personal info, or sensitive content
- The sensitive flag prevents this content from being stored in clipboard history
- Your dictated text appears in your target app but doesn't pollute your clipboard manager
Compatibility:
- Works with cliphist v2.0+
- Other clipboard managers may respect this flag (check their docs)
- If your clipboard manager doesn't support --sensitive, dictations will appear in history (but still work normally)
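The clipboard step described above can be sketched as follows. This is an assumption about how the copy might be issued, not code taken from the project: `copy_dictation` is a hypothetical helper, and `WL_COPY_BIN` is a hypothetical override that exists only so the function can be exercised without a Wayland session. The `--sensitive` flag itself is provided by wl-copy in recent wl-clipboard releases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy dictated text to the Wayland clipboard and mark it
# as sensitive so that cooperating clipboard managers skip it.
# WL_COPY_BIN (hypothetical) lets you substitute another command for testing.
copy_dictation() {
    # $1: the dictated text
    printf '%s' "$1" | "${WL_COPY_BIN:-wl-copy}" --sensitive
}

# Usage:
#   copy_dictation "my dictated text"
```

To check whether your clipboard manager honors the flag, copy a test string this way and confirm that it does not show up in the manager's history.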
Config lives at ~/.config/transcribe-rs/config.toml (created by installer).
Change microphone:
# List available mics
pactl list sources short
# Edit config or systemd service
nano ~/.config/systemd/user/recording-daemon.service
# Set: Environment="RECORDING_MICROPHONE=your-device-name"
systemctl --user restart recording-daemon
Change hotkey: The hotkey-daemon uses XDG Desktop Portal. Set your compositor's global shortcut to trigger it.
Advanced config: See docs/ folder.
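For the microphone change described above, a systemd drop-in is one way to set RECORDING_MICROPHONE without editing the unit file itself. The sketch below is an assumption-laden example: the helper names `source_names` and `mic_override` are hypothetical, and it assumes the daemon reads the variable from its environment as the service file comment suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the device names (second column) from
# `pactl list sources short` output supplied on stdin.
source_names() {
    awk '{ print $2 }'
}

# Hypothetical helper: print a drop-in that sets RECORDING_MICROPHONE to $1.
mic_override() {
    printf '[Service]\nEnvironment="RECORDING_MICROPHONE=%s"\n' "$1"
}

# Usage (writes the drop-in, then reloads and restarts the daemon):
#   pactl list sources short | source_names
#   dir=~/.config/systemd/user/recording-daemon.service.d
#   mkdir -p "$dir" && mic_override "your-device-name" > "$dir/override.conf"
#   systemctl --user daemon-reload
#   systemctl --user restart recording-daemon
```

A drop-in stays separate from the unit file, so it is less likely to be lost if the installer rewrites the service.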
Nothing happens when I press hotkey:
# Check daemons are running
systemctl --user status recording-daemon transcribe-daemon hotkey-daemon
# Check logs
journalctl --user -u hotkey-daemon -f
"Failed to connect to daemon":
# Restart daemons
systemctl --user restart recording-daemon transcribe-daemon hotkey-daemon
Paste not working:
# Check ydotool is running with root
sudo systemctl status ydotool
# Start if needed
sudo systemctl enable --now ydotool
Other issues:
Run ./install.sh again - it checks all dependencies.
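The daemon checks above can be rolled into one small helper. This is a sketch rather than a script shipped with the project: `check_daemons` is a hypothetical name, and `SYSTEMCTL` is a hypothetical override used only to make the function testable. The unit names are the three listed in this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the three user services are not active.
# Returns 0 if all are active, 1 otherwise.
# SYSTEMCTL (hypothetical) can point at a stand-in command for testing.
check_daemons() {
    local failed=0 unit
    for unit in recording-daemon transcribe-daemon hotkey-daemon; do
        if ! "${SYSTEMCTL:-systemctl}" --user is-active --quiet "$unit"; then
            echo "not active: $unit"
            failed=1
        fi
    done
    return "$failed"
}

# Usage:
#   check_daemons || journalctl --user -u hotkey-daemon -n 50
```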
Transcription Speed (Parakeet int8 quantized):
- M4 Max: 30x real-time
- Ryzen 5700X: 20x real-time
- Intel i5-6500: 5x real-time
(A 3-second recording transcribes in ~150ms on modern hardware)
Latency Breakdown:
| Stage | Time |
|---|---|
| Hotkey press → recording starts | 0ms (already buffering) |
| Recording stops → audio extracted | ~5ms |
| Audio extraction → transcription | ~150ms |
| Transcription → paste | ~50ms |
| Total | ~200ms |
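The table's total can be reproduced from the real-time factors listed earlier. The sketch below assumes, as a simplification this README does not state, that the extraction (~5ms) and paste (~50ms) stages are the same on all hardware and that only transcription scales with the real-time factor. `estimate_latency_ms` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical estimator: end-to-end latency in milliseconds.
#   $1: audio length in ms
#   $2: real-time factor (for example 20 for "20x real-time")
estimate_latency_ms() {
    local audio_ms=$1 rtf=$2
    local start=0 extract=5 paste=50   # fixed stage costs from the table
    echo $(( start + extract + audio_ms / rtf + paste ))
}

# Usage:
#   estimate_latency_ms 3000 20   # 3 s of speech at 20x
```

For 3 seconds of speech at 20x, this gives 0 + 5 + 150 + 50 = 205ms, which matches the ~200ms total. At 5x the same recording would take about 655ms under these assumptions.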
Q: Does this work on X11?
A: Not currently - uses Wayland-specific tools (wl-clipboard). PRs welcome!
Q: Does it work on Sway/other compositors?
A: Should work! Main functionality is compositor-agnostic. Terminal detection uses Hyprland's hyprctl.
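To make the hyprctl dependency concrete, here is a rough sketch of how terminal detection could choose between Ctrl+Shift+V and Ctrl+V. It is not the project's code. The function names and the list of terminal window classes are assumptions, jq is an extra dependency not listed in this README, and the `code:state` key syntax applies to ydotool 1.x (older packaged versions use a different syntax).

```shell
#!/usr/bin/env bash
# Key codes are Linux input event codes: 29 = Left Ctrl, 42 = Left Shift, 47 = V.
# The terminal class list is an assumption; extend it for your setup.
paste_keys_for_class() {
    case "$1" in
        kitty|Alacritty|foot|com.mitchellh.ghostty|org.wezfurlong.wezterm)
            echo "29:1 42:1 47:1 47:0 42:0 29:0" ;;  # Ctrl+Shift+V
        *)
            echo "29:1 47:1 47:0 29:0" ;;            # Ctrl+V
    esac
}

# Hypothetical wrapper: look up the focused window's class and send the keys.
paste_into_active_window() {
    local class
    class=$(hyprctl activewindow -j | jq -r '.class')
    # Word splitting is intended: each code:state pair is a separate argument.
    # shellcheck disable=SC2046
    ydotool key $(paste_keys_for_class "$class")
}
```

On a compositor without hyprctl, a port would need to replace only the class lookup. The key selection itself does not depend on Hyprland.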
Q: Why not use cloud APIs (OpenAI, Google, etc.)?
A: Privacy, cost, latency. Local is instant, free, and your voice stays on your machine.
Contributions welcome!
See CONTRIBUTING.md.
- Ilya Stupakov - Original transcribe-rs library
- NVIDIA - Parakeet model
- istupakov -
MIT License - See LICENSE