Local speech-to-text with LLM correction for Linux. Speak naturally, get polished text typed into any application.
- Voice Activity Detection - Automatically detects when you start/stop speaking
- Speech-to-Text - Local Whisper model for accurate transcription
- LLM Text Correction - Fixes homophones, adds punctuation, removes filler words
- Floating Overlay - Transparent window with real-time waveform visualization
- Universal Input - Types into any application via xdotool
- 100% Local - No cloud services, all processing on-device
- GPU Accelerated - Optional CUDA support for faster transcription
- Linux with X11 (Wayland not yet supported)
- 4GB RAM minimum, 8GB recommended
- ~2GB disk for models
```bash
# Check your session type
echo $XDG_SESSION_TYPE  # Should output: x11
```

```bash
# Ubuntu/Debian
sudo apt install clang libclang-dev libasound2-dev xdotool libayatana-appindicator3-dev nodejs npm

# Arch Linux
sudo pacman -S clang alsa-lib xdotool libayatana-appindicator nodejs npm

# Fedora
sudo dnf install clang clang-devel alsa-lib-devel xdotool libayatana-appindicator-gtk3-devel nodejs npm
```

```bash
git clone https://github.com/andynu/yammer.git
cd yammer
cargo build --release
```

The first build takes 10-15 minutes (it compiles whisper.cpp and llama.cpp).

```bash
cargo run --release --bin yammer download-models
```

This downloads the Whisper model (~141 MB) and TinyLlama (~1.6 GB) to `~/.cache/yammer/models/`.
```bash
# GUI mode
cd yammer-app && npm install && npm run tauri dev

# CLI mode (for testing)
cargo run --release --bin yammer dictate
```

The floating overlay shows:
- Microphone status and state indicator
- Real-time waveform visualization
- Transcribed and corrected text
Press the global hotkey (configurable) to start/stop dictation. Text is automatically typed into the focused application.
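The typing step is a thin wrapper around `xdotool type`. A minimal sketch of that call from Rust, assuming only that xdotool is on `$PATH` (the function name and delay value are illustrative, not yammer's actual API):

```rust
use std::io;
use std::process::Command;

/// Send `text` to whatever window currently has focus.
/// Illustrative sketch only; yammer's real output path lives in yammer-output.
fn type_into_focused_window(text: &str) -> io::Result<()> {
    let status = Command::new("xdotool")
        // --clearmodifiers releases any held modifier keys (e.g. the hotkey)
        // before typing; --delay is the per-keystroke pause in milliseconds.
        .args(["type", "--clearmodifiers", "--delay", "12", text])
        .status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(io::ErrorKind::Other, "xdotool failed"))
    }
}
```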
```bash
# Live dictation
yammer dictate

# List audio devices
yammer list-devices

# Record audio
yammer record --duration 5 --output test.wav

# Transcribe a file
yammer transcribe test.wav

# Test voice activity detection
yammer vad-test --duration 30

# Correct text with LLM
yammer correct "your transcribed text here"
```

```
yammer/
├── yammer-core/    # Shared types, config, model management
├── yammer-audio/   # Audio capture, VAD, resampling
├── yammer-stt/     # Whisper speech-to-text
├── yammer-llm/     # LLM text correction
├── yammer-output/  # Text output via xdotool
├── yammer-cli/     # CLI interface
└── yammer-app/     # Tauri desktop app
```
```
Microphone → Audio Capture → VAD → Resampler (16kHz)
    ↓
Whisper → LLM Correction → xdotool → Active Window
    ↓
Tauri UI (waveform, status, text)
```
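Conceptually, each detected utterance flows through these stages in order. A rough sketch of that hand-off (every name here is a placeholder, not yammer's real types):

```rust
/// Placeholder handles for the two models; yammer's real types differ.
struct Stt;
struct Llm;

impl Stt {
    /// Whisper expects 16 kHz mono f32 samples, hence the resampler stage.
    fn transcribe(&self, _samples_16khz: &[f32]) -> String {
        "there going to be their tomorrow um i think".to_string()
    }
}

impl Llm {
    /// The correction pass fixes homophones, punctuation, and filler words.
    fn correct(&self, raw: &str) -> String {
        raw.replace(" um", "") // stand-in for the real LLM call
    }
}

fn handle_utterance(stt: &Stt, llm: &Llm, samples_16khz: &[f32]) {
    let raw = stt.transcribe(samples_16khz);    // Whisper
    let text = llm.correct(&raw);               // LLM cleanup
    println!("would type via xdotool: {text}"); // → active window
}
```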
VAD sensitivity can be adjusted with the `--threshold` flag (a sketch of the gating logic follows the performance table below):

```bash
# More sensitive (quiet speech)
yammer dictate --threshold 0.005

# Less sensitive (loud/clear speech only)
yammer dictate --threshold 0.02
```

| Component | CPU | GPU (CUDA) |
|---|---|---|
| Whisper (5s audio) | 500ms-1.5s | 250-750ms |
| LLM correction | 100-500ms | - |
| VAD latency | ~200ms start, ~400ms end | - |
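The VAD start/end latency in the table falls out of how energy gating works: the detector waits for sustained signal above the threshold before opening, and sustained silence before closing. A minimal sketch of that logic, assuming a simple RMS gate over 20 ms frames (yammer's actual detector may differ):

```rust
/// Minimal RMS energy gate. Illustrative only; it shows where start/end
/// latency comes from: the gate needs `attack` consecutive loud frames to
/// open (e.g. 10 × 20 ms ≈ 200 ms) and `hangover` quiet frames to close
/// (e.g. 20 × 20 ms ≈ 400 ms).
struct EnergyVad {
    threshold: f32,  // RMS level, the value set by --threshold
    attack: usize,   // frames of speech required to open the gate
    hangover: usize, // frames of silence required to close it
    above: usize,
    below: usize,
    speaking: bool,
}

impl EnergyVad {
    /// Feed one frame of samples; returns whether we are inside speech.
    fn push_frame(&mut self, frame: &[f32]) -> bool {
        let rms = (frame.iter().map(|s| s * s).sum::<f32>()
            / frame.len() as f32)
            .sqrt();
        if rms >= self.threshold {
            self.above += 1;
            self.below = 0;
        } else {
            self.below += 1;
            self.above = 0;
        }
        if !self.speaking && self.above >= self.attack {
            self.speaking = true; // speech start, ~attack frames late
        } else if self.speaking && self.below >= self.hangover {
            self.speaking = false; // speech end, ~hangover frames late
        }
        self.speaking
    }
}
```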
**Build fails with "stdbool.h not found"**

```bash
sudo apt install clang libclang-dev
```

**"No audio devices found"**

```bash
sudo apt install libasound2-dev
cargo clean && cargo build --release
```

**Window not transparent**

Requires X11 with a compositor (standard on GNOME/KDE).

**Transcription slow**

Use `--release` builds. Consider enabling CUDA if you have an NVIDIA GPU.
MIT - See LICENSE and THIRD_PARTY_NOTICES.md for details.
- whisper.cpp - Speech recognition
- llama.cpp - LLM inference
- Tauri - Desktop app framework
- cpal - Audio I/O
