# ghisper

Real-time speech-to-text, combining a Go client with a SimulStreaming Python backend.

ghisper splits speech recognition into two parts:
- Go client: Audio capture, typing automation, system integration
- Python backend: SimulStreaming ASR engine with AlignAtt policy
## Features

- Real-time transcription streaming
- Multi-language support (100+ languages via Whisper)
- Progressive typing via uinputd-go
- Low latency (~200-500ms)
- Unix socket IPC for minimal overhead
- Interactive installation with GPU detection
## Architecture

```
Go Client (ghisper)
- Audio capture (malgo)
- Unix socket client
- Progressive typing (uinputd-go)
        |
        v  Unix socket
Python Backend (user systemd service)
- SimulStreaming ASR
- Whisper (tiny → large-v3-turbo) + AlignAtt
- HuggingFace model browser
```
## Installation

Requires Go 1.23+ and Python 3.10+.

```sh
# System packages (example for Arch Linux)
sudo pacman -S python git

# Build Go client
make build

# Install to ~/.local/bin (user-local, no sudo)
make install

# Install Python backend (interactive)
ghisper install backend
#   - Detects GPU (NVIDIA/AMD/none)
#   - Choose PyTorch variant (CPU/CUDA/ROCm)
#   - Select Whisper model (tiny → large-v3-turbo)
#   - Creates venv in ~/.local/share/ghisper/venv
#   - Generates config at ~/.config/ghisper/config.toml

# Install and start systemd service
ghisper install systemd-service
```

## Usage

```sh
# Check system status
ghisper status

# Start recording (press 'r' or Space to toggle)
ghisper record

# Stop all sessions
ghisper stop

# Run health checks
ghisper doctor
```

## Configuration

ghisper reads its configuration from `~/.config/ghisper/config.toml`:
```toml
[server]
type = "unix"
socket_path = "/tmp/ghisper.sock"

[model]
name = "base"
device = "auto"  # auto, cpu, cuda, rocm

[processing]
language = ""  # empty = auto-detect
task = "transcribe"

[client.audio]
device = "default"
chunk_size_ms = 100

[client.typing]
enabled = true
layout = "us"
progressive = true

[logging]
level = "info"
```

## Development

```sh
make build      # Build to bin/ghisper
make install    # Install to ~/.local/bin
make uninstall  # Remove binary and backend
make purge      # Full cleanup (config + models)
make check      # Format, vet, test
```

## Project layout

```
ghisper/
├── cmd/ghisper/          # CLI commands
├── internal/
│   ├── audio/            # Audio capture (malgo)
│   ├── client/           # Backend client
│   ├── typer/            # Typing (uinputd-go)
│   ├── config/           # Config management
│   ├── models/           # Model registry
│   ├── protocol/         # IPC protocol
│   ├── installer/        # Installation logic
│   └── doctor/           # Health checks
└── backend/              # Python backend
    ├── server.py         # SimulStreaming server
    ├── config.py         # Config parser
    └── convert_model.py  # HF → Whisper converter
```
## Dependencies

Go:

- github.com/gen2brain/malgo - Audio capture
- github.com/bnema/uinputd-go - Keyboard typing
- github.com/spf13/cobra - CLI framework
- github.com/charmbracelet/* - Terminal UI
- go-huggingface - Model downloads

Python:

- torch - Deep learning backend
- openai-whisper - ASR model
- SimulStreaming - Streaming ASR engine
## License

MIT
## Related projects

- SimulStreaming: https://github.com/ufal/SimulStreaming
- uinputd-go: https://github.com/bnema/uinputd-go
- voxd: https://github.com/jakov-nordic/voxd