Real-time desktop audio transcription using OpenAI Whisper for Arch Linux.
- Real-time desktop audio capture from any application
- High-accuracy Spanish transcription using Whisper large-v3
- CUDA GPU acceleration for fast processing
- Automatic clipboard integration
- Word-by-word progressive display
- WebSocket server-client architecture (see the sketch below)
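
The server-client split can be pictured with a minimal sketch, assuming the `websockets` package; the port (8765) and the `{"type": "word", ...}` message shape are illustrative assumptions, not the project's documented protocol:

```python
# Minimal sketch of the server-to-client flow (assumes the `websockets`
# package; port 8765 and the message shape are hypothetical).
import asyncio
import json

import websockets

async def handler(websocket):
    # Server side: push each transcribed word as soon as it is produced.
    for word in ["hola", "mundo"]:  # stand-in for real Whisper output
        await websocket.send(json.dumps({"type": "word", "text": word}))
        await asyncio.sleep(0.1)

async def serve_forever():
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()  # run until cancelled

async def client():
    # Client side: print words progressively as they arrive.
    async with websockets.connect("ws://localhost:8765") as ws:
        async for message in ws:
            print(json.loads(message)["text"], end=" ", flush=True)
```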
- Clone the repository:

```bash
git clone https://github.com/yourusername/FasterWhisper.git
cd FasterWhisper
```

- Run the setup script:

```bash
chmod +x setup_arch.sh
./setup_arch.sh
```

- Start transcription:

```bash
trs
```

Requirements:

- OS: Arch Linux
- GPU: NVIDIA GPU with CUDA support (recommended)
- Audio: PulseAudio or PipeWire
- RAM: 8GB minimum (for large-v3 model)
The setup script will install:
- CUDA toolkit and NVIDIA drivers
- PipeWire/PulseAudio support
- Python conda environment with all required packages
- System utilities (clipboard, notifications)
Copy .env.example to .env and configure it:

```bash
cp .env.example .env
# Edit .env with your OpenRouter API key (optional, for LLM refinement)
```

The main configuration lives in config.json (a loading sketch follows this list):
- Audio source selection
- Whisper model settings
- Language preferences
- Processing intervals
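
A minimal sketch of how these settings might be read; the key names and defaults here are assumptions for illustration, not the project's documented schema:

```python
# Hypothetical sketch of reading config.json; key names and defaults
# are assumptions, not the project's actual schema.
import json
from pathlib import Path

DEFAULTS = {
    "audio_source": "auto",    # PulseAudio/PipeWire monitor source
    "model": "large-v3",       # Whisper model size
    "language": "es",          # transcription language
    "interval_seconds": 2.0,   # how often buffered audio is processed
}

def load_config(path: str = "config.json") -> dict:
    """Merge user settings over the defaults."""
    config = dict(DEFAULTS)
    file = Path(path)
    if file.exists():
        config.update(json.loads(file.read_text()))
    return config
```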
Usage:

```bash
# Start transcription (auto-starts the server)
trs

# Get help
trs --help

# Manual server start
conda activate trs
python whisper_server.py

# Manual client start (separate terminal)
python trs_client.py
```

Troubleshooting:

Audio issues:
```bash
# List audio sources
pactl list sources

# Test audio capture
trs --test
```
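
If capture misbehaves, it helps to know what a capture step boils down to. Below is a hedged sketch of grabbing desktop audio through PulseAudio's `parec`; how audio_capture.py actually does this may differ, and the monitor-source name must come from `pactl list sources`:

```python
# Hedged sketch: read raw 16 kHz mono PCM from a PulseAudio monitor
# source via `parec`; not necessarily how audio_capture.py works.
import subprocess

import numpy as np

def capture_chunk(source: str, seconds: float = 2.0, rate: int = 16000):
    """Record `seconds` of desktop audio from a `.monitor` source
    (pick one from `pactl list sources`) as float32 samples."""
    n_bytes = int(seconds * rate) * 2  # s16le = 2 bytes per sample
    proc = subprocess.Popen(
        ["parec", "-d", source, "--raw",
         "--format=s16le", f"--rate={rate}", "--channels=1"],
        stdout=subprocess.PIPE,
    )
    raw = proc.stdout.read(n_bytes)
    proc.terminate()
    proc.wait()
    # Whisper expects float32 audio normalized to [-1, 1].
    return np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0
```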
CUDA issues:

```bash
# Check NVIDIA driver
nvidia-smi

# Test CUDA availability
python -c "import torch; print(torch.cuda.is_available())"
```
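
When CUDA is unavailable, a CPU fallback keeps transcription working, just slower. Here is a sketch of that pattern using the faster-whisper package; whether whisper_server.py falls back exactly this way is an assumption:

```python
# Hedged sketch of GPU-with-CPU-fallback model loading via
# faster-whisper; the server's actual fallback behavior may differ.
import torch
from faster_whisper import WhisperModel

def load_model(size: str = "large-v3") -> WhisperModel:
    if torch.cuda.is_available():
        # float16 halves GPU memory use with little accuracy loss.
        return WhisperModel(size, device="cuda", compute_type="float16")
    # int8 keeps CPU inference within the 8GB RAM minimum.
    return WhisperModel(size, device="cpu", compute_type="int8")
```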
Environment issues:

```bash
# Recreate the conda environment
conda env remove -n trs
conda env create -f environment.yml
```

Project structure:

- whisper_server.py - Main transcription server with WebSocket support
- trs_client.py - Display client with clipboard integration (see the sketch below)
- audio_capture.py - PulseAudio desktop audio capture
- utils.py - Configuration and utilities
- trs - Global command wrapper script
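
The clipboard step can be as small as shelling out to the standard clipboard tools; a hedged sketch follows (the actual mechanism in trs_client.py is an assumption):

```python
# Hedged sketch of clipboard integration: prefer wl-copy on Wayland,
# fall back to xclip on X11. trs_client.py may do this differently.
import shutil
import subprocess

def copy_to_clipboard(text: str) -> None:
    if shutil.which("wl-copy"):
        subprocess.run(["wl-copy"], input=text.encode(), check=True)
    elif shutil.which("xclip"):
        subprocess.run(["xclip", "-selection", "clipboard"],
                       input=text.encode(), check=True)
    else:
        raise RuntimeError("No clipboard tool found (wl-copy or xclip)")
```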
MIT License - See LICENSE file for details.