FasterWhisper Real-Time Transcription

Real-time desktop audio transcription using OpenAI Whisper for Arch Linux, with CUDA GPU acceleration.

Features

  • Real-time desktop audio capture from any application
  • High-accuracy Spanish transcription using Whisper large-v3
  • CUDA GPU acceleration for fast processing
  • Automatic clipboard integration
  • Word-by-word progressive display
  • WebSocket server-client architecture

Quick Start

  1. Clone the repository:

     git clone https://github.com/CGAlei/FasterWhisper.git
     cd FasterWhisper

  2. Run the setup script:

     chmod +x setup_arch.sh
     ./setup_arch.sh

  3. Start transcription:

     trs

System Requirements

  • OS: Arch Linux
  • GPU: NVIDIA GPU with CUDA support (recommended)
  • Audio: PulseAudio or PipeWire
  • RAM: 8GB minimum (for large-v3 model)

Dependencies

The setup script will install:

  • CUDA toolkit and NVIDIA drivers
  • PipeWire/PulseAudio support
  • Python conda environment with all required packages
  • System utilities (clipboard, notifications)

Configuration

Copy .env.example to .env and configure:

cp .env.example .env
# Edit .env with your OpenRouter API key (optional for LLM refinement)

Main configuration in config.json:

  • Audio source selection
  • Whisper model settings
  • Language preferences
  • Processing intervals
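As a sketch, the settings above could be read with fallback defaults along these lines (the key names and defaults here are illustrative assumptions, not the project's actual config.json schema):

```python
import json
from pathlib import Path

# Illustrative defaults; the real config.json keys may differ.
DEFAULTS = {
    "audio_source": "auto",      # PulseAudio/PipeWire source name
    "model": "large-v3",         # Whisper model size
    "language": "es",            # transcription language
    "interval_seconds": 2.0,     # processing interval
}

def load_config(path="config.json"):
    """Merge config.json (if present) over built-in defaults."""
    cfg = dict(DEFAULTS)
    p = Path(path)
    if p.exists():
        cfg.update(json.loads(p.read_text()))
    return cfg

print(load_config())
```

Values in the file override the defaults key by key, so a partial config.json is enough.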

Usage

# Start transcription (auto-starts server)
trs

# Get help
trs --help

# Manual server start
conda activate trs
python whisper_server.py

# Manual client start (separate terminal)
python trs_client.py

Troubleshooting

Audio issues:

# List audio sources
pactl list sources

# Test audio capture
trs --test
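Desktop audio comes from a ".monitor" source rather than a microphone input. A small helper can pick one out of the `pactl list sources short` listing (column layout as printed by pactl; the device names below are illustrative, not from a real machine):

```python
# Sample `pactl list sources short` output (tab-separated, illustrative names).
SAMPLE = "\n".join([
    "47\talsa_input.pci-0000_00_1f.3.analog-stereo\tPipeWire\ts32le 2ch 48000Hz\tSUSPENDED",
    "48\talsa_output.pci-0000_00_1f.3.analog-stereo.monitor\tPipeWire\ts32le 2ch 48000Hz\tIDLE",
])

def find_monitor_source(listing):
    """Return the first ".monitor" source name (desktop audio), if any."""
    for line in listing.splitlines():
        fields = line.split("\t")
        if len(fields) > 1 and fields[1].endswith(".monitor"):
            return fields[1]
    return None

print(find_monitor_source(SAMPLE))
```

If no monitor source appears in the listing, desktop capture will not work regardless of the transcription setup.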

CUDA issues:

# Check NVIDIA driver
nvidia-smi

# Test CUDA availability
python -c "import torch; print(torch.cuda.is_available())"

Environment issues:

# Recreate conda environment
conda env remove -n trs
conda env create -f environment.yml

Architecture

  • whisper_server.py - Main transcription server with WebSocket
  • trs_client.py - Display client with clipboard integration
  • audio_capture.py - PulseAudio desktop audio capture
  • utils.py - Configuration and utilities
  • trs - Global command wrapper script
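The server-client split above can be sketched with plain asyncio streams. This is a simplified stand-in for the WebSocket protocol: the port, message format, and word stream are illustrative, not the wire format whisper_server.py actually uses.

```python
import asyncio

# Hypothetical word stream; the real server emits Whisper output.
WORDS = ["hola", "mundo"]

async def handle_client(reader, writer):
    # Server side: push transcribed words one per line, mimicking
    # the word-by-word progressive display.
    for word in WORDS:
        writer.write((word + "\n").encode())
        await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    # Bind to an ephemeral port to keep the sketch self-contained.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        received = []
        while line := (await reader.readline()).decode().strip():
            received.append(line)  # client shows each word as it arrives
        writer.close()
        await writer.wait_closed()
    return received

print(asyncio.run(main()))
```

The real client additionally copies the accumulated text to the clipboard; the streaming loop is the part this sketch illustrates.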

License

MIT License - See LICENSE file for details.
