A high-performance speech-to-text plugin for LiveKit agents. It runs OpenAI Whisper models through the faster-whisper implementation for accurate, efficient speech recognition.
- Faster-Whisper Implementation: Optimized inference using faster-whisper for improved performance
- High Accuracy: State-of-the-art speech recognition using OpenAI Whisper models
- Local Processing: On-device inference with no external API dependencies
- Multi-Language Support: Support for 90+ languages with configurable language detection
- Warmup Support: Optional model warmup for consistent performance
- LiveKit Integration: Seamless integration with LiveKit agents framework
- LiveKit Agents v1.2 or higher
- NVIDIA GPU (recommended for optimal performance)
- Python 3.8+
- faster-whisper library
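
The requirements above can be checked before installation with a short script. This is a minimal sketch using only the standard library; the helper names `has_module` and `check_environment` are hypothetical, and `faster_whisper` and `livekit.agents` are assumed to be the import names of the packages listed above.

```python
import importlib.util
import sys


def has_module(name: str) -> bool:
    """Return True if `name` can be located without importing the module itself."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. `livekit`) raises instead of returning None.
        return False


def check_environment() -> dict:
    """Report which of the listed requirements appear to be satisfied."""
    return {
        "python_3_8_plus": sys.version_info >= (3, 8),
        "faster_whisper": has_module("faster_whisper"),   # assumed import name
        "livekit_agents": has_module("livekit.agents"),   # assumed import name
    }


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{requirement}: {'ok' if ok else 'missing'}")
```

The script only reports whether the packages can be found; it does not verify the LiveKit Agents version or the presence of a GPU.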
| Model | Hardware | Latency | Use Case |
|---|---|---|---|
| Large-v3-Turbo | RTX 4090 | <180ms | Real-time applications |
- Clone or download this plugin into your LiveKit-based agents project root directory
- Install required dependencies:
pip install faster-whisper soundfile numpy
- Ensure you have adequate storage for model downloads (models are cached locally)
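
Because models are cached locally, it can help to confirm free disk space before the first download. The sketch below uses only the standard library; the `has_free_space` helper is hypothetical, the 4 GiB threshold is an illustrative assumption rather than a measured model size, and the cache is assumed to live somewhere under the home directory.

```python
import shutil
from pathlib import Path


def has_free_space(path: Path, required_gib: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gib` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gib * 1024**3


if __name__ == "__main__":
    cache_root = Path.home()  # assumption: the model cache lives under the home directory
    threshold_gib = 4.0       # assumption: illustrative headroom, adjust for your model
    if has_free_space(cache_root, threshold_gib):
        print("Enough free space for model downloads.")
    else:
        print(f"Less than {threshold_gib} GiB free under {cache_root}.")
```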
Initialize your agent session with the WhisperSTT plugin:
from livekit.agents import AgentSession  # AgentSession comes from the LiveKit Agents package

from whisper_plugin import WhisperSTT

session = AgentSession(
    # ... other configuration
    stt=WhisperSTT(
        model="deepdml/faster-whisper-large-v3-turbo-ct2",
        language="en",
        device="cuda",
        compute_type="float16",
    ),
)

The plugin supports 90+ languages. Common language codes:
# English
stt = WhisperSTT(language="en")
# Spanish
stt = WhisperSTT(language="es")
# French
stt = WhisperSTT(language="fr")
# German
stt = WhisperSTT(language="de")
# Japanese
stt = WhisperSTT(language="ja")
# Auto-detect language
stt = WhisperSTT(language=None)  # Will auto-detect

# GPU acceleration (recommended)
stt = WhisperSTT(device="cuda", compute_type="float16")
# CPU processing
stt = WhisperSTT(device="cpu", compute_type="float32")
# Auto-select best device
stt = WhisperSTT(device="auto")

# Enable warmup for consistent performance
stt = WhisperSTT(
    warmup_audio="./sample_audio.wav",  # 5-10 second audio clip
    device="cuda",
)
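
If no recording is at hand, a clip of the suggested length can be generated with the standard library. The sketch below writes a 5-second, 16 kHz mono sine tone; the sample rate, tone frequency, and the `write_warmup_wav` name are assumptions for illustration, and a real speech sample is likely a more representative warmup input than a pure tone.

```python
import math
import struct
import wave


def write_warmup_wav(path: str, seconds: float = 5.0, sample_rate: int = 16000,
                     frequency_hz: float = 440.0) -> int:
    """Write a mono 16-bit PCM sine tone to `path` and return the frame count."""
    n_frames = int(seconds * sample_rate)
    amplitude = 0.3 * 32767  # keep well below full scale to avoid clipping
    frames = bytearray()
    for i in range(n_frames):
        sample = int(amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate))
        frames += struct.pack("<h", sample)  # little-endian signed 16-bit
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return n_frames


if __name__ == "__main__":
    count = write_warmup_wav("sample_audio.wav")
    print(f"wrote {count} frames to sample_audio.wav")
```

The resulting file can then be passed as `warmup_audio="./sample_audio.wav"` in the configuration shown above.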