A high-performance text-to-speech plugin for LiveKit Agents, built on Kokoro TTS. Its streaming implementation targets ultra-low latency for real-time voice synthesis.
- Ultra-Low Latency: ~80ms time-to-first-byte (TTFB) on RTX 4090
- Multiple Voices: Support for multiple voices and voice mixing
- Streaming Support: Real-time audio generation with chunked streaming
- LiveKit Integration: Seamless integration with LiveKit agents framework
- LiveKit Agents v1.0 or higher
- Kokoro FastAPI server instance
- NVIDIA GPU (recommended for optimal performance)
- Python 3.8+
| Hardware | Latency | Quality | Use Case |
|---|---|---|---|
| RTX 4090 | ~80ms TTFB | High | Real-time applications |
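The TTFB figure above depends on hardware and server configuration, so it is worth measuring on your own setup. The sketch below is one possible way to time the first audio chunk of any streaming response. `measure_ttfb` and `fake_stream` are hypothetical helpers written for illustration and are not part of this plugin; `fake_stream` only simulates a delay and should be replaced with a real chunk iterator.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def measure_ttfb(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], int]:
    """Return (seconds until the first non-empty chunk, total bytes received).

    The iterable should be lazy (for example a generator that sends the
    request when first iterated) so that request latency is included.
    """
    start = clock()
    ttfb: Optional[float] = None
    total = 0
    for chunk in chunks:
        if chunk and ttfb is None:
            ttfb = clock() - start  # first audio bytes arrived
        total += len(chunk)
    return ttfb, total


def fake_stream() -> Iterator[bytes]:
    """Stand-in for a streaming TTS response (hypothetical, no network)."""
    time.sleep(0.05)      # simulated synthesis delay before the first audio
    yield b"\x00" * 960   # two dummy chunks of silence
    yield b"\x00" * 960


if __name__ == "__main__":
    ttfb, total = measure_ttfb(fake_stream())
    print(f"TTFB: {ttfb * 1000:.0f} ms, {total} bytes received")
```

Averaging over several requests, and discarding the first one if the model is loaded on demand, should give a steadier number than a single run.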
- Clone or download this plugin into the root directory of your LiveKit Agents project
- Set up the Kokoro FastAPI server for model inference
- Install required dependencies:
pip install openai httpx
Use the Kokoro FastAPI server for optimized inference:
Repository: remsky/Kokoro-FastAPI
This server provides OpenAI-compatible endpoints with optimized Kokoro TTS inference for ultra-low latency performance.
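Because the server speaks the OpenAI API, it can be exercised directly with the `openai` client before wiring it into an agent. The following is a minimal sketch under several assumptions: the server is reachable at `http://localhost:8000`, the OpenAI-style routes live under `/v1`, the model is named `kokoro`, and raw `pcm` output is supported. `build_speech_request` and `stream_speech` are hypothetical helper names; check your server's documentation for the actual values.

```python
from typing import Dict, Iterator


def build_speech_request(
    text: str, voice: str = "af_heart", speed: float = 1.0
) -> Dict[str, object]:
    """Build keyword arguments for an OpenAI-style speech request."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "model": "kokoro",         # assumed model name on the server
        "voice": voice,
        "input": text,
        "speed": speed,
        "response_format": "pcm",  # assumed: raw PCM, no container header
    }


def stream_speech(
    text: str,
    base_url: str = "http://localhost:8000/v1",  # assumed host, port, and path
    api_key: str = "NULL",                       # placeholder, as in the plugin example
) -> Iterator[bytes]:
    """Yield audio chunks as the server produces them."""
    # Imported lazily so build_speech_request works without the package.
    from openai import OpenAI

    client = OpenAI(base_url=base_url, api_key=api_key)
    request = build_speech_request(text)
    with client.audio.speech.with_streaming_response.create(**request) as response:
        for chunk in response.iter_bytes(chunk_size=4096):
            yield chunk


if __name__ == "__main__":
    # Requires a running server; prints the size of each chunk as it arrives.
    for chunk in stream_speech("Hello from Kokoro."):
        print(len(chunk))
```

Requesting raw PCM avoids waiting for a container header, which is one reason it is a common choice for low-latency playback.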
Initialize your agent session with the KokoroTTS plugin:
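from livekit.agents import AgentSession
from kokoro_plugin import KokoroTTS

session = AgentSession(
    # ... other configuration
    tts=KokoroTTS(
        base_url="http://localhost:8000",  # address of your Kokoro FastAPI server
        api_key="NULL",  # placeholder; assumes a local server that does not check keys
        voice="af_heart",
        speed=1.0,
    ),
)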
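For voice mixing, one option is to pass a combined voice string in the `voice` parameter. The sketch below assumes the server accepts the `+`-joined, weighted form (for example `af_heart(2)+af_bella(1)`) and that the plugin forwards the `voice` string unchanged; neither is confirmed here. `mix_voices` is a hypothetical helper, and `af_bella` is used only as an illustrative second voice.

```python
from typing import Mapping


def mix_voices(weights: Mapping[str, float]) -> str:
    """Build a combined voice string such as 'af_heart(2)+af_bella(1)'."""
    if not weights:
        raise ValueError("at least one voice is required")
    parts = []
    for name, weight in weights.items():
        if weight <= 0:
            raise ValueError(f"weight for {name!r} must be positive")
        parts.append(f"{name}({weight:g})")  # ':g' drops trailing zeros
    return "+".join(parts)


if __name__ == "__main__":
    print(mix_voices({"af_heart": 2, "af_bella": 1}))
```

The resulting string would then be supplied as `voice=mix_voices({...})` when constructing `KokoroTTS`, provided the assumptions above hold for your server version.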