A Rust implementation of the Kokoro text-to-speech model: a small model (87M parameters) with high-quality output and very fast inference.
- Multi-language: English, Chinese, Japanese, Spanish, French, and more via espeak-ng
- Voice style mixing (e.g., `af_sky.4+af_nicole.5`)
- OpenAI-compatible API server
- Streaming and pipe modes for LLM integration
- Automatic language detection
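The style-mixing syntax from the feature list (`af_sky.4+af_nicole.5`) reads as a list of weighted voice names. The sketch below is only a mental model, not kokorox's code. It assumes that the digits after the dot are a fractional weight (`.4` means 0.4), that a bare name gets weight 1.0, and that the mixed style is a weighted sum of the voices' style vectors. `parse_style_mix`, `blend_styles` and `StyleComponent` are hypothetical names. kokorox may normalize weights or store voice styles as larger tensors, so toy two-element vectors stand in for them here.

```rust
use std::collections::HashMap;

/// One weighted component of a style-mix string such as "af_sky.4".
#[derive(Debug, PartialEq)]
struct StyleComponent {
    name: String,
    weight: f32,
}

/// Parse "af_sky.4+af_nicole.5" into weighted components.
/// Assumption: digits after the last '.' are a fraction (".4" -> 0.4);
/// a bare name such as "af_sky" gets weight 1.0.
fn parse_style_mix(spec: &str) -> Result<Vec<StyleComponent>, String> {
    spec.split('+')
        .map(|part| -> Result<StyleComponent, String> {
            let part = part.trim();
            if part.is_empty() {
                return Err("empty style component".to_string());
            }
            let Some((name, digits)) = part.rsplit_once('.') else {
                return Ok(StyleComponent { name: part.to_string(), weight: 1.0 });
            };
            // Reject things like "af_sky." or "af_sky.4e2" instead of guessing.
            if name.is_empty() || digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
                return Err(format!("malformed style component {part:?}"));
            }
            let weight: f32 = format!("0.{digits}")
                .parse()
                .map_err(|_| format!("bad weight in {part:?}"))?;
            Ok(StyleComponent { name: name.to_string(), weight })
        })
        .collect()
}

/// Blend per-voice style vectors as a weighted sum (assumed blending rule).
/// All vectors must have the same length.
fn blend_styles(
    components: &[StyleComponent],
    voices: &HashMap<String, Vec<f32>>,
) -> Result<Vec<f32>, String> {
    let mut mixed: Option<Vec<f32>> = None;
    for c in components {
        let v = voices
            .get(&c.name)
            .ok_or_else(|| format!("unknown voice {:?}", c.name))?;
        let acc = mixed.get_or_insert_with(|| vec![0.0; v.len()]);
        if acc.len() != v.len() {
            return Err(format!("length mismatch for {:?}", c.name));
        }
        for (a, x) in acc.iter_mut().zip(v) {
            *a += c.weight * x;
        }
    }
    mixed.ok_or_else(|| "no style components".to_string())
}

fn main() {
    let mix = parse_style_mix("af_sky.4+af_nicole.5").expect("valid style mix");
    // Toy vectors standing in for real voice style data.
    let voices: HashMap<String, Vec<f32>> = HashMap::from([
        ("af_sky".to_string(), vec![1.0, 0.0]),
        ("af_nicole".to_string(), vec![0.0, 2.0]),
    ]);
    let style = blend_styles(&mix, &voices).expect("known voices");
    println!("{mix:?} -> {style:?}");
}
```

Note that under this reading the weights in the README example sum to 0.9, not 1.0. Whether kokorox rescales them is not stated here, so the sketch leaves them as given.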
```bash
# Install (macOS)
brew install byteowlz/tap/koko

# Or download from GitHub Releases
# https://github.com/byteowlz/kokorox/releases

# Generate speech
koko text "Hello, this is a test"
# Output: tmp/output.wav
```

Prebuilt binaries for Linux, macOS, and Windows are available from GitHub Releases.
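To drive the CLI from a Rust program instead of a shell, one option is to spawn `koko` with `std::process::Command`. The following is a minimal sketch. It assumes `koko` is on your `PATH` and uses only the `text` subcommand and the `-o`, `--lan` and `--style` flags shown in this README, in the same positions as the README examples. `koko_text_args` and `synthesize` are hypothetical helpers, not part of kokorox.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Build the argument list for `koko text`, mirroring the README examples:
/// koko text "<text>" -o <file> [--lan <code>] [--style <voice or mix>]
fn koko_text_args(text: &str, output: &Path, lang: Option<&str>, style: Option<&str>) -> Vec<String> {
    let mut args = vec![
        "text".to_string(),
        text.to_string(),
        "-o".to_string(),
        output.display().to_string(),
    ];
    if let Some(code) = lang {
        args.push("--lan".to_string());
        args.push(code.to_string());
    }
    if let Some(voice) = style {
        args.push("--style".to_string());
        args.push(voice.to_string());
    }
    args
}

/// Run `koko` (assumed to be on PATH) and turn a non-zero exit into an error.
fn synthesize(text: &str, output: &Path, lang: Option<&str>, style: Option<&str>) -> io::Result<()> {
    let status = Command::new("koko")
        .args(koko_text_args(text, output, lang, style))
        .status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(io::ErrorKind::Other, format!("koko exited with {status}")))
    }
}

fn main() {
    let out = Path::new("greeting.wav");
    match synthesize("Hello, world!", out, None, Some("af_sky")) {
        Ok(()) => println!("wrote {}", out.display()),
        Err(e) => eprintln!("synthesis failed: {e}"),
    }
}
```

Passing each argument separately, rather than building one shell string, means text containing quotes or spaces needs no shell escaping.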
Requires ONNX Runtime and espeak-ng:

```bash
# macOS
brew install espeak-ng

# Ubuntu/Debian
sudo apt-get install espeak-ng libespeak-ng-dev
```

Build from source:
```bash
git clone https://github.com/byteowlz/kokorox.git
cd kokorox
pip install -r scripts/requirements.txt
python scripts/download_voices.py --all
cargo build --release
```

To use the GPU build of ONNX Runtime on Linux, unpack the release archive and install it system-wide:

```bash
tar -xzf onnxruntime-linux-x64-gpu-1.22.0.tgz
sudo cp -a onnxruntime-linux-x64-gpu-1.22.0/include /usr/local/
sudo cp -a onnxruntime-linux-x64-gpu-1.22.0/lib /usr/local/
sudo ldconfig
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
```

Basic usage:

```bash
koko text "Hello, world!" -o greeting.wav
koko file poem.txt  # One WAV file per line
```

Other languages:

```bash
koko text "Hola, mundo!" --lan es
koko text "你好,世界!" --lan zh
koko -a text "Bonjour!"  # Auto-detect language
```

Voices:

```bash
koko voices                                 # List available voices
koko voices --language en --gender female   # Filter voices
```
Select a voice style, or mix several:

```bash
koko text "Hello" --style af_sky
koko text "Hello" --style af_sky.4+af_nicole.5  # Mix styles
```

Pipe LLM output into speech:

```bash
ollama run llama3 "Tell me a story" | koko pipe
ollama run llama3 "Explain physics" | koko pipe --silent -o output.wav
```

Start the OpenAI-compatible server:

```bash
koko openai --ip 0.0.0.0 --port 3000
```

Then request speech:

```bash
curl -X POST http://localhost:3000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "kokoro", "input": "Hello!", "voice": "af_sky"}' \
  -o hello.wav
```
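From Rust, the same request can be sent using only the standard library. This is a minimal sketch that makes several assumptions:

- The server started with `koko openai --ip 0.0.0.0 --port 3000` is reachable at `localhost:3000`.
- It returns the audio bytes as the response body, which the `-o hello.wav` in the curl example suggests.
- The request uses plain HTTP/1.0, so the server is not allowed to reply with chunked transfer encoding. This toy parser does not handle chunked encoding.

The helper names are hypothetical. Production code would more likely use an HTTP client crate.

```rust
use std::fs;
use std::io::{self, Read, Write};
use std::net::TcpStream;

/// Minimal JSON string escaper (quotes, backslashes, control characters).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Same JSON fields as the curl example: model, input, voice.
fn speech_request_body(input: &str, voice: &str) -> String {
    format!(
        r#"{{"model": "kokoro", "input": "{}", "voice": "{}"}}"#,
        json_escape(input),
        json_escape(voice)
    )
}

/// Raw HTTP/1.0 POST to /v1/audio/speech (HTTP/1.0 rules out chunked replies).
fn build_request(host: &str, body: &str) -> String {
    format!(
        "POST /v1/audio/speech HTTP/1.0\r\nHost: {host}\r\nContent-Type: application/json\r\nContent-Length: {}\r\n\r\n{body}",
        body.len()
    )
}

/// Split a raw response into (status code, body bytes).
fn parse_response(raw: &[u8]) -> Option<(u16, &[u8])> {
    let sep = raw.windows(4).position(|w| w == b"\r\n\r\n")?;
    let head = std::str::from_utf8(&raw[..sep]).ok()?;
    let status: u16 = head.split_whitespace().nth(1)?.parse().ok()?;
    Some((status, &raw[sep + 4..]))
}

fn fetch_speech(addr: &str, input: &str, voice: &str) -> io::Result<Vec<u8>> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(build_request(addr, &speech_request_body(input, voice)).as_bytes())?;
    let mut raw = Vec::new();
    stream.read_to_end(&mut raw)?; // server closes the connection after an HTTP/1.0 reply
    match parse_response(&raw) {
        Some((200, body)) => Ok(body.to_vec()),
        Some((code, _)) => Err(io::Error::new(io::ErrorKind::Other, format!("HTTP {code}"))),
        None => Err(io::Error::new(io::ErrorKind::InvalidData, "malformed HTTP response")),
    }
}

fn main() {
    match fetch_speech("localhost:3000", "Hello!", "af_sky") {
        Ok(wav) => {
            fs::write("hello.wav", &wav).expect("write hello.wav");
            println!("wrote {} bytes", wav.len());
        }
        Err(e) => eprintln!("request failed: {e}"),
    }
}
```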
List the available voices:

```bash
curl http://localhost:3000/v1/audio/voices           # List voice IDs
curl http://localhost:3000/v1/audio/voices/detailed  # Voice metadata
```

Streaming mode:

```bash
koko stream > output.wav
# Type text, press Enter. Ctrl+D to exit.
```

Docker:

```bash
docker build -t kokorox .
docker run -v ./tmp:/app/tmp kokorox text "Hello from docker!" -o tmp/hello.wav
```
Run the OpenAI-compatible server in Docker:

```bash
docker run -p 3000:3000 kokorox openai --ip 0.0.0.0 --port 3000
```

Debugging:

```bash
koko text "Text here" --verbose       # Detailed processing logs
koko text "Accénted" --debug-accents  # Character-by-character analysis
```

The default installation includes the standard voices. More voices (54 in total, across 8 languages) can be converted from Hugging Face:

```bash
python scripts/convert_pt_voices.py --all
koko -d data/voices-custom.bin text "Hello" --style en_sarah
```

License: GPL 3.0, because the espeak-rs-sys crate statically links espeak-ng.