CLI tool to generate text-to-speech audio from raw text or documents. Uses hexgrad/Kokoro-82M under the hood.
For a completely automated setup on macOS, just run:
bin/setup-macos
This script will automatically install and configure:
- ✅ Xcode Command Line Tools (if needed)
- ✅ Homebrew (if needed)
- ✅ Python 3
- ✅ UV package manager
- ✅ espeak-ng for TTS fallback
- ✅ Kokoro TTS model and dependencies
- ✅ Virtual environment setup
After running the setup script, you're ready to use the TTS tool immediately!
If you prefer manual installation or are on a different platform:
Python 3 must be installed.
UV is a modern Python package and virtual environment manager. You don't have to use it, but if you do, don't forget to set it up properly:
uv init
source .venv/bin/activate && python -m ensurepip --upgrade
# Quote the version specifier so the shell does not treat ">" as a redirect
pip install -q "kokoro>=0.3.4" soundfile
uv add kokoro soundfile
# Mac
brew install espeak-ng
# Linux
apt-get -qq -y install espeak-ng > /dev/null 2>&1
Use --silent for completely quiet operation.
# Raw text with default settings
bin/tts "living the dream"
# Raw text with custom voice and speed
bin/tts "living the dream" -s 1.2 -v af_bella
# Document with GPU acceleration and custom output format
bin/tts -f README.md --mps --format wav -o my_audio
# Custom filename (will not overwrite existing files)
bin/tts "hello world" --filename "my_greeting"
# Play audio immediately after generation (uses integrated player)
bin/tts "hello world" --play
# Preview audio without saving (temporary playback only)
bin/tts "hello world" --play-only
# Silent mode - no output except errors
bin/tts "hello world" --silent
# Generate audio and play later with standalone player
bin/tts "hello world" --filename "my_audio"
bin/play --latest
# All options example
bin/tts "hello world" -s 0.8 -v af_heart --mps --format mp3 -o outputs --filename "custom_audio" --play
- --mps: Enable Mac OS MPS GPU acceleration (replaces manual PYTORCH_ENABLE_MPS_FALLBACK=1)
- --format: Output format - mp3 (default) or wav
- -o, --output: Output directory (default: outputs)
- --filename: Custom filename for output (without extension). Will not overwrite existing files.
- --play: Automatically play the generated audio file after creation (supports macOS, Linux, Windows)
- --play-only: Generate and play audio without saving to output directory (temporary preview only)
- --silent: Silent mode - suppress all output except errors (perfect for scripts)
- -s, --speed: Speech speed (default: 1.0)
- -v, --voice: Voice to use (default: af_heart)
- -f, --source: Path to source document file instead of raw text
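For readers who want to see how the options above fit together, here is a minimal sketch of how they could be declared with Python's argparse. It is not the project's actual cli.py: the function names build_parser and unique_path are hypothetical, and the numeric-suffix strategy for not overwriting existing files is an assumption (the real tool may resolve name clashes differently).

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags (hypothetical layout, not the real cli.py)."""
    p = argparse.ArgumentParser(prog="tts")
    p.add_argument("text", nargs="?", help="raw text to speak")
    p.add_argument("-f", "--source", help="path to a source document file")
    p.add_argument("-s", "--speed", type=float, default=1.0)
    p.add_argument("-v", "--voice", default="af_heart")
    p.add_argument("--mps", action="store_true")
    p.add_argument("--format", choices=["mp3", "wav"], default="mp3")
    p.add_argument("-o", "--output", default="outputs")
    p.add_argument("--filename", help="output name without extension")
    p.add_argument("--play", action="store_true")
    p.add_argument("--play-only", action="store_true")
    p.add_argument("--silent", action="store_true")
    return p


def unique_path(directory: str, stem: str, ext: str) -> Path:
    """Return a path that does not exist yet.

    Assumed strategy: append _1, _2, ... until the name is free.
    """
    candidate = Path(directory) / f"{stem}.{ext}"
    counter = 1
    while candidate.exists():
        candidate = Path(directory) / f"{stem}_{counter}.{ext}"
        counter += 1
    return candidate


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["hello world", "-s", "0.8", "--filename", "custom_audio"]
    )
    # Prints the path the audio would be written to under these assumptions.
    print(unique_path(args.output, args.filename, args.format))
```

Keeping the defaults in the parser means that omitting a flag yields the documented default (speed 1.0, voice af_heart, format mp3, output directory outputs).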
# With python3
python3 cli.py "living the dream" -s 1 -v af_bella --mps
# With UV
uv run cli.py "living the dream" -s 1 -v af_bella --mps
# Source file example
uv run cli.py -f README.md --mps --format wav --filename "readme_audio"
On first run you will have to download the weights, which will take some time:
kokoro-v1_0.pth: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 327M/327M [01:51<00:00, 2.94MB/s]
The project includes a standalone audio player for playing generated TTS files or any other audio files.
# Play a specific audio file
bin/play path/to/audio.mp3
# Play the latest generated audio file
bin/play --latest
# List all audio files in the outputs directory
bin/play --list -d outputs
# Play all audio files in a directory
bin/play -d outputs
# Verbose output
bin/play --latest -v
- file: Path to specific audio file to play
- -d, --directory: Play all audio files in a directory
- -l, --list: List audio files in current or specified directory
- --latest: Play the most recently created audio file in outputs directory
- -v, --verbose: Show detailed output during playback
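The --list and --latest behaviour described above can be approximated in a few lines of Python. This is only a sketch, not the code behind bin/play: the function names are hypothetical, the set of extensions is limited to the two formats this README mentions, and modification time is used as a stand-in for "most recently created".

```python
from pathlib import Path
from typing import List, Optional

# Assumption: only the two output formats named in this README are considered.
AUDIO_EXTENSIONS = {".mp3", ".wav"}


def list_audio_files(directory: str = "outputs") -> List[Path]:
    """Return the audio files directly inside `directory`, sorted by name."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def latest_audio_file(directory: str = "outputs") -> Optional[Path]:
    """Return the most recently modified audio file, or None if there is none."""
    files = list_audio_files(directory)
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    for path in list_audio_files():
        print(path)
    print("latest:", latest_audio_file())
```

Returning None for an empty or missing directory lets a caller print a friendly message instead of raising an exception.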
For documentation on the available voices, see VOICES.md.