Implementation of offline text-to-speech (TTS) for macOS

jashdubal/offline-tts

Offline TTS

CLI tool that generates speech from raw text or documents. Uses hexgrad/Kokoro-82M under the hood.
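For orientation, the sketch below shows roughly how Kokoro can be driven from Python, following the usage pattern in the Kokoro project's own documentation. It is not taken from this repository's cli.py: the `synthesize` and `segment_path` helpers, the file naming scheme, and the choice of WAV output are assumptions made for illustration.

```python
from pathlib import Path

SAMPLE_RATE = 24_000  # Kokoro v1.0 produces 24 kHz audio


def segment_path(out_dir, stem, index):
    """Hypothetical naming scheme: one numbered WAV file per generated segment."""
    return Path(out_dir) / f"{stem}_{index:03d}.wav"


def synthesize(text, voice="af_heart", speed=1.0, out_dir="outputs", stem="tts"):
    """Generate speech for `text` and write each segment to disk.

    Imports are deferred so that the module loads even when kokoro and
    soundfile are not installed yet.
    """
    from kokoro import KPipeline
    import soundfile as sf

    pipeline = KPipeline(lang_code="a")  # "a" selects American English
    Path(out_dir).mkdir(parents=True, exist_ok=True)

    written = []
    # The pipeline yields (graphemes, phonemes, audio) for each text segment.
    for i, (_graphemes, _phonemes, audio) in enumerate(
        pipeline(text, voice=voice, speed=speed)
    ):
        path = segment_path(out_dir, stem, i)
        sf.write(str(path), audio, SAMPLE_RATE)
        written.append(path)
    return written
```

Calling `synthesize("living the dream")` would download the model weights on first use and then write one or more WAV files under `outputs`.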

Quick Setup (macOS)

For a completely automated setup on macOS, just run:

bin/setup-macos

This script will automatically install and configure:

  • ✅ Xcode Command Line Tools (if needed)
  • ✅ Homebrew (if needed)
  • ✅ Python 3
  • ✅ UV package manager
  • ✅ espeak-ng for TTS fallback
  • ✅ Kokoro TTS model and dependencies
  • ✅ Virtual environment setup

After running the setup script, you're ready to use the TTS tool immediately!

Manual Installation

If you prefer manual installation or are on a different platform:

Requirements

Python 3 must be installed.

Using UV

UV is a modern Python package and virtual environment manager. It is optional, but if you use it, don't forget to set it up first:

uv init
uv venv  # create .venv; `uv init` alone does not create it
source .venv/bin/activate && python -m ensurepip --upgrade

Install kokoro TTS model

Using pip

pip install -q "kokoro>=0.3.4" soundfile

Using UV

uv add kokoro soundfile

Install espeak-ng, which is used as a fallback for out-of-dictionary (OOD) English words and for some non-English languages:

# Mac
brew install espeak-ng
# Linux
apt-get -qq -y install espeak-ng > /dev/null 2>&1

Usage

Use --silent for completely quiet operation.

Using the executable script (recommended):

# Raw text with default settings
bin/tts "living the dream"

# Raw text with custom voice and speed
bin/tts "living the dream" -s 1.2 -v af_bella

# Document with GPU acceleration, WAV output, and a custom output directory
bin/tts -f README.md --mps --format wav -o my_audio

# Custom filename (will not overwrite existing files)
bin/tts "hello world" --filename "my_greeting"

# Play audio immediately after generation (uses integrated player)
bin/tts "hello world" --play

# Preview audio without saving (temporary playback only)
bin/tts "hello world" --play-only

# Silent mode - no output except errors
bin/tts "hello world" --silent

# Generate audio and play later with standalone player
bin/tts "hello world" --filename "my_audio"
bin/play --latest

# All options example
bin/tts "hello world" -s 0.8 -v af_heart --mps --format mp3 -o outputs --filename "custom_audio" --play
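The `--filename` example above notes that existing files are not overwritten. The README does not say whether the tool reports an error or picks a different name in that case, so the sketch below simply assumes a numeric suffix is appended. The `unique_output_path` helper is hypothetical and not part of this repository.

```python
from pathlib import Path


def unique_output_path(directory, filename, fmt="mp3"):
    """Return a path inside `directory` that does not exist yet.

    Assumed behavior: try `<filename>.<fmt>` first, then `<filename>_1.<fmt>`,
    `<filename>_2.<fmt>`, and so on until a free name is found.
    """
    base = Path(directory)
    candidate = base / f"{filename}.{fmt}"
    counter = 1
    while candidate.exists():
        candidate = base / f"{filename}_{counter}.{fmt}"
        counter += 1
    return candidate
```

With this approach, running the same command twice with `--filename "my_greeting"` would leave the first file untouched and write the second one under a new name.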

Command line options:

  • --mps: Enable macOS MPS (Metal Performance Shaders) GPU acceleration (replaces setting PYTORCH_ENABLE_MPS_FALLBACK=1 manually)
  • --format: Output format - mp3 (default) or wav
  • -o, --output: Output directory (default: outputs)
  • --filename: Custom filename for output (without extension). Will not overwrite existing files.
  • --play: Automatically play the generated audio file after creation (supports macOS, Linux, Windows)
  • --play-only: Generate and play audio without saving to output directory (temporary preview only)
  • --silent: Silent mode - suppress all output except errors (perfect for scripts)
  • -s, --speed: Speech speed (default: 1.0)
  • -v, --voice: Voice to use (default: af_heart)
  • -f, --source: Path to source document file instead of raw text
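The options listed above map naturally onto Python's `argparse`. The following is a minimal sketch of such a parser, built only from the flags and defaults documented in this README. The actual cli.py may be organized differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (a sketch, not cli.py)."""
    parser = argparse.ArgumentParser(prog="tts", description="Offline text-to-speech")
    # Raw text is optional because -f/--source can supply a document instead.
    parser.add_argument("text", nargs="?", help="Raw text to speak")
    parser.add_argument("-f", "--source", help="Path to a source document file")
    parser.add_argument("-s", "--speed", type=float, default=1.0, help="Speech speed")
    parser.add_argument("-v", "--voice", default="af_heart", help="Voice to use")
    parser.add_argument("--mps", action="store_true", help="Enable MPS GPU acceleration")
    parser.add_argument("--format", choices=["mp3", "wav"], default="mp3")
    parser.add_argument("-o", "--output", default="outputs", help="Output directory")
    parser.add_argument("--filename", help="Output filename without extension")
    parser.add_argument("--play", action="store_true", help="Play after generation")
    parser.add_argument("--play-only", action="store_true", help="Play without saving")
    parser.add_argument("--silent", action="store_true", help="Suppress non-error output")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

For example, `build_parser().parse_args(["hello world", "-s", "1.2", "-v", "af_bella"])` yields a namespace with `speed=1.2` and `voice="af_bella"`, with every other option left at its documented default.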

Traditional usage (if not using the executable script):

# With python3
python3 cli.py "living the dream" -s 1 -v af_bella --mps
# With UV
uv run cli.py "living the dream" -s 1 -v af_bella --mps
# Source file example
uv run cli.py -f README.md --mps --format wav --filename "readme_audio"
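The `--mps` flag used in these commands is described as replacing a manually set PYTORCH_ENABLE_MPS_FALLBACK=1. One plausible way for a flag to do this is to set the environment variable from Python before PyTorch is imported. The sketch below assumes that approach; `enable_mps_fallback` is a hypothetical helper, not a function from this repository.

```python
import os


def enable_mps_fallback(env=None):
    """Set PYTORCH_ENABLE_MPS_FALLBACK=1 unless the user already chose a value.

    PyTorch uses this variable to fall back to the CPU for operations that the
    MPS backend does not support. It is typically set before `import torch`,
    so a CLI would call this helper early, before loading the model.
    """
    env = os.environ if env is None else env
    env.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")
    return env
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real process environment.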

On the first run, the model weights are downloaded, which takes some time:

kokoro-v1_0.pth: 100%|██████████| 327M/327M [01:51<00:00, 2.94MB/s]

Audio Playback

The project includes a standalone audio player for playing generated TTS files or any other audio files.

Using the audio player:

# Play a specific audio file
bin/play path/to/audio.mp3

# Play the latest generated audio file
bin/play --latest

# List all audio files in the outputs directory
bin/play --list -d outputs

# Play all audio files in a directory
bin/play -d outputs

# Verbose output
bin/play --latest -v

Audio player options:

  • file: Path to specific audio file to play
  • -d, --directory: Play all audio files in a directory
  • -l, --list: List audio files in current or specified directory
  • --latest: Play the most recently created audio file in outputs directory
  • -v, --verbose: Show detailed output during playback
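The `--latest` option above selects the most recently created audio file in the outputs directory. A minimal sketch of that lookup follows, assuming that modification time is an acceptable stand-in for creation time and that only .mp3 and .wav files count as audio. `latest_audio_file` and `AUDIO_EXTENSIONS` are hypothetical names.

```python
from pathlib import Path

# Assumption: the player only cares about the two formats the TTS tool writes.
AUDIO_EXTENSIONS = {".mp3", ".wav"}


def latest_audio_file(directory="outputs"):
    """Return the newest audio file in `directory`, or None if there is none."""
    base = Path(directory)
    if not base.is_dir():
        return None
    candidates = [
        p for p in base.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    ]
    # Modification time is used as a portable proxy for creation time.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

Returning `None` instead of raising lets a caller print a friendly message when no audio has been generated yet.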

Voices

For documentation on the available voices, see VOICES.md.
