TTS Adapter

Universal text-to-speech adapter with pluggable engines.

Quick Start

# Copy config
cp .env.example .env

# Build image
make build

# Start server (model downloads on first run, ~4.3GB)
make up

# Wait for health (model warmup ~2min)
make health

# Test generation
make test

Local Development (without Docker)

uv sync
make serve

Offline Mode

Download the model while online, then run without a network connection:

make download-model     # Downloads to ~/.cache/tts-adapter/models/
# Add to .env:
# TTS_QWEN3_MODEL_PATH=~/.cache/tts-adapter/models/Qwen3-TTS-12Hz-1.7B-CustomVoice
# HF_HUB_OFFLINE=1

See Qwen3 Engine docs for details.

Web UI

Open http://localhost:9880. The UI offers three modes (Simple, Voice Design, Voice Clone), an RU/EN language switch, and advanced generation settings. See Web UI docs.

For API access, see API Reference. Swagger docs are available at /docs.

Configuration

Edit .env (copy from .env.example):

# Engine selection
TTS_ENGINE=qwen3

# Defaults
TTS_DEFAULT_SPEAKER=Serena
TTS_DEFAULT_LANGUAGE=Russian

# Qwen3 engine
TTS_QWEN3_MODEL_ID=Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
TTS_QWEN3_DEVICE=cuda:0
TTS_QWEN3_DTYPE=bfloat16

See .env.example for all options.
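
To check which values a .env file actually sets before starting the server, a small parser is enough. The sketch below is illustrative only: `parse_env` is a hypothetical helper, not part of tts_adapter, and it handles plain KEY=VALUE lines without quoting or variable expansion.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may contain "=".
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore it
        settings[key.strip()] = value.strip()
    return settings


SAMPLE = """\
# Engine selection
TTS_ENGINE=qwen3

# Qwen3 engine
TTS_QWEN3_DEVICE=cuda:0
TTS_QWEN3_DTYPE=bfloat16
"""

settings = parse_env(SAMPLE)
print(settings["TTS_ENGINE"])  # qwen3
```

In practice you would pass the contents of .env instead of `SAMPLE`.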

Docker

make build    # Build image
make up       # Start container
make health   # Check health
make logs     # View logs
make down     # Stop
make shell    # Shell into container

See Makefile for all targets.

Model Cache

Model weights (~4.3GB) download automatically on the first make up. They are stored in the host's ~/.cache/huggingface directory and mounted into the container, so:

- The download happens once and persists across container restarts
- The same cache is shared between Docker and local development
- To use a custom path, set HF_CACHE_PATH in .env
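
To confirm that the weights have landed in the cache, you can resolve the cache directory and sum its size. This is a rough sketch: `resolve_hf_cache` and `dir_size_bytes` are hypothetical helpers, and the fallback to ~/.cache/huggingface when HF_CACHE_PATH is unset is an assumption based on the description above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_hf_cache(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return HF_CACHE_PATH if set, else the default ~/.cache/huggingface."""
    env = os.environ if env is None else env
    custom = env.get("HF_CACHE_PATH")
    if custom:
        return Path(custom).expanduser()
    return Path.home() / ".cache" / "huggingface"


def dir_size_bytes(path: Path) -> int:
    """Sum regular file sizes under path; symlinks are skipped to avoid double counting."""
    if not path.is_dir():
        return 0
    return sum(
        p.stat().st_size
        for p in path.rglob("*")
        if p.is_file() and not p.is_symlink()
    )


cache_dir = resolve_hf_cache()
print(f"{cache_dir}: {dir_size_bytes(cache_dir) / 1e9:.1f} GB")
```

A result far below 4.3 GB suggests the download has not finished or went to a different path.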

API Usage

Single Generation

curl -X POST http://localhost:9880/tts \
  -H 'content-type: application/json' \
  -d '{"text":"Hello world","language":"English","speaker":"Ryan"}' \
  --output out.wav
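
The same request can be made from Python with only the standard library. This is a minimal sketch that mirrors the curl call above; `build_tts_request` and `synthesize_to_file` are hypothetical helper names, and the 120 second timeout is an arbitrary choice.

```python
import json
import urllib.request


def build_tts_request(text, language, speaker, base_url="http://localhost:9880"):
    """Build the POST /tts request with the same JSON body as the curl example."""
    payload = {"text": text, "language": language, "speaker": speaker}
    return urllib.request.Request(
        url=f"{base_url}/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request, out_path, timeout=120.0):
    """Send the request and write the response body to out_path; returns bytes written."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return len(audio)


request = build_tts_request("Hello world", "English", "Ryan")
# With the server running:
# synthesize_to_file(request, "out.wav")
```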

Batch Generation

curl -X POST http://localhost:9880/tts/batch \
  -H 'content-type: application/json' \
  -d '{"items":[{"id":"001","text":"First phrase"},{"id":"002","text":"Second phrase"}]}' \
  --output batch.zip
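
For larger batches it can be easier to build the request body and unpack the archive in code. The sketch below assumes only what the curl example shows: a JSON body with an items list and a ZIP response. `build_batch_payload` and `unpack_batch` are hypothetical helpers, and the file names inside the archive are not assumed; the code simply returns whatever entries it finds.

```python
import io
import json
import zipfile
from typing import Dict, List


def build_batch_payload(phrases: List[str]) -> bytes:
    """Build the {"items": [...]} body, numbering ids as 001, 002, ..."""
    items = [
        {"id": f"{index:03d}", "text": text}
        for index, text in enumerate(phrases, start=1)
    ]
    return json.dumps({"items": items}, ensure_ascii=False).encode("utf-8")


def unpack_batch(zip_bytes: bytes) -> Dict[str, bytes]:
    """Map each archive entry name to its raw bytes."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


body = build_batch_payload(["First phrase", "Second phrase"])
print(body.decode("utf-8"))
```

Send `body` to /tts/batch with a content-type of application/json, then pass the response bytes to `unpack_batch`.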

Voice Cloning

Clone any voice from a 3-10 second audio sample (requires the Base model):

# Switch to Base model in .env:
# TTS_QWEN3_MODEL_ID=Qwen/Qwen3-TTS-12Hz-1.7B-Base

curl -X POST http://localhost:9880/tts/clone \
  -F 'text=Привет мир' \
  -F 'language=Russian' \
  -F 'reference_audio=@voice_sample.wav' \
  -F 'reference_text=Текст из референсного аудио' \
  --output cloned.wav

See Qwen3 Engine docs for details.
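
Since the reference sample should be 3-10 seconds long, it can help to check the duration before uploading. This sketch uses the standard wave module and works for uncompressed WAV files only; the helper names are hypothetical, and whether the server enforces these limits itself is not stated here.

```python
import io
import wave


def wav_duration_seconds(source) -> float:
    """Duration of a WAV file given a path string or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(source, min_s: float = 3.0, max_s: float = 10.0) -> bool:
    """True if the sample length falls within the recommended range."""
    return min_s <= wav_duration_seconds(source) <= max_s


def make_silent_wav(seconds: float, rate: int = 8000) -> io.BytesIO:
    """Synthetic mono 16-bit sample, so the sketch runs without a real recording."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    buf.seek(0)
    return buf


print(is_valid_reference(make_silent_wav(5)))  # True
print(is_valid_reference(make_silent_wav(1)))  # False
```

For a real recording, call `is_valid_reference("voice_sample.wav")`.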

Health Check

curl http://localhost:9880/health
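
Because model warmup takes about two minutes, scripts that call the API right after startup may want to poll the endpoint first. The sketch below treats any HTTP 200 response as healthy, which is an assumption; the response body format is not covered here. `probe_health` and `wait_until_healthy` are hypothetical helpers.

```python
import time
import urllib.request


def probe_health(url: str = "http://localhost:9880/health", timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib.error.URLError is a subclass of OSError).
        return False


def wait_until_healthy(probe=probe_health, attempts: int = 30,
                       delay_s: float = 5.0, sleep=time.sleep) -> bool:
    """Call probe up to `attempts` times, sleeping delay_s between failures."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

The defaults wait up to roughly 2.5 minutes. Call `wait_until_healthy()` before the first /tts request.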

Documentation

Document	Description
Web UI	Language switch, advanced settings, multi-client, LAN
Architecture	Design decisions, engine protocol
API Reference	Endpoint specs, request/response formats
Qwen3 Engine	Model variants, speakers, setup
AGENTS.md	Project instructions for AI agents

Engines

Engine	Status	Description
Qwen3-TTS	✅ Ready	1.7B/0.6B with voice cloning, preset speakers, instructions

Adding New Engines

1. Create tts_adapter/engines/new_engine.py
2. Implement the TTSEngine protocol
3. Register it in engines/__init__.py
4. Document it in docs/engines/

See Architecture for details.
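
To give a feel for the steps above, here is a rough sketch of a protocol-based engine. Everything in it is hypothetical: the real TTSEngine protocol is defined in tts_adapter and described in Architecture, and its method names and signatures may differ. The `synthesize` method, the `ENGINES` registry, and `create_engine` are assumptions made for illustration.

```python
import io
import wave
from typing import Dict, Protocol, Type, runtime_checkable


@runtime_checkable
class TTSEngine(Protocol):
    """Hypothetical protocol shape; check tts_adapter for the real one."""

    def synthesize(self, text: str, language: str, speaker: str) -> bytes: ...


class SilenceEngine:
    """Toy engine that returns one second of silence as WAV bytes."""

    def synthesize(self, text: str, language: str, speaker: str) -> bytes:
        buf = io.BytesIO()
        with wave.open(buf, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(16000)
            wav.writeframes(b"\x00\x00" * 16000)
        return buf.getvalue()


# Hypothetical registry, standing in for whatever engines/__init__.py does.
ENGINES: Dict[str, Type] = {"silence": SilenceEngine}


def create_engine(name: str) -> TTSEngine:
    """Look up an engine class by name and instantiate it."""
    try:
        return ENGINES[name]()
    except KeyError:
        raise ValueError(f"unknown engine: {name!r}") from None


engine = create_engine("silence")
print(isinstance(engine, TTSEngine))  # True
```

Structural typing means the engine class does not need to inherit from the protocol; it only has to provide matching methods.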

License

Apache-2.0
