TTS Adapter

Universal text-to-speech adapter with pluggable engines.

Quick Start

# Copy config
cp .env.example .env

# Build image
make build

# Start server (model downloads on first run, ~4.3GB)
make up

# Wait for health (model warmup ~2min)
make health

# Test generation
make test

Local Development (without Docker)

uv sync
make serve

Offline Mode

Download the model while online, then run without network access:

make download-model     # Downloads to ~/.cache/tts-adapter/models/
# Add to .env:
# TTS_QWEN3_MODEL_PATH=~/.cache/tts-adapter/models/Qwen3-TTS-12Hz-1.7B-CustomVoice
# HF_HUB_OFFLINE=1

See Qwen3 Engine docs for details.

Web UI

Open http://localhost:9880. The UI offers three modes (Simple, Voice Design, Voice Clone), an RU/EN language switch, and advanced generation settings. See Web UI docs.

For API access, see API Reference. Swagger docs are available at /docs.

Configuration

Edit .env (copy from .env.example):

# Engine selection
TTS_ENGINE=qwen3

# Defaults
TTS_DEFAULT_SPEAKER=Serena
TTS_DEFAULT_LANGUAGE=Russian

# Qwen3 engine
TTS_QWEN3_MODEL_ID=Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
TTS_QWEN3_DEVICE=cuda:0
TTS_QWEN3_DTYPE=bfloat16

See .env.example for all options.
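
To check a .env file before starting the server, the keys listed above can be compared against what the file actually defines. The following is a minimal sketch, not the project's own settings loader: `parse_env_text`, `missing_keys`, and `DOCUMENTED_KEYS` are hypothetical names, and the parser handles only plain KEY=VALUE lines (no quoting, no inline comments).

```python
# Keys shown in the Configuration section of this README.
# Whether the server treats all of them as mandatory is an assumption.
DOCUMENTED_KEYS = (
    "TTS_ENGINE",
    "TTS_DEFAULT_SPEAKER",
    "TTS_DEFAULT_LANGUAGE",
    "TTS_QWEN3_MODEL_ID",
    "TTS_QWEN3_DEVICE",
    "TTS_QWEN3_DTYPE",
)


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comment lines."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, expected=DOCUMENTED_KEYS):
    """Return the expected keys that the parsed file does not define."""
    return [key for key in expected if key not in values]
```

For example, `missing_keys(parse_env_text(open(".env").read()))` returns an empty list when every key from the sample above is present.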

Docker

make build    # Build image
make up       # Start container
make health   # Check health
make logs     # View logs
make down     # Stop
make shell    # Shell into container

See Makefile for all targets.

Model Cache

Model weights (~4.3GB) download automatically on the first make up. They are stored in the host's ~/.cache/huggingface directory and mounted into the container, so:

  • Download happens once, persists across container restarts
  • Same cache shared between Docker and local development
  • Custom path: set HF_CACHE_PATH in .env

API Usage

Single Generation

curl -X POST http://localhost:9880/tts \
  -H 'content-type: application/json' \
  -d '{"text":"Hello world","language":"English","speaker":"Ryan"}' \
  --output out.wav
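
The same request can be sent from Python using only the standard library. This is a sketch: the helper names `build_tts_request` and `synthesize_to_file` are illustrative and not part of the project, while the URL, JSON fields, and output file name mirror the curl command above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9880"  # address used throughout this README


def build_tts_request(text, language, speaker, base_url=BASE_URL):
    """Build a POST /tts request (hypothetical helper, not a project API)."""
    payload = json.dumps(
        {"text": text, "language": language, "speaker": speaker},
        ensure_ascii=False,
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tts",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request, out_path, timeout=300):
    """Send the request and write the response body to out_path."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return len(audio)  # number of bytes written
```

With the server running, `synthesize_to_file(build_tts_request("Hello world", "English", "Ryan"), "out.wav")` should produce the same file as the curl call.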

Batch Generation

curl -X POST http://localhost:9880/tts/batch \
  -H 'content-type: application/json' \
  -d '{"items":[{"id":"001","text":"First phrase"},{"id":"002","text":"Second phrase"}]}' \
  --output batch.zip
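
When the phrases come from a list, the batch body can be generated rather than written by hand. The sketch below assumes zero-padded sequential ids like those in the curl example, and assumes the response is a ZIP archive (suggested by the batch.zip output name). The file names inside the archive are not documented here, so the second helper only lists them. Both function names are hypothetical.

```python
import io
import json
import zipfile


def build_batch_payload(phrases):
    """Build the {"items": [...]} JSON body with ids 001, 002, ..."""
    items = [
        {"id": f"{index:03d}", "text": text}
        for index, text in enumerate(phrases, start=1)
    ]
    return json.dumps({"items": items}, ensure_ascii=False).encode("utf-8")


def list_archive_members(zip_bytes):
    """Return the file names contained in a ZIP response body."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.namelist()
```

The payload bytes can be posted to /tts/batch with a `Content-Type: application/json` header, exactly as in the curl command.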

Voice Cloning

Clone any voice from a 3-10 second audio sample (requires the Base model):

# Switch to Base model in .env:
# TTS_QWEN3_MODEL_ID=Qwen/Qwen3-TTS-12Hz-1.7B-Base

curl -X POST http://localhost:9880/tts/clone \
  -F 'text=Привет мир' \
  -F 'language=Russian' \
  -F 'reference_audio=@voice_sample.wav' \
  -F 'reference_text=Текст из референсного аудио' \
  --output cloned.wav

See Qwen3 Engine docs for details.
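
The clone endpoint takes multipart form data (the `-F` flags above), which the Python standard library does not build directly. The sketch below encodes the form by hand. The field names come from the curl command; the helper names and the `audio/wav` content type for the reference file are assumptions.

```python
import urllib.request
import uuid


def encode_multipart(fields, files):
    """Encode form fields and files as multipart/form-data.

    fields: dict of name -> str
    files:  dict of name -> (filename, data_bytes, content_type)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    delimiter = f"--{boundary}".encode("ascii")
    parts = []
    for name, value in fields.items():
        parts += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",
            value.encode("utf-8"),
        ]
    for name, (filename, data, content_type) in files.items():
        disposition = (
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'
        )
        parts += [
            delimiter,
            disposition.encode("utf-8"),
            f"Content-Type: {content_type}".encode("ascii"),
            b"",
            data,
        ]
    parts += [delimiter + b"--", b""]  # closing delimiter plus trailing CRLF
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def build_clone_request(text, language, audio_name, audio_bytes, reference_text,
                        base_url="http://localhost:9880"):
    """Build a POST /tts/clone request (hypothetical helper)."""
    body, content_type = encode_multipart(
        {"text": text, "language": language, "reference_text": reference_text},
        # "audio/wav" is an assumed content type for the sample file
        {"reference_audio": (audio_name, audio_bytes, "audio/wav")},
    )
    return urllib.request.Request(
        f"{base_url}/tts/clone",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Pass the returned request to `urllib.request.urlopen` and write the response body to cloned.wav, as with single generation.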

Health Check

curl http://localhost:9880/health
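
Because model warmup takes about two minutes, scripts that start the server usually need to poll this endpoint before sending requests. The sketch below treats any HTTP 200 response as healthy, which is an assumption since the response body format is not described here. The function names and retry defaults are illustrative.

```python
import time
import urllib.error
import urllib.request


def is_healthy(url="http://localhost:9880/health", timeout=5):
    """Return True if GET /health answers with HTTP 200 (assumed success signal)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_healthy(probe=is_healthy, attempts=60, delay=5.0, sleep=time.sleep):
    """Call probe up to `attempts` times, pausing `delay` seconds between tries.

    Returns the 1-based attempt number that succeeded, or None on giving up.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

The defaults allow up to roughly five minutes, which leaves headroom over the stated warmup time. The `probe` and `sleep` parameters exist so the loop can be exercised without a running server.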

Documentation

Document       Description
Web UI         Language switch, advanced settings, multi-client, LAN
Architecture   Design decisions, engine protocol
API Reference  Endpoint specs, request/response formats
Qwen3 Engine   Model variants, speakers, setup
AGENTS.md      Project instructions for AI agents

Engines

Engine     Status    Description
Qwen3-TTS  ✅ Ready  1.7B/0.6B with voice cloning, preset speakers, instructions

Adding New Engines

  1. Create tts_adapter/engines/new_engine.py
  2. Implement the TTSEngine protocol
  3. Register the engine in engines/__init__.py
  4. Document it in docs/engines/

See Architecture for details.
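
The general shape of a pluggable engine can be sketched as follows. This is not the project's actual protocol, which is defined in the repository and described in the Architecture docs. The `synthesize` method, its signature, the registry dict, and the toy `SilenceEngine` are all assumptions made for illustration.

```python
import io
import wave
from typing import Callable, Dict, Protocol, runtime_checkable


@runtime_checkable
class TTSEngine(Protocol):
    """Hypothetical stand-in for the project's TTSEngine protocol."""

    def synthesize(self, text: str, language: str, speaker: str) -> bytes:
        """Return encoded audio for the given text."""
        ...


class SilenceEngine:
    """Toy engine that ignores its input and returns one second of silence."""

    sample_rate = 16000

    def synthesize(self, text: str, language: str, speaker: str) -> bytes:
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as wav:
            wav.setnchannels(1)       # mono
            wav.setsampwidth(2)       # 16-bit samples
            wav.setframerate(self.sample_rate)
            wav.writeframes(b"\x00\x00" * self.sample_rate)
        return buffer.getvalue()


# Assumed registration mechanism: a name -> factory mapping.
ENGINE_REGISTRY: Dict[str, Callable[[], TTSEngine]] = {}


def register_engine(name: str, factory: Callable[[], TTSEngine]) -> None:
    ENGINE_REGISTRY[name] = factory


def create_engine(name: str) -> TTSEngine:
    """Instantiate the engine registered under `name` (e.g. from TTS_ENGINE)."""
    return ENGINE_REGISTRY[name]()


register_engine("silence", SilenceEngine)
```

A structural protocol means a new engine does not need to inherit from a base class. Any class with a matching `synthesize` method satisfies it.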

License

Apache-2.0
