API Reference

Base URL: http://localhost:9880

Web UI

Browser interface at http://localhost:9880/. See the Web UI docs for features, i18n, advanced settings, and LAN setup.

API Endpoints

GET /health

Health check with model info.

Response:

{
  "ok": true,
  "engine": "qwen3",
  "model": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
  "device": "cuda:0",
  "supports_cloning": false,
  "supports_design": false
}

Field	Description
ok	Always true if server is healthy
engine	Engine name (e.g., "qwen3")
model	Loaded model ID
device	Device (e.g., "cuda:0", "cpu")
supports_cloning	Whether `/tts/clone` is available
supports_design	Whether `/tts/design` is available
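A client may want to check the capability flags before calling the optional endpoints. The following is a minimal Python sketch using only the standard library; the helper names `optional_endpoints` and `fetch_health` are illustrative and not part of the server.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:9880"  # base URL from this reference


def optional_endpoints(health: dict) -> dict:
    """Map each optional endpoint to the availability flag reported by /health."""
    return {
        "/tts/clone": bool(health.get("supports_cloning", False)),
        "/tts/design": bool(health.get("supports_design", False)),
    }


def fetch_health(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET /health and decode the JSON body (needs a running server)."""
    with urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)
```

With a server running, `optional_endpoints(fetch_health())` reports which of `/tts/clone` and `/tts/design` the loaded model supports.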

POST /tts

Generate WAV audio from text.

Request:

{
  "text": "Привет мир",
  "language": "Russian",
  "speaker": "Ryan",
  "instruct": "Calm, friendly"
}

Field	Type	Default	Description
text	string	required	Text to synthesize
language	string	"Auto"	Language (e.g., "Russian") or "Auto"
speaker	string	"default"	Speaker/voice name
instruct	string	""	Style instruction
generation	object	{}	Optional generation parameters (see below)

Generation parameters (all optional, null = library default):

Field	Type	Default	Range	Description
temperature	float	0.9	0.01-2.0	Sampling randomness
top_k	int	50	1-200	Top-k sampling
top_p	float	1.0	0.1-1.0	Nucleus sampling
repetition_penalty	float	1.05	1.0-2.0	Repetition penalty
max_new_tokens	int	2048	256-4096	Max codec tokens

See Generation Parameters for tuning tips.
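Out-of-range values are rejected with a 422, so it can help to check them on the client first. The sketch below copies the ranges from the table above and assumes the bounds are inclusive, which this reference does not state. The name `check_generation` is hypothetical.

```python
# Types and ranges copied from the generation parameters table.
# Assumption: both ends of each range are inclusive.
GENERATION_RANGES = {
    "temperature": (float, 0.01, 2.0),
    "top_k": (int, 1, 200),
    "top_p": (float, 0.1, 1.0),
    "repetition_penalty": (float, 1.0, 2.0),
    "max_new_tokens": (int, 256, 4096),
}


def check_generation(params: dict) -> dict:
    """Return params without None entries; raise ValueError on a bad value."""
    cleaned = {}
    for name, value in params.items():
        if name not in GENERATION_RANGES:
            raise ValueError(f"unknown generation parameter: {name}")
        if value is None:
            # null means "library default", so the key is simply left out
            continue
        kind, low, high = GENERATION_RANGES[name]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{name} must be a number")
        if kind is int and not isinstance(value, int):
            raise ValueError(f"{name} must be an integer")
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
        cleaned[name] = value
    return cleaned
```

The returned dictionary can be passed as the `generation` object of a `/tts` request.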

Response: audio/wav binary

Example:

curl -X POST http://localhost:9880/tts \
  -H 'content-type: application/json' \
  -d '{"text":"Привет мир","language":"Russian","generation":{"temperature":0.7}}' \
  --output out.wav
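The same request can be made from Python with the standard library. This is a sketch rather than an official client: `build_tts_request` and `synthesize_to_file` are illustrative names, and optional fields are omitted when unset so that the server defaults listed above apply.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:9880"  # base URL from this reference


def build_tts_request(text, language="Auto", speaker=None, instruct=None,
                      generation=None, base_url=BASE_URL) -> Request:
    """Build a POST /tts request with a JSON body."""
    payload = {"text": text, "language": language}
    if speaker is not None:
        payload["speaker"] = speaker
    if instruct is not None:
        payload["instruct"] = instruct
    if generation:
        payload["generation"] = generation
    # ensure_ascii=False keeps non-ASCII text readable; the body is UTF-8
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return Request(
        f"{base_url}/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request: Request, path: str, timeout: float = 120.0) -> None:
    """Send the request and write the audio/wav response to path."""
    with urlopen(request, timeout=timeout) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

For example, `synthesize_to_file(build_tts_request("Привет мир", language="Russian", generation={"temperature": 0.7}), "out.wav")` mirrors the curl call above.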

POST /tts/batch

Generate multiple WAV files as a ZIP archive.

Note: All items must share the same language, speaker, and instruct values, because these parameters apply to the whole batch.

Request:

{
  "items": [
    {"id": "001", "text": "First phrase", "language": "Russian"},
    {"id": "002", "text": "Second phrase", "language": "Russian"}
  ]
}

Field	Type	Description
items	array	List of TTS items (must be non-empty)
items[].id	string	Unique ID (used as filename, sanitized)
items[].text	string	Text to synthesize
items[].language	string	Language code (must match across items)
items[].speaker	string	Speaker name (must match across items)
items[].instruct	string	Style instruction (must match across items)

Response: application/zip containing {id}.wav files

Example:

curl -X POST http://localhost:9880/tts/batch \
  -H 'content-type: application/json' \
  -d '{"items":[{"id":"001","text":"Привет"},{"id":"002","text":"Мир"}]}' \
  --output batch.zip

unzip batch.zip -d output/
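Because the shared parameters must match across items, a client can check a batch before sending it and unpack the ZIP in memory afterwards. The sketch below is one possible approach with hypothetical helper names. It treats an omitted field and an explicit value as a mismatch, which may be stricter than the server.

```python
import io
import zipfile

# Fields that must be identical across all items, per the note above.
SHARED_FIELDS = ("language", "speaker", "instruct")


def check_batch_items(items: list) -> None:
    """Raise ValueError if the batch would violate the documented constraints."""
    if not items:
        raise ValueError("items must be non-empty")
    ids = [item["id"] for item in items]
    if len(set(ids)) != len(ids):
        raise ValueError("item ids must be unique")
    for field in SHARED_FIELDS:
        values = {item.get(field) for item in items}
        if len(values) > 1:
            raise ValueError(f"{field} must match across items")


def read_batch_zip(data: bytes) -> dict:
    """Return a mapping of archive member name to file bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```

Member names follow the `{id}.wav` pattern, with the ID sanitized by the server, so they may differ from the IDs that were sent.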

POST /tts/design

Generate speech with a custom voice created from a text description.

Requires: VoiceDesign model (TTS_QWEN3_MODEL_ID=Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign)

Request: multipart/form-data

Field	Type	Default	Description
text	string	required	Text to synthesize
instruct	string	required	Voice description (see Voice Design Guide)
language	string	"Auto"	Language code
temperature	float	null	Sampling temperature (0.01-2.0)
top_k	int	null	Top-k sampling (1-200)
top_p	float	null	Nucleus sampling (0.1-1.0)
repetition_penalty	float	null	Repetition penalty (1.0-2.0)
max_new_tokens	int	null	Max codec tokens (256-4096)

Response: audio/wav binary

Example:

curl -X POST http://localhost:9880/tts/design \
  -F 'text=Привет мир' \
  -F 'language=Russian' \
  -F 'instruct=Adult female voice, contralto range, warm and confident' \
  -F 'temperature=0.7' \
  --output designed.wav
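The Python standard library has no built-in multipart encoder, so the sketch below builds the form body by hand for text-only fields. The helper names are illustrative, and generation parameters left as None are omitted on the assumption that an absent field behaves like the null default.

```python
import uuid
from urllib.request import Request

BASE_URL = "http://localhost:9880"  # base URL from this reference


def encode_form_fields(fields: dict, boundary: str) -> bytes:
    """Encode text-only fields as a multipart/form-data body."""
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from the value
        lines.append(str(value))
    lines.append(f"--{boundary}--")  # closing boundary
    lines.append("")
    return "\r\n".join(lines).encode("utf-8")


def build_design_request(text, instruct, language="Auto",
                         base_url=BASE_URL, **generation) -> Request:
    """Build a POST /tts/design request; None-valued parameters are skipped."""
    fields = {"text": text, "instruct": instruct, "language": language}
    fields.update({k: v for k, v in generation.items() if v is not None})
    boundary = uuid.uuid4().hex
    return Request(
        f"{base_url}/tts/design",
        data=encode_form_fields(fields, boundary),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

The resulting request can be sent with `urllib.request.urlopen`, and the response body written to a `.wav` file.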

POST /tts/clone

Clone a voice from a reference audio sample.

Requires: Base model (TTS_QWEN3_MODEL_ID=Qwen/Qwen3-TTS-12Hz-1.7B-Base)

Request: multipart/form-data

Field	Type	Default	Description
text	string	required	Text to synthesize
reference_audio	file	required	WAV file (3-10 seconds)
reference_text	string	""	Transcript of reference audio (improves quality)
language	string	"Auto"	Language code
temperature	float	null	Sampling temperature (0.01-2.0)
top_k	int	null	Top-k sampling (1-200)
top_p	float	null	Nucleus sampling (0.1-1.0)
repetition_penalty	float	null	Repetition penalty (1.0-2.0)
max_new_tokens	int	null	Max codec tokens (256-4096)

Response: audio/wav binary

Example:

curl -X POST http://localhost:9880/tts/clone \
  -F 'text=Привет мир' \
  -F 'language=Russian' \
  -F 'reference_audio=@voice_sample.wav' \
  -F 'reference_text=Hello world' \
  -F 'temperature=0.7' \
  --output cloned.wav
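Since the reference audio is expected to be a WAV file of 3-10 seconds, a client can check the duration before uploading. The sketch below uses Python's `wave` module, which reads uncompressed PCM WAV files. The function names are hypothetical, and treating the limits as inclusive is an assumption.

```python
import wave


def wav_duration_seconds(source) -> float:
    """Duration of a PCM WAV given a file path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_audio(source, min_seconds: float = 3.0,
                          max_seconds: float = 10.0) -> float:
    """Return the duration, or raise ValueError if it is outside the limits."""
    duration = wav_duration_seconds(source)
    if not min_seconds <= duration <= max_seconds:
        raise ValueError(
            f"reference audio is {duration:.2f}s, "
            f"expected {min_seconds}-{max_seconds}s"
        )
    return duration
```

Calling `check_reference_audio("voice_sample.wav")` before the upload catches clips that are too short or too long without a round trip to the server.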

Error Responses

Standard HTTP error codes with JSON body:

{
  "detail": "Error message"
}

Code	Description
400	Invalid request or unsupported feature
422	Validation error (missing/invalid fields or out-of-range generation params)
500	Internal server error
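A client can turn these responses into readable messages by extracting the `detail` field. The sketch below assumes `detail` is usually a string, as shown above, but also tolerates structured values and non-JSON bodies. The name `error_message` is illustrative.

```python
import json


def error_message(status: int, body: bytes) -> str:
    """Format an error response body as 'HTTP <status>: <detail>'."""
    try:
        detail = json.loads(body.decode("utf-8")).get("detail")
    except (ValueError, AttributeError):
        # Not valid UTF-8 or JSON, or the JSON value is not an object
        detail = None
    if detail is None:
        text = body.decode("utf-8", errors="replace")
    elif isinstance(detail, str):
        text = detail
    else:
        # Assumption: validation errors may carry a structured detail value
        text = json.dumps(detail, ensure_ascii=False)
    return f"HTTP {status}: {text}"
```

With `urllib.request`, a failed call raises `urllib.error.HTTPError`, and the status and body are available as `err.code` and `err.read()`.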
