API Reference

Base URL: http://localhost:9880

Web UI

The browser interface is served at http://localhost:9880/. See the Web UI docs for features, i18n, advanced settings, and LAN setup.


API Endpoints

GET /health

Health check with model info.

Response:

{
  "ok": true,
  "engine": "qwen3",
  "model": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
  "device": "cuda:0",
  "supports_cloning": false,
  "supports_design": false
}

| Field | Description |
| --- | --- |
| ok | Always true if server is healthy |
| engine | Engine name (e.g., "qwen3") |
| model | Loaded model ID |
| device | Device (e.g., "cuda:0", "cpu") |
| supports_cloning | Whether /tts/clone is available |
| supports_design | Whether /tts/design is available |
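
A client can read the capability flags before calling the model-specific endpoints. The following is a minimal sketch using only Python's standard library; the helper names `fetch_health` and `optional_endpoints` are hypothetical, and the sketch assumes the server is reachable at the base URL above.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:9880"  # base URL given in this reference


def fetch_health(base_url=BASE_URL, timeout=5.0):
    """GET /health and return the decoded JSON body as a dict."""
    with urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def optional_endpoints(health):
    """Map the capability flags from /health to the endpoints they gate."""
    endpoints = []
    if health.get("supports_design"):
        endpoints.append("/tts/design")
    if health.get("supports_cloning"):
        endpoints.append("/tts/clone")
    return endpoints


# The sample response from this section, reused so the example runs offline.
sample = {
    "ok": True,
    "engine": "qwen3",
    "model": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    "device": "cuda:0",
    "supports_cloning": False,
    "supports_design": False,
}
print(optional_endpoints(sample))  # prints []
```

Against a live server, `optional_endpoints(fetch_health())` would perform the same check on the real response.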

POST /tts

Generate WAV audio from text.

Request:

{
  "text": "Привет мир",
  "language": "Russian",
  "speaker": "Ryan",
  "instruct": "Calm, friendly"
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | required | Text to synthesize |
| language | string | "Auto" | Language code or "Auto" |
| speaker | string | "default" | Speaker/voice name |
| instruct | string | "" | Style instruction |
| generation | object | {} | Optional generation parameters (see below) |

Generation parameters (all optional, null = library default):

| Field | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| temperature | float | 0.9 | 0.01-2.0 | Sampling randomness |
| top_k | int | 50 | 1-200 | Top-k sampling |
| top_p | float | 1.0 | 0.1-1.0 | Nucleus sampling |
| repetition_penalty | float | 1.05 | 1.0-2.0 | Repetition penalty |
| max_new_tokens | int | 2048 | 256-4096 | Max codec tokens |

See Generation Parameters for tuning tips.
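
Because out-of-range generation parameters are rejected with a 422, a client may want to check them before sending. The sketch below is one way to do that in Python; the helper `build_tts_body` is hypothetical, and treating the documented ranges as inclusive is an assumption rather than something this reference states.

```python
import json

# (min, max) per parameter, copied from the generation parameters listed above.
# Assumption: both bounds are inclusive.
GENERATION_RANGES = {
    "temperature": (0.01, 2.0),
    "top_k": (1, 200),
    "top_p": (0.1, 1.0),
    "repetition_penalty": (1.0, 2.0),
    "max_new_tokens": (256, 4096),
}


def build_tts_body(text, language="Auto", speaker="default", instruct="", **generation):
    """Return the JSON body for POST /tts after a client-side range check."""
    if not text:
        raise ValueError("text is required")
    checked = {}
    for name, value in generation.items():
        if name not in GENERATION_RANGES:
            raise ValueError(f"unknown generation parameter: {name}")
        if value is None:
            continue  # null means "library default", so the key is omitted
        low, high = GENERATION_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
        checked[name] = value
    return {
        "text": text,
        "language": language,
        "speaker": speaker,
        "instruct": instruct,
        "generation": checked,
    }


body = build_tts_body("Hello world", speaker="Ryan", temperature=0.7)
print(json.dumps(body, indent=2))
```

The server remains the source of truth for validation; this check only catches obvious mistakes earlier.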

Response: audio/wav binary

Example:

curl -X POST http://localhost:9880/tts \
  -H 'content-type: application/json' \
  -d '{"text":"Привет мир","language":"Russian","generation":{"temperature":0.7}}' \
  --output out.wav
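
The same request can be issued from Python without third-party packages. This is a sketch rather than an official client: `make_tts_request` and `synthesize_to_file` are hypothetical helper names, and the 120-second timeout is an arbitrary choice.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:9880"  # base URL given in this reference


def make_tts_request(body, base_url=BASE_URL):
    """Build the POST /tts request with a UTF-8 encoded JSON body."""
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return Request(
        f"{base_url}/tts",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(body, path, timeout=120.0):
    """Send the request and write the audio/wav response to `path`.

    Returns the number of bytes written. Requires a running server.
    """
    with urlopen(make_tts_request(body), timeout=timeout) as resp:
        audio = resp.read()
    with open(path, "wb") as out:
        out.write(audio)
    return len(audio)


# Building the request does not touch the network.
request = make_tts_request(
    {"text": "Привет мир", "language": "Russian", "generation": {"temperature": 0.7}}
)
print(request.get_method(), request.full_url)
```

With the server running, `synthesize_to_file({...}, "out.wav")` would mirror the curl command above.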

POST /tts/batch

Generate multiple WAV files, returned as a single ZIP archive.

Note: All items must share the same language, speaker, and instruct params (batch params are global).

Request:

{
  "items": [
    {"id": "001", "text": "First phrase", "language": "Russian"},
    {"id": "002", "text": "Second phrase", "language": "Russian"}
  ]
}

| Field | Type | Description |
| --- | --- | --- |
| items | array | List of TTS items (must be non-empty) |
| items[].id | string | Unique ID (used as filename, sanitized) |
| items[].text | string | Text to synthesize |
| items[].language | string | Language code (must match across items) |
| items[].speaker | string | Speaker name (must match across items) |
| items[].instruct | string | Style instruction (must match across items) |

Response: application/zip containing {id}.wav files

Example:

curl -X POST http://localhost:9880/tts/batch \
  -H 'content-type: application/json' \
  -d '{"items":[{"id":"001","text":"Привет"},{"id":"002","text":"Мир"}]}' \
  --output batch.zip

unzip batch.zip -d output/
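
The batch constraints (non-empty list, unique IDs, shared language/speaker/instruct) can be checked on the client before sending, and the ZIP response can be unpacked in memory. The sketch below assumes a strict reading in which an item that omits a shared field does not match an item that sets it; the server may be more lenient. `build_batch_body` and `read_batch_zip` are hypothetical names.

```python
import io
import zipfile

SHARED_FIELDS = ("language", "speaker", "instruct")


def build_batch_body(items):
    """Validate batch items client-side and return the JSON body."""
    if not items:
        raise ValueError("items must be non-empty")
    ids = [item["id"] for item in items]
    if len(set(ids)) != len(ids):
        raise ValueError("item ids must be unique")
    for field in SHARED_FIELDS:
        # A missing field counts as None, so mixing set and unset values fails.
        if len({item.get(field) for item in items}) > 1:
            raise ValueError(f"{field} must match across items")
    return {"items": list(items)}


def read_batch_zip(zip_bytes):
    """Return {filename: wav_bytes} for every entry in the ZIP response."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


body = build_batch_body([
    {"id": "001", "text": "First phrase", "language": "Russian"},
    {"id": "002", "text": "Second phrase", "language": "Russian"},
])
print(len(body["items"]))  # prints 2
```

Since IDs are sanitized before being used as filenames, the names inside the archive may differ from the IDs that were sent.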

POST /tts/design

Generate speech with a custom voice created from a text description.

Requires: VoiceDesign model (TTS_QWEN3_MODEL_ID=Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign)

Request: multipart/form-data

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | required | Text to synthesize |
| instruct | string | required | Voice description (see Voice Design Guide) |
| language | string | "Auto" | Language code |
| temperature | float | null | Sampling temperature (0.01-2.0) |
| top_k | int | null | Top-k sampling (1-200) |
| top_p | float | null | Nucleus sampling (0.1-1.0) |
| repetition_penalty | float | null | Repetition penalty (1.0-2.0) |
| max_new_tokens | int | null | Max codec tokens (256-4096) |

Response: audio/wav binary

Example:

curl -X POST http://localhost:9880/tts/design \
  -F 'text=Привет мир' \
  -F 'language=Russian' \
  -F 'instruct=Adult female voice, contralto range, warm and confident' \
  -F 'temperature=0.7' \
  --output designed.wav
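
Unlike /tts, this endpoint takes multipart/form-data, which Python's standard library does not encode out of the box. The sketch below hand-rolls a minimal encoder for text fields only; `encode_form_fields` and `make_design_request` are hypothetical helpers, and sending field values as UTF-8 is an assumption about what the server accepts.

```python
import uuid
from urllib.request import Request

BASE_URL = "http://localhost:9880"  # base URL given in this reference


def encode_form_fields(fields):
    """Encode text fields as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            f"\r\n"
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")  # closing delimiter
    body = "".join(parts).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def make_design_request(text, instruct, language="Auto", base_url=BASE_URL, **generation):
    """Build the POST /tts/design request; None-valued parameters are omitted."""
    fields = {"text": text, "instruct": instruct, "language": language}
    fields.update({k: v for k, v in generation.items() if v is not None})
    body, content_type = encode_form_fields(fields)
    return Request(
        f"{base_url}/tts/design",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


request = make_design_request(
    "Привет мир",
    "Adult female voice, contralto range, warm and confident",
    language="Russian",
    temperature=0.7,
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` and writing `resp.read()` to a file would produce the same result as the curl command, provided the VoiceDesign model is loaded.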

POST /tts/clone

Clone a voice from a reference audio sample.

Requires: Base model (TTS_QWEN3_MODEL_ID=Qwen/Qwen3-TTS-12Hz-1.7B-Base)

Request: multipart/form-data

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | required | Text to synthesize |
| reference_audio | file | required | WAV file (3-10 seconds) |
| reference_text | string | "" | Transcript of reference audio (improves quality) |
| language | string | "Auto" | Language code |
| temperature | float | null | Sampling temperature (0.01-2.0) |
| top_k | int | null | Top-k sampling (1-200) |
| top_p | float | null | Nucleus sampling (0.1-1.0) |
| repetition_penalty | float | null | Repetition penalty (1.0-2.0) |
| max_new_tokens | int | null | Max codec tokens (256-4096) |

Response: audio/wav binary

Example:

curl -X POST http://localhost:9880/tts/clone \
  -F 'text=Привет мир' \
  -F 'language=Russian' \
  -F 'reference_audio=@voice_sample.wav' \
  -F 'reference_text=Hello world' \
  -F 'temperature=0.7' \
  --output cloned.wav
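
Since reference_audio is documented as a 3-10 second WAV file, it can help to measure the sample before uploading. The sketch below uses Python's `wave` module, which reads uncompressed PCM WAV files; `check_reference_audio` is a hypothetical helper, and treating the 3-10 second bounds as inclusive is an assumption.

```python
import wave


def wav_duration_seconds(source):
    """Return the duration of a PCM WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_audio(source, min_seconds=3.0, max_seconds=10.0):
    """Raise ValueError if the sample is outside the documented length."""
    duration = wav_duration_seconds(source)
    if not min_seconds <= duration <= max_seconds:
        raise ValueError(
            f"reference audio is {duration:.2f}s; expected "
            f"{min_seconds:g}-{max_seconds:g}s"
        )
    return duration


# Usage (requires a local file, so it is left commented out):
# check_reference_audio("voice_sample.wav")
```

The server may apply its own checks or accept other encodings; this only guards against obviously unsuitable clips.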

Error Responses

Standard HTTP error codes with JSON body:

{
  "detail": "Error message"
}

| Code | Description |
| --- | --- |
| 400 | Invalid request or unsupported feature |
| 422 | Validation error (missing/invalid fields or out-of-range generation params) |
| 500 | Internal server error |
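
A client can turn these responses into readable messages by combining the status code with the `detail` field. The sketch below is one possible approach; `describe_error` is a hypothetical helper, and the fallbacks for a non-JSON body or a non-string `detail` are defensive assumptions, not documented behavior.

```python
import json

# Descriptions copied from the error codes listed above.
ERROR_MEANINGS = {
    400: "Invalid request or unsupported feature",
    422: "Validation error",
    500: "Internal server error",
}


def describe_error(status, body):
    """Format an error response (status code plus raw body bytes)."""
    meaning = ERROR_MEANINGS.get(status, "Unexpected status")
    try:
        detail = json.loads(body)["detail"]
    except (ValueError, KeyError, TypeError):
        # Not JSON, or no "detail" key: fall back to the raw body text.
        detail = body.decode("utf-8", errors="replace")
    if not isinstance(detail, str):
        detail = json.dumps(detail, ensure_ascii=False)
    return f"HTTP {status} ({meaning}): {detail}"


print(describe_error(500, b'{"detail": "Error message"}'))
# prints: HTTP 500 (Internal server error): Error message
```

With `urllib.request`, a failed call raises `urllib.error.HTTPError`, so `describe_error(err.code, err.read())` inside an `except HTTPError as err:` block would produce the message.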