Qwen3-TTS Server

A FastAPI-based text-to-speech server powered by Qwen3-TTS, featuring voice cloning, voice conversion, and multi-language support.

Features

Voice cloning - Clone any voice from a short reference audio clip
Voice conversion - Transform the voice of existing audio files
Multi-language support - 10 languages: Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian
Automatic transcription - Uses Whisper for reference audio transcription
Speed control - Adjust speech speed via time-stretching
RunPod ready - Optimized for deployment on RunPod with network volume support

API Endpoints

Endpoint	Method	Description
`/base_tts/`	GET	TTS with default English voice
`/synthesize_speech/`	GET	TTS with specified voice
`/upload_audio/`	POST	Upload reference voice
`/change_voice/`	POST	Voice conversion

Endpoint Details

`GET /synthesize_speech/`

Generate speech from text using a specified voice.

Parameter	Type	Required	Description
`text`	string	Yes	Text to synthesize
`voice`	string	Yes	Voice label (filename prefix in `resources/`)
`speed`	float	No	Speech speed multiplier (default: 1.0)
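
As an illustration of how the parameters above map onto a request, here is a minimal Python sketch that builds the query URL with the standard library. The helper name `synthesize_url` and the `BASE_URL` default are assumptions made for this example (the address matches the port used in Quick Start); they are not part of the server.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Assumption: the server is reachable at the address used in Quick Start.
BASE_URL = "http://localhost:7860"


def synthesize_url(text: str, voice: str, speed: Optional[float] = None,
                   base_url: str = BASE_URL) -> str:
    """Build a GET URL for /synthesize_speech/ (hypothetical helper)."""
    params = {"text": text, "voice": voice}
    if speed is not None:
        # speed is optional; when omitted the server default (1.0) applies.
        params["speed"] = speed
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{base_url}/synthesize_speech/?{urlencode(params, quote_via=quote)}"


print(synthesize_url("Hello world", "demo_speaker0", speed=1.2))
```

The resulting URL can be fetched with any HTTP client, for example `urllib.request.urlretrieve(url, "output.wav")`.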

`GET /base_tts/`

Generate speech using the default English voice.

Parameter	Type	Required	Description
`text`	string	Yes	Text to synthesize
`speed`	float	No	Speech speed multiplier (default: 1.0)
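
Since `speed` is described as a multiplier applied via time-stretching, a rough way to reason about it is that the output length scales with the inverse of the multiplier. The sketch below assumes that a value above 1.0 means faster speech; the function name is hypothetical and the server's actual stretching code is not shown in this README.

```python
def stretched_duration(duration_s: float, speed: float) -> float:
    """Estimate clip length after time-stretching by `speed`.

    Assumption: speed > 1.0 shortens the clip, speed < 1.0 lengthens it.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return duration_s / speed


# A 6-second clip at 1.5x speed, then at 0.75x speed.
print(stretched_duration(6.0, 1.5))
print(stretched_duration(6.0, 0.75))
```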

`POST /upload_audio/`

Upload an audio file to use as a reference voice.

Field	Type	Description
`audio_file_label`	string	Label/name for the voice
`file`	file	Audio file (wav, mp3, flac, ogg; max 5 MB)
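
Checking a file against these limits before uploading can save a round trip. The following is a client-side sketch only: the function name is hypothetical, and treating "5MB" as 5 × 1024 × 1024 bytes is an assumption, since the README does not say how the server counts the limit.

```python
from pathlib import Path

# Formats listed for /upload_audio/.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}
# Assumption: the 5 MB limit is counted in binary megabytes.
MAX_BYTES = 5 * 1024 * 1024


def check_reference_audio(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems


print(check_reference_audio("voice_sample.mp3", 1_200_000))
print(check_reference_audio("notes.txt", 6 * 1024 * 1024))
```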

`POST /change_voice/`

Convert the voice of an existing audio file.

Field	Type	Description
`reference_speaker`	string	Voice label to convert to
`file`	file	Audio file to convert
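
Both fields are sent as multipart form data. For readers who want to call the endpoint from Python without extra dependencies, here is one possible way to assemble such a request with the standard library. The helper name, the default base URL, and the `application/octet-stream` content type are assumptions made for this sketch.

```python
import uuid
from urllib.request import Request


def build_change_voice_request(reference_speaker: str, filename: str,
                               audio: bytes,
                               base_url: str = "http://localhost:7860") -> Request:
    """Assemble a multipart/form-data POST for /change_voice/ (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    # Text field followed by the header of the file part.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="reference_speaker"\r\n\r\n'
        f"{reference_speaker}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    # Closing boundary that terminates the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return Request(
        f"{base_url}/change_voice/",
        data=head + audio + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Building the request does not contact the server.
req = build_change_voice_request("demo_speaker0", "input.wav", b"RIFF")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` and writing the response bytes to a file would mirror the `curl` example for this endpoint.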

Quick Start

Using Docker (Recommended)

# Build the image
docker build -t qwen3-tts_server .

# Run the container
docker run --gpus all -p 7860:7860 qwen3-tts_server

Local Development

# Create conda environment
conda create -n qwen3-tts python=3.12 -y
conda activate qwen3-tts

# Install PyTorch with CUDA
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu128

# Install dependencies
pip install -r requirements.txt

# Optional: Install FlashAttention 2 for better performance
pip install flash-attn --no-build-isolation

# Run the server
./start.sh
# or
python -m uvicorn server:app --host 0.0.0.0 --port 7860
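
With either setup the server listens on port 7860, and it may take a while to start accepting connections. A small polling helper like the one below can be used to wait for it; the function name and the timeout values are assumptions, and an open port only shows that the server process is listening, not that every model is ready.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 7860,
                  timeout_s: float = 120.0, interval_s: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)
    return False


# Example usage (blocks for up to two minutes):
# if not wait_for_port():
#     raise SystemExit("server did not come up on port 7860")
```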

Directory Structure

.
├── server.py           # Main FastAPI server
├── Dockerfile          # Multi-stage Docker build
├── requirements.txt    # Python dependencies
├── start.sh            # Startup script
├── resources/          # Voice reference files
│   └── demo_speaker0.mp3
└── outputs/            # Generated audio (temporary)
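
The `voice` parameter is described as a filename prefix in `resources/`, so `demo_speaker0` refers to `resources/demo_speaker0.mp3`. The sketch below shows one way such a lookup could work. It is an illustration only: the function name is hypothetical and the actual lookup in `server.py` is not shown here.

```python
from pathlib import Path
from typing import Optional


def find_voice_file(label: str, resources_dir: str = "resources") -> Optional[Path]:
    """Return the first file named '<label>.<ext>' in resources_dir, or None.

    Assumption: a label matches a file whose name is the label plus an
    extension. Labels containing glob characters are not handled.
    """
    for path in sorted(Path(resources_dir).glob(f"{label}.*")):
        if path.is_file():
            return path
    return None


print(find_voice_file("demo_speaker0"))
```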

Usage Examples

Synthesize Speech

curl "http://localhost:7860/synthesize_speech/?text=Hello%20world&voice=demo_speaker0" \
  --output output.wav

Upload a Voice

curl -X POST "http://localhost:7860/upload_audio/" \
  -F "audio_file_label=my_voice" \
  -F "file=@/path/to/voice_sample.mp3"

Use Uploaded Voice

curl "http://localhost:7860/synthesize_speech/?text=Hello%20world&voice=my_voice" \
  --output output.wav

Change Voice of Audio

curl -X POST "http://localhost:7860/change_voice/" \
  -F "reference_speaker=demo_speaker0" \
  -F "file=@/path/to/input.wav" \
  --output converted.wav

Models

Model	Purpose	Size
`Qwen/Qwen3-TTS-12Hz-1.7B-Base`	Voice cloning TTS	1.7B parameters
`openai/whisper-base`	Audio transcription	74M parameters

Models are pre-downloaded during the Docker build and stored in `/root/.cache/`.

Requirements

GPU: CUDA-compatible GPU with 8GB+ VRAM recommended
CUDA: 12.8+ (for RTX 5090/Blackwell support)
Python: 3.10+
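
To put the VRAM recommendation in context, the memory taken by model weights alone can be estimated from the parameter counts in the Models table. This back-of-the-envelope sketch assumes 16-bit weights (2 bytes per parameter), which the README does not state, and it ignores activations, caches, and framework overhead.

```python
def weights_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GiB (assumes 2 bytes per parameter by default)."""
    return num_params * bytes_per_param / 1024**3


tts = weights_gib(1.7e9)      # Qwen3-TTS-12Hz-1.7B-Base
whisper = weights_gib(74e6)   # whisper-base
print(f"TTS weights: {tts:.2f} GiB, Whisper weights: {whisper:.2f} GiB")
```

Under that assumption the two models together need a little over 3 GiB for weights, which leaves headroom on an 8 GB card for everything else.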

RunPod Deployment

The Docker image is optimized for RunPod:

Server files are in `/app/server/` (not `/workspace/`)
`/workspace/` is left free for network volumes
Uses `sleep infinity` to keep the container alive for web terminal access

License

See the respective licenses for:
