A serverless text-to-speech endpoint powered by Chatterbox Multilingual on RunPod. Generate natural speech in 23 languages with zero-shot voice cloning.
- 23 Languages — Spanish, English, French, German, Japanese, Chinese, and more
- Voice Cloning — Clone any voice with just 5-10 seconds of reference audio
- Emotion Control — Adjust expressiveness from monotone to dramatic
- Serverless — Pay only for what you use, auto-scales to zero
- Fast Cold Starts — Model pre-baked into Docker image + FlashBoot ready
Pull the image or build your own:
```bash
docker build -t yourusername/chatterbox-tts:latest .
docker push yourusername/chatterbox-tts:latest
```
Create a serverless endpoint in RunPod Console:
- Template → New Template → Enter your Docker image
- GPU: RTX 4000 Ada / L4 / A4000 (8-16GB VRAM)
- Enable FlashBoot ✅
Call the endpoint from Python with the `runpod` SDK:

```python
import runpod
import base64

runpod.api_key = "your_api_key"
endpoint = runpod.Endpoint("your_endpoint_id")

# Basic TTS
result = endpoint.run_sync({
    "input": {
        "text": "Hola, esto es una prueba.",
        "language_id": "es"
    }
})

# With voice cloning
with open("reference.wav", "rb") as f:
    ref_audio = base64.b64encode(f.read()).decode()

result = endpoint.run_sync({
    "input": {
        "text": "Your text here",
        "language_id": "en",
        "reference_audio": ref_audio,
        "exaggeration": 0.6
    }
})

# Save output
audio = base64.b64decode(result["audio_base64"])
with open("output.wav", "wb") as f:
    f.write(audio)
```
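If you are not using the `runpod` Python SDK, the same endpoint can be called over RunPod's standard serverless HTTP API. A minimal sketch with `requests`; the `/runsync` route and Bearer-token header are RunPod's generic serverless API, and the endpoint ID and key are placeholders:

```python
import base64
import requests

ENDPOINT_ID = "your_endpoint_id"   # replace with your endpoint ID
API_KEY = "your_api_key"           # replace with your RunPod API key

# RunPod's synchronous run route for serverless endpoints
url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"

resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"text": "Hola, esto es una prueba.", "language_id": "es"}},
    timeout=300,
)
resp.raise_for_status()

# The handler's return value is nested under "output" in the API response
output = resp.json()["output"]
with open("output.wav", "wb") as f:
    f.write(base64.b64decode(output["audio_base64"]))
```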
Input parameters:

| Field | Type | Required | Description |
|---|---|---|---|
| `text` | string | ✅ | Text to synthesize |
| `language_id` | string | ✅ | Language code (see below) |
| `reference_audio` | string | ❌ | Base64-encoded WAV for voice cloning |
| `exaggeration` | float | ❌ | Emotion intensity (0.0-1.0, default 0.5) |
| `cfg_weight` | float | ❌ | Style adherence (0.0-1.0, default 0.5) |
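The two optional floats interact: `exaggeration` pushes delivery toward more dramatic speech, while `cfg_weight` controls how strongly the output sticks to the reference style. A sketch of a more expressive request, reusing `endpoint` and `ref_audio` from the SDK example above; the specific values are illustrative, not tuned defaults:

```python
# More expressive delivery: raise exaggeration, relax cfg_weight.
# These values are illustrative starting points, not documented recommendations.
result = endpoint.run_sync({
    "input": {
        "text": "I can't believe we actually won!",
        "language_id": "en",
        "reference_audio": ref_audio,  # optional; omit to use the default voice
        "exaggeration": 0.7,
        "cfg_weight": 0.3
    }
})
```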
Response:

```json
{
  "audio_base64": "UklGRi...",
  "sample_rate": 24000,
  "duration_seconds": 2.45
}
```

Supported languages:

| Code | Language | Code | Language | Code | Language |
|---|---|---|---|---|---|
| `ar` | Arabic | `he` | Hebrew | `pl` | Polish |
| `da` | Danish | `hi` | Hindi | `pt` | Portuguese |
| `de` | German | `it` | Italian | `ru` | Russian |
| `el` | Greek | `ja` | Japanese | `sv` | Swedish |
| `en` | English | `ko` | Korean | `sw` | Swahili |
| `es` | Spanish | `ms` | Malay | `tr` | Turkish |
| `fi` | Finnish | `nl` | Dutch | `zh` | Chinese |
| `fr` | French | `no` | Norwegian | | |
For the best voice-cloning results (see the preparation sketch after this list):
- Use 5-15 seconds of clean audio
- WAV format, 24kHz+ sample rate
- Single speaker, no background noise
- Match the reference style to desired output emotion
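One way to prepare a reference clip that meets the points above is to resample it to 24 kHz mono and re-encode it as WAV before base64-encoding. A sketch using `librosa` and `soundfile`; the library choice and file names are mine, not part of this repo:

```python
import base64

import librosa
import soundfile as sf

# Load the clip as mono and resample to 24 kHz (librosa does not trim anything
# by itself; cut the file to a clean 5-15 s segment beforehand).
audio, sr = librosa.load("raw_reference.mp3", sr=24000, mono=True)

# Re-encode as 16-bit PCM WAV, then base64-encode for the request payload.
sf.write("reference.wav", audio, sr, subtype="PCM_16")
with open("reference.wav", "rb") as f:
    ref_audio = base64.b64encode(f.read()).decode()
```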
Estimated monthly cost by traffic level:

| Traffic | GPU | Active Workers | Approx. Monthly Cost (USD) |
|---|---|---|---|
| 100 req/day | RTX 4000 Ada | 0 (flex) | $5-15 |
| 1,000 req/day | L4 | 1 | $50-80 |
| 10,000+ req/day | L4 | 2+ | $200+ |
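For orientation, a RunPod serverless worker is just a Python script that registers a handler with `runpod.serverless.start`. The sketch below shows only the expected request/response shape; `synthesize` is a hypothetical placeholder that emits silence, not this repo's actual Chatterbox code in `handler.py`:

```python
import base64
import io
import wave

import runpod


def synthesize(text: str, language_id: str) -> tuple[bytes, int, float]:
    """Hypothetical stand-in for the real Chatterbox TTS call: returns 1 s of silence."""
    sample_rate = 24000
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate)  # one second of silence
    return buf.getvalue(), sample_rate, 1.0


def handler(job):
    """Receive a RunPod job and return a JSON-serializable result."""
    params = job["input"]
    wav_bytes, sample_rate, duration = synthesize(params["text"], params["language_id"])
    return {
        "audio_base64": base64.b64encode(wav_bytes).decode(),
        "sample_rate": sample_rate,
        "duration_seconds": duration,
    }


# Registers the handler with the RunPod serverless runtime; this also enables
# the --test_input flag used in the local test command below.
runpod.serverless.start({"handler": handler})
```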
```bash
# Test locally
python handler.py --test_input '{"input": {"text": "Hello world", "language_id": "en"}}'
```

License: MIT — model weights are subject to the Chatterbox license.
Credits:

- Resemble AI — Chatterbox TTS model
- RunPod — Serverless GPU infrastructure