Inference and deployment toolkit for Svara-TTS, an open-source multilingual text-to-speech model for Indic languages. Includes examples for local GGUF inference, a Gradio demo, and production-ready API deployment.
- 38 Voice Profiles: Support for 19 Indian languages with male and female voices
- Streaming Audio: Real-time audio generation with low-latency streaming
- OpenAI-Compatible API: Drop-in replacement for OpenAI's /v1/audio/speech endpoint
- Production Ready: Docker deployment with embedded vLLM engine
- GPU Accelerated: CUDA-optimized inference with configurable SNAC decoder device
- Multiple Audio Formats: Output in MP3, Opus, AAC, WAV, or raw PCM via ffmpeg
- Zero-Shot Voice Cloning: Clone any voice with a short audio reference
- Long-Text Chunking: Automatic sentence-boundary splitting with crossfade stitching
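To make the long-text chunking feature concrete: each sentence is synthesized separately, and the seams are blended so there is no click at chunk boundaries. Below is a minimal pure-Python sketch of linear crossfade stitching; the repo's real implementation lives in tts_engine/buffers.py and tts_engine/utils.py and may differ in detail.

```python
def crossfade_stitch(a, b, overlap):
    """Stitch two PCM sample lists, linearly crossfading `overlap` samples.

    The tail of `a` fades out while the head of `b` fades in, avoiding an
    audible discontinuity at the chunk boundary.
    """
    assert overlap <= len(a) and overlap <= len(b)
    out = a[:-overlap] if overlap else list(a)
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # fade-in weight for b
        out.append(a[len(a) - overlap + i] * (1.0 - w) + b[i] * w)
    out.extend(b[overlap:])
    return out

# Two constant-amplitude "chunks": the seam ramps smoothly from 1.0 toward -1.0
stitched = crossfade_stitch([1.0] * 8, [-1.0] * 8, overlap=4)
```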
Hindi, Bengali, Marathi, Telugu, Kannada, Bhojpuri, Magahi, Chhattisgarhi, Maithili, Assamese, Bodo, Dogri, Gujarati, Malayalam, Punjabi, Tamil, English (Indian), Nepali, Sanskrit
Deploy Svara TTS as a production API service with Docker:

```bash
# Clone the repository
git clone <repository-url>
cd svara-tts-inference

# Configure (optional)
cp .env.example .env

# Build and start
docker-compose up -d

# Test the API
curl http://localhost:8080/health
curl http://localhost:8080/v1/voices
```
Get Available Voices:

```bash
curl http://localhost:8080/v1/voices
```
OpenAI-Compatible Endpoint:

```bash
curl -X POST http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello from Svara TTS!",
    "voice": "en_male",
    "response_format": "mp3"
  }' \
  --output speech.mp3
```
Streaming:

```bash
curl -N -X POST http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "नमस्ते, मैं स्वरा टीटीएस हूँ",
    "voice": "hi_male",
    "response_format": "wav",
    "stream": true
  }' \
  --output audio.wav
```
Python Example (OpenAI SDK):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

response = client.audio.speech.create(
    model="svara-tts-v1",
    voice="hi_female",
    input="नमस्ते, मैं स्वरा हूँ।",
    response_format="mp3",
)
response.stream_to_file("output.mp3")
```

See examples/api_client.py for more examples.
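The OpenAI SDK example above buffers the whole response before writing it. To consume the `"stream": true` mode incrementally, you can read the chunked HTTP body yourself. A hedged standard-library sketch follows; the endpoint and JSON fields are as documented in this README, but `build_request` and `stream_speech` are hypothetical helpers, not part of this repo.

```python
import json
import urllib.request


def build_request(text, voice, fmt="wav", base_url="http://localhost:8080"):
    """Build a POST request for the /v1/audio/speech endpoint with stream=true."""
    payload = {
        "input": text,
        "voice": voice,
        "response_format": fmt,
        "stream": True,
    }
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def stream_speech(text, voice, out_path, chunk_size=4096):
    """Write audio to disk as chunks arrive instead of buffering the response."""
    req = build_request(text, voice)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        while chunk := resp.read(chunk_size):
            f.write(chunk)
```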
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/v1/voices` | GET | List available voices |
| `/v1/audio/speech` | POST | OpenAI-compatible TTS (supports streaming, zero-shot cloning) |
For svara-tts-v1, voice IDs follow the format `{language_code}_{gender}`:
- Hindi: `hi_male`, `hi_female`
- English: `en_male`, `en_female`
- Bengali: `bn_male`, `bn_female`
- See the full list in DEPLOYMENT.md
The /v1/audio/speech endpoint also accepts display names like Hindi (Male), English (Female).
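Since the endpoint accepts both forms, a client may want to normalize display names to canonical IDs before sending requests. Below is a small illustrative helper; it is hypothetical (not part of this repo's API), and its mapping covers only the three languages listed above.

```python
# Hypothetical display-name normalizer for the {language_code}_{gender} scheme.
# Only the three languages spelled out in this README are mapped here.
LANG_CODES = {"Hindi": "hi", "English": "en", "Bengali": "bn"}


def to_voice_id(name: str) -> str:
    """Map 'Hindi (Male)' to 'hi_male'; pass through strings that already look like IDs."""
    if "(" not in name:
        return name  # assume it is already a voice ID such as 'hi_male'
    language, gender = name.split(" (", 1)
    return f"{LANG_CODES[language]}_{gender.rstrip(')').lower()}"
```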
All endpoints support multiple output formats via the `response_format` parameter:

| Format | MIME Type | Notes |
|---|---|---|
| `mp3` | `audio/mpeg` | Default |
| `opus` | `audio/ogg` | Great for streaming |
| `aac` | `audio/aac` | ADTS container |
| `wav` | `audio/wav` | Uncompressed, larger files |
| `pcm` | `audio/pcm` | Raw signed 16-bit LE, 24 kHz mono |
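The `pcm` format returns headerless samples, so players need the parameters from the table above. One way to make such a capture playable is to wrap it in a WAV container with Python's standard `wave` module; this sketch assumes the 24 kHz mono 16-bit layout documented above.

```python
import wave


def pcm_to_wav(pcm_bytes: bytes, wav_path: str,
               rate: int = 24000, channels: int = 1, sampwidth: int = 2) -> None:
    """Wrap raw signed 16-bit little-endian PCM in a WAV container."""
    with wave.open(wav_path, "wb") as wf:
        wf.setnchannels(channels)   # mono, per the table above
        wf.setsampwidth(sampwidth)  # 2 bytes per sample = 16-bit
        wf.setframerate(rate)       # 24 kHz, per the table above
        wf.writeframes(pcm_bytes)
```

Equivalently, `ffmpeg -f s16le -ar 24000 -ac 1 -i out.pcm out.wav` performs the same wrapping on the command line.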
For detailed deployment instructions, configuration options, and troubleshooting:
Read the Full Deployment Guide
Topics covered:
- Prerequisites and hardware requirements
- Docker configuration
- Environment variables
- Production deployment with nginx
- Troubleshooting and monitoring
- Multi-GPU setup
The server runs as a single process with the vLLM engine embedded directly in the FastAPI application. This eliminates the HTTP hop between API server and LLM engine, reducing latency and operational complexity.
```
┌──────────────────────────────────┐
│  FastAPI Server                  │  Port 8080
│                                  │
│  ┌────────────────────────────┐  │
│  │  Embedded vLLM Engine      │  │
│  │  (AsyncLLMEngine)          │  │
│  └────────────┬───────────────┘  │
│               │                  │
│  ┌────────────▼───────────────┐  │
│  │  SNAC Decoder              │  │
│  │  Token → PCM Audio         │  │
│  │  (SNAC_DEVICE: cpu/cuda)   │  │
│  └────────────┬───────────────┘  │
│               │                  │
│  ┌────────────▼───────────────┐  │
│  │  ffmpeg (format convert)   │  │
│  │  PCM → MP3/Opus/WAV/AAC    │  │
│  └────────────────────────────┘  │
└──────────────────────────────────┘
```
For detailed architecture documentation, see ARCHITECTURE.md.
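In code terms, "embedded" means the engine object is constructed inside the API process rather than reached over HTTP. A rough sketch of that pattern follows; `create_embedded_engine` is a hypothetical name (the actual initialization is in api/server.py), and vLLM's engine API varies across versions.

```python
def create_embedded_engine(model: str, gpu_memory_utilization: float = 0.9):
    """Construct an in-process vLLM engine; no HTTP hop to a separate server."""
    # Imported lazily so this sketch can be loaded without vLLM installed.
    from vllm import AsyncEngineArgs, AsyncLLMEngine

    args = AsyncEngineArgs(
        model=model,
        gpu_memory_utilization=gpu_memory_utilization,
    )
    # The FastAPI app would call this once at startup and reuse the engine
    # for every /v1/audio/speech request.
    return AsyncLLMEngine.from_engine_args(args)
```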
```
svara-tts-inference/
├── api/                    # FastAPI server
│   ├── server.py           # Main API endpoints + engine init
│   └── models.py           # Pydantic request/response models
├── tts_engine/             # Core TTS engine
│   ├── orchestrator.py     # TTS pipeline orchestration
│   ├── transports.py       # Embedded vLLM transport
│   ├── buffers.py          # Audio prebuffering + crossfade
│   ├── mapper.py           # Token-to-SNAC mapping
│   ├── codec.py            # SNAC encoder/decoder
│   ├── voice_config.py     # Voice profiles
│   ├── encoder.py          # Text-to-token encoding
│   ├── constants.py        # Token IDs and special tokens
│   └── utils.py            # Utilities (chunking, audio processing)
├── assets/                 # Voice config YAML files
├── examples/               # Example scripts
│   └── api_client.py       # API client examples
├── Dockerfile              # Docker image
├── docker-compose.yml      # Docker Compose config
├── supervisord.conf        # Process manager config
├── requirements.txt        # Python dependencies
└── .env.example            # Environment variable template
```
```bash
# Install dependencies
pip install -r requirements.txt

# Configure environment (optional)
cp .env.example .env

# Start the server (vLLM engine starts embedded)
cd api && python server.py
```

The vLLM engine initializes in-process during FastAPI startup; no separate vLLM server is needed.
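As a concrete illustration of the optional configuration step, a .env file might look like the fragment below. Only `SNAC_DEVICE` and port 8080 appear elsewhere in this README; treat the fragment as an assumption and check .env.example for the real variable names.

```bash
# Hypothetical .env sketch; see .env.example for the actual variables
SNAC_DEVICE=cuda   # SNAC decoder device (cpu or cuda), per the feature list
PORT=8080          # API port used throughout the examples above
```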
- GPU: NVIDIA GPU with 16GB+ VRAM (recommended: 24GB+)
- RAM: 16GB+ system RAM
- Storage: 50GB+ free space
- Docker 20.10+
- Docker Compose 2.0+
- NVIDIA GPU Drivers
- NVIDIA Container Toolkit
See LICENSE file for details.
If you use Svara TTS in your research, please cite:
```bibtex
@misc{svara-tts-v1,
  title={Svara TTS: Multilingual Text-to-Speech for Indic Languages},
  author={Kenpath},
  year={2024},
  url={https://huggingface.co/kenpath/svara-tts-v1}
}
```