Summary
Would love to see IndicF5 added as a TTS backend in Voicebox! It would bring support for 11 Indian languages: Tamil, Hindi, Bengali, Telugu, Malayalam, Kannada, Gujarati, Marathi, Punjabi, Odia, and Assamese. These languages are spoken by over 1.4 billion people and are not currently supported by any existing Voicebox backend.
What is IndicF5?
IndicF5 is a near-human quality, open-source Text-to-Speech model by AI4Bharat, trained on 1417 hours of high-quality Indian language speech data.
- 🔗 GitHub: https://github.com/AI4Bharat/IndicF5
- 🤗 HuggingFace: https://huggingface.co/ai4bharat/IndicF5
- 📄 License: Open Source
Why it fits Voicebox perfectly
- ✅ Pure Python library — integrates cleanly with the existing FastAPI backend
- ✅ Follows the same reference-audio voice cloning approach as existing backends
- ✅ Works with PyTorch + CUDA — no new dependencies outside what Voicebox already uses
- ✅ Installable with a single pip command
- ✅ Voicebox's modular `backends/` architecture in v0.3.0 makes this straightforward to add
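To make the last point a bit more concrete, here is a rough sketch of what an IndicF5 adapter could look like. Voicebox's actual backend interface is not shown in this issue, so `TTSBackend`, `IndicF5Backend`, `synthesize`, `load_indicf5`, and the injectable `loader` argument are all hypothetical names. Only the `AutoModel.from_pretrained` call and the `ref_audio_path` / `ref_text` keyword arguments are taken from the IndicF5 usage shown in this issue.

```python
from typing import Any, Callable, Optional, Protocol


class TTSBackend(Protocol):
    """Hypothetical interface; Voicebox's real backend contract may differ."""

    def synthesize(self, text: str, ref_audio_path: str, ref_text: str) -> Any: ...


def load_indicf5() -> Any:
    """Load IndicF5 from Hugging Face (weights are downloaded on first use)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModel

    return AutoModel.from_pretrained("ai4bharat/IndicF5", trust_remote_code=True)


class IndicF5Backend:
    """Hypothetical adapter that defers model loading until the first request."""

    def __init__(self, loader: Callable[[], Any] = load_indicf5) -> None:
        self._loader = loader
        self._model: Optional[Any] = None

    def synthesize(self, text: str, ref_audio_path: str, ref_text: str) -> Any:
        if self._model is None:
            # Load once and reuse the model for later calls.
            self._model = self._loader()
        # Same call shape as the IndicF5 usage example in this issue.
        return self._model(text, ref_audio_path=ref_audio_path, ref_text=ref_text)
```

Passing the loader in as an argument is just one possible design: it keeps app startup cheap and lets the adapter be exercised with a stand-in model in tests.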
How it works
IndicF5 takes 3 inputs — exactly like other Voicebox backends:
- Text to synthesize
- A reference audio clip (for voice cloning)
- Transcript of the reference audio
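Before calling the model, a backend would probably want to check these three inputs. The sketch below bundles them into a small request object; `SynthesisRequest` and its `problems` method are hypothetical names for illustration and are not part of IndicF5 or Voicebox.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class SynthesisRequest:
    """Hypothetical container for the three inputs listed above."""

    text: str
    ref_audio_path: str
    ref_text: str

    def problems(self) -> List[str]:
        """Return a list of problems; an empty list means the request looks usable."""
        found: List[str] = []
        if not self.text.strip():
            found.append("text is empty")
        if not self.ref_text.strip():
            found.append("ref_text is empty")
        if not Path(self.ref_audio_path).is_file():
            found.append(f"reference audio not found: {self.ref_audio_path}")
        return found
```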
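```python
from transformers import AutoModel

# trust_remote_code=True is required because the model class ships with the Hugging Face repo
model = AutoModel.from_pretrained("ai4bharat/IndicF5", trust_remote_code=True)
text = "नमस्ते"  # text to synthesize, in one of the supported languages
audio = model(text, ref_audio_path="sample.wav", ref_text="transcript")
```
Impact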
There is currently no good local, open-source voice cloning tool for Indian languages. Adding IndicF5 would make Voicebox the go-to tool for Indian language TTS and would open the app to a massive new user base.
Happy to help!
I'm willing to contribute a PR if that helps move this forward. Would love to know if this is something you'd consider including!