This project presents a Luganda Text-to-Speech (TTS) system built using the VITS architecture for end-to-end speech synthesis.
- A VITS model was fine-tuned on 2.8 hours of proprietary, professionally recorded Luganda speech from a female speaker.
- The base model was pretrained on female Luganda speech from Mozilla's Common Voice.
- Outputs from different model checkpoints can be compared interactively on our GitHub Pages demo.
Live Audio Comparison Page
Explore how different models perform on the same input sentences through side-by-side audio playback.