This project provides easy-to-use Google Colab notebooks for running cutting-edge Text-to-Speech (TTS) models, all powered by free GPUs from Google Colab.
Whether you're experimenting, researching, or just playing around with voice synthesis, these notebooks make it simple to try out top TTS models without worrying about setup or hardware.
- Added at 2025-04-25
- GitHub Link
- Capabilities: Text-to-speech, Predefined Voices
- Note: Not an open-source model
- Added at 2025-05-19
- GitHub Link (Original Coqui TTS is no longer maintained as Coqui shut down in 2023.)
- Model Link
- Capabilities: Text-to-speech, Predefined Voices, Multi-lingual, Voice Cloning from Audio
- Languages supported: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu), Korean (ko), Hindi (hi)
- Reason for recommendation: High-quality generation with multi-lingual support and voice cloning from short audio clips.
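As a sketch of how the voice cloning described above is typically invoked through the Coqui TTS Python API (an assumption: the maintained fork is installed via `pip install coqui-tts`; the model name and method signatures follow Coqui's documented interface, so verify against the repo before use):

```python
# Hedged sketch: XTTS v2 voice cloning via the Coqui TTS API.
# The model identifier and tts_to_file() signature are taken from
# Coqui's docs; the model is downloaded on first use (several GB).

SUPPORTED_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
}

def synthesize(text: str, speaker_wav: str, language: str = "en",
               out_path: str = "output.wav") -> str:
    """Generate speech in `language`, cloning the voice in `speaker_wav`."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"XTTS v2 does not support language {language!r}")
    from TTS.api import TTS  # heavy import: triggers model download on first run
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(text=text, speaker_wav=speaker_wav,
                    language=language, file_path=out_path)
    return out_path
```

A short reference clip (a few seconds of clean speech) is usually enough for `speaker_wav`.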
- Added at 2025-05-19
- GitHub Link: myshell-ai/OpenVoice (used for voice conversion from a reference voice), coqui-tts (used as the base TTS model)
- Model Link
- Capabilities: Text-to-speech, Multi-lingual, Voice Cloning from Audio
- Languages supported: English (en), Spanish (es), French (fr), Chinese (zh-cn), Japanese (ja), Korean (ko)
- Added at 2025-05-19
- GitHub Link
- Model Link
- Capabilities: Text-to-speech, Multi-lingual, Predefined Voices, Guided generation
- Languages supported: English, French, Spanish, Portuguese, Polish, German, Italian and Dutch
- Added at 2025-05-19
- GitHub Link
- Model Link
- Capabilities: Text-to-speech, Multi-lingual, Predefined Voices
- Languages supported: American English (a), British English (b), Spanish (es), French (fr-fr), Hindi (hi), Italian (it), Japanese (ja), Brazilian Portuguese (pt-br), Mandarin Chinese (zh)
- Reason for recommendation: Very high-quality generation with multi-lingual support.
- Added at 2025-05-19
- GitHub Link
- Model Link
- Capabilities: Text-to-speech, Conversational, Non-verbal sounds, Voice Cloning from Audio
- Reason for recommendation: High-quality conversational generation with support for non-verbal sounds.
- Added at 2025-05-20
- GitHub Link
- Model Link
- Capabilities: Text-to-speech, Predefined Voices, Multi-lingual, Voice Cloning
- Languages supported: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu), Korean (ko), Hindi (hi)
- Added at 2025-06-06
- GitHub Link
- Model Link
- Capabilities: Text-to-speech, Emotion Exaggeration Control, Voice Cloning, Watermarked Outputs
- Added at 2025-08-07
- GitHub Link
- Model Link
- Capabilities: Text-to-speech, Multi-language support (20+ languages), Multiple voices, Customizable voices (training support)
- Added at 2025-08-07
- GitHub Link
- Model Link (Nano Preview)
- Capabilities: Text-to-speech, Multiple Expressive Voices, CPU-compatible, Ultra-small (25MB, 15M params)
- Reason for recommendation: Extremely lightweight and fast TTS model suitable for edge devices and real-time applications. Open-source and easy to run locally.
- Added at 2025-08-26
- GitHub Link
- Model Link
- Capabilities: Context-Aware Expression, Multi-lingual conversation, Podcast with Background Music, Long Conversational Speech
- Languages supported: English, Chinese
- Reason for recommendation: Generates long-form (up to 90 minutes), multi-speaker (up to 4 speakers), expressive conversational audio.
- Added at 2025-09-17
- GitHub Link
- Model Link
- Capabilities: Emotion-Controlled Speech, Duration-Specific Generation, Zero-Shot Timbre Cloning, Multi-Modal Emotion Guidance, High-Stability Emotional Speech
- Reason for recommendation: Great voice-cloning capability with emotional steering.
- Added at 2025-10-21
- GitHub Link
- Model Link
- Capabilities: Real-Time On-Device Speech, Ultra-Realistic Human-Like Voices, Instant Voice Cloning (3s sample), Embedded-Optimized GGUF Format, Secure & Watermarked Output
- Reason for recommendation: Super-realistic, on-device TTS language model with instant voice cloning.
- Added at 2025-12-02
- GitHub Link
- Model Link
- Capabilities: Text-to-speech, Human-Like Expressive Speech, Zero-Shot Voice Cloning, Guided Emotion & Intonation Tags, Low-Latency Streaming
- Languages supported: English
- Voices: Multiple preset speaker options (tara, leah, jess, leo, dan, mia, zac, zoe)
- Reason for recommendation: Small LLM-based model delivering highly expressive, human-like voice generation with zero-shot voice cloning.
- Added at 2025-12-05
- GitHub Link
- Model Link
- Capabilities: Text-to-speech, Predefined Voices, Extreme-Speed Inference, Lightweight Deployment, Natural Text Handling, Fully Local Processing
- Reason for recommendation: Ultra-lightweight (66M parameters), lightning-fast even on CPU with decent quality, privacy-safe on-device processing.
- Added at 2025-12-15
- GitHub Link
- Model Link
- Capabilities: Text-to-speech, Zero-Shot Voice Cloning, RL-Enhanced Emotion Control, Streaming Real-Time Synthesis, Phoneme-Level Control
- Languages supported: English, Chinese, Mixed Language (En/Zh)
- Reason for recommendation: LLM-powered TTS with zero-shot voice cloning, RL-tuned emotion control, and streaming capabilities for interactive applications.
- Added at 2025-12-30
- GitHub Link
- Model Link
- Capabilities: Text-to-speech, Ultra-Fast Real-Time TTS, 32 kHz High-Fidelity Audio, Streaming Inference, Lightweight Deployment, Open-Source
- Languages supported: English
- Reason for recommendation: Ultra-lightweight (80M parameters), extremely fast (~2000× RTF) with streaming synthesis and sub-frame latency for real-time applications.
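For readers unfamiliar with the RTF figure quoted above: real-time factor is read here as seconds of audio produced per second of compute, so higher is faster (note that some papers define RTF inversely, as compute time divided by audio time). A minimal illustration:

```python
# Real-time factor (RTF), interpreted as audio seconds generated per
# second of compute. An RTF of 2000 means 10 s of speech is synthesized
# in 5 ms of wall-clock time.

def real_time_factor(audio_seconds: float, synthesis_seconds: float) -> float:
    """Return how many seconds of audio are generated per second of compute."""
    return audio_seconds / synthesis_seconds

rtf = real_time_factor(10.0, 0.005)
print(rtf)  # 2000.0
```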
- Added at 2025-12-30
- GitHub Link
- Model Link
- Capabilities: CPU-Based Speech Generation, Voice Cloning, Instant Audio Streaming, Low Latency (~200ms), Faster Than Real-Time (~6x), Python API and CLI, Handles Long Text Inputs
- Languages supported: English
- Reason for recommendation: Ultra-lightweight (100M parameters), CPU-optimized with ultra-low latency (~200ms) for real-time applications on resource-constrained devices.
- Added at 2025-12-30
- Official Blog
- Model Link (0.6B)
- Model Link (1.7B)
- Capabilities: Multilingual TTS, Ultra-Low-Latency Streaming (~97ms), Instruction-Based Voice Control, Rapid 3s Voice Cloning, High-Fidelity Speech Reconstruction
- Languages supported: Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian
- Reason for recommendation: End-to-end discrete LM architecture with extreme low-latency generation, instruction-aware speech synthesis, and strong robustness to noisy or complex text inputs.
Curious how different TTS models stack up before picking which one to run?
Check out these Hugging Face Spaces with live performance leaderboards:
- TTS Arena V2 by TTS-AGI
- TTS Arena by TTS-AGI (Replaced by TTS Arena V2)
- TTS Spaces Arena by Pendrokar
Have a favorite TTS model you'd like to see added to this project?
Open an issue or start a discussion to request it!
If you're interested in running Large Language Models (LLMs) on consumer-level local machines, check out this related project:
- Local-LLM-Comparison-Colab-UI:
A collection of Google Colab notebooks for comparing and running various LLMs easily, designed for use on local hardware.
Perfect for exploring and benchmarking LLMs without needing powerful cloud resources!
Contributions to this project are welcome and appreciated! Here's how you can contribute:
- Create a Google Colab notebook for a TTS model following the format of existing notebooks
- Test your notebook thoroughly to ensure it works properly with Google Colab's free GPU
- Fork this repository and add your notebook to the project
- Update the README.md to include information about the model following the existing format:
- Add a section with the model name
- Include the Colab badge linking to your notebook
- Add GitHub and model links
- List capabilities and supported languages (if multi-lingual)
- Open a Pull Request with your changes
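For the badge step above, README entries in projects like this commonly embed the standard Colab badge as follows (the `<user>/<repo>` and notebook path below are placeholders you replace with your own):

```markdown
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/<user>/<repo>/blob/main/your_notebook.ipynb)
```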
By contributing, you help make advanced TTS technology more accessible to everyone!
This project is for educational and research purposes. Always verify licenses and model usage terms when using TTS models in production.