Troyanovsky/awesome-TTS-Colab

🗣️ Awesome TTS Models in Google Colab

This project provides easy-to-use Google Colab notebooks for running cutting-edge Text-to-Speech (TTS) models, all powered by free GPUs from Google Colab.

Whether you're experimenting, researching, or just playing around with voice synthesis, these notebooks make it simple to try out top TTS models without worrying about setup or hardware.

Colab Notebooks

Edge TTS

  • Added at 2025-04-25
  • Open In Colab
  • GitHub Link
  • Capabilities: Text-to-speech, Predefined Voices
  • Note: Not an open-source model

xTTS

  • Added at 2025-05-19
  • Open In Colab
  • GitHub Link (Original Coqui TTS is no longer maintained as Coqui shut down in 2023.)
  • Model Link
  • Capabilities: Text-to-speech, Predefined Voices, Multi-lingual, Voice Cloning from Audio
  • Languages supported: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu), Korean (ko), Hindi (hi)
  • Reason for recommendation: High-quality generation with multi-lingual support and voice cloning from short audio clips.
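Because the language is selected with one of the short codes listed above, it can help to validate a code before starting a long synthesis run. The following is a minimal sketch that uses only the codes from this list; `XTTS_LANGUAGES` and `check_language` are hypothetical helper names and are not part of any TTS library.

```python
# Language codes copied from the xTTS list above.
XTTS_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "pl": "Polish", "tr": "Turkish",
    "ru": "Russian", "nl": "Dutch", "cs": "Czech", "ar": "Arabic",
    "zh-cn": "Chinese", "ja": "Japanese", "hu": "Hungarian",
    "ko": "Korean", "hi": "Hindi",
}


def check_language(code: str) -> str:
    """Return the normalized code, or raise ValueError if it is not listed."""
    normalized = code.strip().lower()
    if normalized not in XTTS_LANGUAGES:
        supported = ", ".join(sorted(XTTS_LANGUAGES))
        raise ValueError(
            f"Unsupported language {code!r}; choose one of: {supported}"
        )
    return normalized
```

Calling `check_language(" EN ")` returns `"en"`, while an unlisted code such as `"sv"` raises a `ValueError` that names the supported codes.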

OpenVoice V2 (Voice Conversion)

  • Added at 2025-05-19
  • Open In Colab
  • GitHub Link: myshell-ai/OpenVoice (used for voice conversion based on a reference voice), coqui-tts (used as the base TTS model)
  • Model Link
  • Capabilities: Text-to-speech, Multi-lingual, Voice Cloning from Audio
  • Languages supported: English (en), Spanish (es), French (fr), Chinese (zh-cn), Japanese (ja), Korean (ko)

Parler TTS

  • Added at 2025-05-19
  • Open In Colab
  • GitHub Link
  • Model Link
  • Capabilities: Text-to-speech, Multi-lingual, Predefined Voices, Guided generation
  • Languages supported: English, French, Spanish, Portuguese, Polish, German, Italian and Dutch

Kokoro TTS

  • Added at 2025-05-19
  • Open In Colab
  • GitHub Link
  • Model Link
  • Capabilities: Text-to-speech, Multi-lingual, Predefined Voices
  • Languages supported: American English (a), British English (b), Spanish (es), French (fr-fr), Hindi (hi), Italian (it), Japanese (ja), Brazilian Portuguese (pt-br), Mandarin Chinese (zh)
  • Reason for recommendation: Very high-quality generation with multi-lingual support.

Dia 1.6B TTS

  • Added at 2025-05-19
  • Open In Colab
  • GitHub Link
  • Model Link
  • Capabilities: Text-to-speech, Conversational, Non-verbal sounds, Voice Cloning from Audio
  • Reasons for recommendation: High-quality generation with conversational and non-verbal sounds.
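Conversational models of this kind take a single script in which each turn is marked with a speaker tag. The sketch below assumes the tag style `[S1]` / `[S2]` with non-verbal sounds written in parentheses, for example `(laughs)`; treat that format as an assumption and confirm it against the model card before relying on it. `build_dialogue` is a hypothetical helper, not part of the Dia package.

```python
def build_dialogue(turns):
    """Join (speaker_number, text) pairs into one tagged script string.

    Assumes a two-speaker format with [S1] and [S2] tags.
    """
    parts = []
    for speaker, text in turns:
        if speaker not in (1, 2):
            raise ValueError("this sketch assumes exactly two speakers: 1 and 2")
        parts.append(f"[S{speaker}] {text.strip()}")
    return " ".join(parts)


script = build_dialogue([
    (1, "Did you try the new notebook?"),
    (2, "I did. (laughs) It ran on the free GPU."),
])
print(script)
```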

Auralis xTTS V2

  • Added at 2025-05-20
  • Open In Colab
  • GitHub Link
  • Model Link
  • Capabilities: Text-to-speech, Predefined Voices, Multi-lingual, Voice Cloning
  • Languages supported: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu), Korean (ko), Hindi (hi)

Chatterbox TTS

  • Added at 2025-06-06
  • Open In Colab
  • GitHub Link
  • Model Link
  • Capabilities: Text-to-speech, Emotion Exaggeration Control, Voice Cloning, Watermarked Outputs

Piper TTS

  • Added at 2025-08-07
  • Open In Colab
  • GitHub Link
  • Model Link
  • Capabilities: Text-to-speech, Multi-language support (20+ languages), Multiple voices, Customizable voices (training support)

Kitten TTS

  • Added at 2025-08-07
  • Open In Colab
  • GitHub Link
  • Model Link (Nano Preview)
  • Capabilities: Text-to-speech, Multiple Expressive Voices, CPU-compatible, Ultra-small (25MB, 15M params)
  • Reason for recommendation: Extremely lightweight and fast TTS model suitable for edge devices and real-time applications. Open-source and easy to run locally.

VibeVoice 1.5B TTS

  • Added at 2025-08-26
  • Open In Colab
  • GitHub Link
  • Model Link
  • Capabilities: Context-Aware Expression, Multi-lingual conversation, Podcast with Background Music, Long Conversational Speech
  • Languages supported: English, Chinese
  • Reason for recommendation: Generates long-form (up to 90 minutes), multi-speaker (up to 4 speakers), expressive conversational audio.
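The two limits quoted above (4 speakers, 90 minutes) can be checked before submitting a long script. This is a rough sketch only: the speaking rate of 150 words per minute is an assumption used to estimate duration, and `check_script` is a hypothetical helper that is not part of VibeVoice.

```python
MAX_SPEAKERS = 4        # limit quoted in the description above
MAX_MINUTES = 90        # limit quoted in the description above
WORDS_PER_MINUTE = 150  # assumption: rough conversational speaking rate


def check_script(turns):
    """Validate a list of (speaker_name, text) turns.

    Returns the estimated duration in minutes, or raises ValueError
    if the script exceeds either limit.
    """
    speakers = {name for name, _ in turns}
    if len(speakers) > MAX_SPEAKERS:
        raise ValueError(f"{len(speakers)} speakers; at most {MAX_SPEAKERS} supported")
    words = sum(len(text.split()) for _, text in turns)
    minutes = words / WORDS_PER_MINUTE
    if minutes > MAX_MINUTES:
        raise ValueError(f"about {minutes:.0f} minutes; at most {MAX_MINUTES} supported")
    return minutes
```

The estimate is deliberately coarse; actual duration depends on the voices and pacing the model produces.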

Index TTS V2

  • Added at 2025-09-17
  • Open In Colab
  • GitHub Link
  • Model Link
  • Capabilities: Emotion-Controlled Speech, Duration-Specific Generation, Zero-Shot Timbre Cloning, Multi-Modal Emotion Guidance, High-Stability Emotional Speech
  • Reasons for recommendation: Great voice-cloning capability with emotional steering.

Neu TTS Air

  • Added at 2025-10-21
  • Open In Colab
  • GitHub Link
  • Model Link
  • Capabilities: Real-Time On-Device Speech, Ultra-Realistic Human-Like Voices, Instant Voice Cloning (3s sample), Embedded-Optimized GGUF Format, Secure & Watermarked Output
  • Reasons for recommendation: Super-realistic, on-device text-to-speech (TTS) language model with instant voice cloning.

Orpheus TTS

  • Added at 2025-12-02
  • Open In Colab
  • GitHub Link
  • Model Link
  • Capabilities: Text-to-speech, Human-Like Expressive Speech, Zero-Shot Voice Cloning, Guided Emotion & Intonation Tags, Low-Latency Streaming
  • Languages supported: English
  • Voices: Multiple preset speaker options (tara, leah, jess, leo, dan, mia, zac, zoe)
  • Reason for recommendation: Small LLM-based model with highly expressive, human-like voice generation and zero-shot voice cloning.
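A preset voice is usually chosen by name when building the prompt. The sketch below validates the name against the voices listed above and assumes a `voice: text` prompt layout with inline tags such as `<laugh>`; both the layout and the tag are assumptions to verify against the notebook or model card. `build_prompt` is a hypothetical helper.

```python
# Preset voices copied from the list above.
ORPHEUS_VOICES = ("tara", "leah", "jess", "leo", "dan", "mia", "zac", "zoe")


def build_prompt(voice: str, text: str) -> str:
    """Prefix the text with a preset voice name (assumed prompt layout)."""
    if voice not in ORPHEUS_VOICES:
        raise ValueError(f"Unknown voice {voice!r}; choose from {ORPHEUS_VOICES}")
    return f"{voice}: {text}"


print(build_prompt("tara", "Hello there <laugh>"))
```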

Supertonic TTS

  • Added at 2025-12-05
  • Open In Colab
  • GitHub Link
  • Model Link
  • Capabilities: Text-to-speech, Predefined Voices, Extreme-Speed Inference, Lightweight Deployment, Natural Text Handling, Fully Local Processing
  • Reason for recommendation: Ultra-lightweight (66M parameters), lightning-fast even on CPU with decent quality, privacy-safe on-device processing.

GLM TTS

  • Added at 2025-12-15
  • Open In Colab
  • GitHub Link
  • Model Link
  • Capabilities: Text-to-speech, Zero-Shot Voice Cloning, RL-Enhanced Emotion Control, Streaming Real-Time Synthesis, Phoneme-Level Control
  • Languages supported: English, Chinese, Mixed Language (En/Zh)
  • Reason for recommendation: LLM-powered TTS with zero-shot voice cloning, RL-tuned emotion control, and streaming capabilities for interactive applications.

Soprano TTS

  • Added at 2025-12-30
  • Open In Colab
  • GitHub Link
  • Model Link
  • Capabilities: Text-to-speech, Ultra-Fast Real-Time TTS, 32 kHz High-Fidelity Audio, Streaming Inference, Lightweight Deployment, Open-Source
  • Languages supported: English
  • Reason for recommendation: Ultra-lightweight (80M parameters), extremely fast (~2000× real-time) with streaming synthesis and sub-frame latency for real-time applications.
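To put the speed figure in perspective, a speedup of N times real-time means that T seconds of audio take roughly T / N seconds to generate, and the real-time factor (processing time divided by audio duration) is 1 / N. The sketch below does that arithmetic; the function names are hypothetical, and the quoted speedup is a peak figure that will vary with hardware and batch size.

```python
def generation_seconds(audio_seconds: float, speedup: float) -> float:
    """Estimated wall-clock time to synthesize `audio_seconds` of speech."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


def real_time_factor(speedup: float) -> float:
    """Processing time per second of audio; lower is faster."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


# One minute of audio at the quoted ~2000x speedup.
print(f"{generation_seconds(60, 2000):.3f} s")  # 0.030 s
```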

Pocket TTS

  • Added at 2025-12-30
  • Open In Colab
  • GitHub Link
  • Model Link
  • Capabilities: CPU-Based Speech Generation, Voice Cloning, Instant Audio Streaming, Low Latency (~200ms), Faster Than Real-Time (~6x), Python API and CLI, Handles Long Text Inputs
  • Languages supported: English
  • Reason for recommendation: Ultra-lightweight (100M parameters), CPU-optimized with ultra-low latency (~200ms) for real-time applications on resource-constrained devices.
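Long inputs are commonly handled by splitting the text into sentence-sized chunks and synthesizing them one at a time, which also lets playback start before the whole text is processed. The sketch below illustrates that general idea only; it is not Pocket TTS's own implementation, and `chunk_text` and its `max_chars` parameter are hypothetical.

```python
import re


def chunk_text(text: str, max_chars: int = 200) -> list:
    """Group whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept as its own chunk
    rather than being cut mid-sentence.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed to whichever model's synthesis call the notebook uses.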

Qwen3 TTS

  • Added at 2025-12-30
  • Open In Colab
  • Official Blog
  • Model Link (0.6B)
  • Model Link (1.7B)
  • Capabilities: Multilingual TTS, Ultra-Low-Latency Streaming (~97ms), Instruction-Based Voice Control, Rapid 3s Voice Cloning, High-Fidelity Speech Reconstruction
  • Languages supported: Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian
  • Reason for recommendation: End-to-end discrete LM architecture with extreme low-latency generation, instruction-aware speech synthesis, and strong robustness to noisy or complex text inputs.

📊 Picking TTS Models?

Curious how different TTS models stack up before picking which one to run?

Check out these Hugging Face Spaces with live performance leaderboards:

🙋 Request a Model

Have a favorite TTS model you'd like to see added to this project?
Open an issue or start a discussion to request it!

👀 You Might Be Interested In

If you're interested in running Large Language Models (LLMs) on consumer-level local machines, check out this related project:

  • Local-LLM-Comparison-Colab-UI:
    A collection of Google Colab notebooks for comparing and running various LLMs easily, designed for use on local hardware.
    Perfect for exploring and benchmarking LLMs without needing powerful cloud resources!

🤝 Contributing

Contributions to this project are welcome and appreciated! Here's how you can contribute:

  1. Create a Google Colab notebook for a TTS model following the format of existing notebooks
  2. Test your notebook thoroughly to ensure it works properly with Google Colab's free GPU
  3. Fork this repository and add your notebook to the project
  4. Update the README.md to include information about the model following the existing format:
    • Add a section with the model name
    • Include the Colab badge linking to your notebook
    • Add GitHub and model links
    • List capabilities and supported languages (if multi-lingual)
  5. Open a Pull Request with your changes

By contributing, you help make advanced TTS technology more accessible to everyone!
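The README entry described in step 4 follows the same shape for every model, so it can be generated from a few fields. This is a sketch under stated assumptions: the `###` heading level, the Markdown link layout, and the `readme_entry` helper are hypothetical, and the example URLs are placeholders. The badge image URL is the standard Colab badge.

```python
from datetime import date

# Standard "Open In Colab" badge image.
COLAB_BADGE = "https://colab.research.google.com/assets/colab-badge.svg"


def readme_entry(name, notebook_url, github_url, model_url,
                 capabilities, languages=None, added=None):
    """Build a README section for one model in the format used above."""
    added = added or date.today()
    lines = [
        f"### {name}",  # heading level is an assumption
        "",
        f"- Added at {added.isoformat()}",
        f"- [![Open In Colab]({COLAB_BADGE})]({notebook_url})",
        f"- [GitHub Link]({github_url})",
        f"- [Model Link]({model_url})",
        f"- Capabilities: {', '.join(capabilities)}",
    ]
    if languages:  # only listed for multi-lingual models
        lines.append(f"- Languages supported: {', '.join(languages)}")
    return "\n".join(lines)


print(readme_entry(
    "Example TTS",
    "https://example.com/notebook.ipynb",  # placeholder URLs
    "https://example.com/repo",
    "https://example.com/model",
    ["Text-to-speech", "Voice Cloning"],
    languages=["English", "French"],
    added=date(2025, 1, 2),
))
```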

📌 Disclaimer

This project is for educational and research purposes. Always verify licenses and model usage terms when using TTS models in production.
