filliptm/ComfyUI_Fill-ChatterBox

FL ChatterBox

High-quality text-to-speech nodes for ComfyUI powered by ResembleAI's Chatterbox models. Features voice cloning, multilingual synthesis, paralinguistic expressions, and voice conversion.

Features

  • Zero-Shot Voice Cloning - Clone any voice from a few seconds of reference audio
  • 3 TTS Models - Standard, Turbo (faster), and Multilingual variants
  • 23 Languages - Arabic, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Norwegian, Polish, Portuguese, Russian, Spanish, Swahili, Swedish, Turkish
  • Paralinguistic Tags - Express emotions with tags like [laugh], [sigh], [gasp], [chuckle] (Turbo model)
  • Voice Conversion - Transform one voice to sound like another
  • Dialog Synthesis - Multi-speaker conversations with up to 4 voices
  • Model Caching - Keep models loaded between runs for faster iteration

Nodes

| Node | Description |
|------|-------------|
| FL Chatterbox TTS | Standard high-quality text-to-speech with voice cloning |
| FL Chatterbox Turbo TTS | Faster GPT2-based TTS with paralinguistic tag support |
| FL Chatterbox Multilingual TTS | 23-language TTS with voice cloning |
| FL Chatterbox VC | Voice conversion - transform source audio to target voice |
| FL Chatterbox Dialog TTS | Multi-speaker dialog synthesis with up to 4 voices |

Installation

ComfyUI Manager

Search for "FL ChatterBox" and install.

Manual

cd ComfyUI/custom_nodes
git clone https://github.com/filliptm/ComfyUI_Fill-ChatterBox.git
cd ComfyUI_Fill-ChatterBox
pip install -r requirements.txt

Optional: Watermarking Support

pip install resemble-perth

Note: The resemble-perth package may have compatibility issues with Python 3.12+. Nodes will function without watermarking if import fails.
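
The fallback described in the note can be sketched as an optional import. This is a minimal illustration, not the repository's actual code: the helper name `load_optional_module` is hypothetical, and `perth` is assumed to be the import name of the `resemble-perth` package.

```python
import importlib


def load_optional_module(name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except Exception:
        # ImportError is the usual case; the broad catch also covers packages
        # that raise other errors while importing on an unsupported Python.
        return None


# Assumption: "perth" is the import name of the resemble-perth package.
perth = load_optional_module("perth")
WATERMARKING_AVAILABLE = perth is not None
```

Code that applies a watermark could then check `WATERMARKING_AVAILABLE` and return the audio unchanged when it is `False`.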

Quick Start

  1. Add FL Chatterbox TTS (or Turbo/Multilingual variant)
  2. Enter your text in the text field
  3. Optionally connect reference audio for voice cloning
  4. Set keep_model_loaded = True for faster subsequent runs
  5. Generate!

Turbo Model with Expressions

Hello there! [laugh] Isn't this amazing? [sigh] I just love text to speech.

Supported tags: [laugh], [sigh], [gasp], [chuckle], [cough], [sniff], [groan], [shush], [clear throat]
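
Before sending text to the Turbo node, it can help to check for tags that are not on the supported list. The helper below is a hypothetical pre-check, not part of the node pack; treating tags as case-insensitive is an assumption.

```python
import re

# Tag names copied from the supported list above.
SUPPORTED_TAGS = {
    "laugh", "sigh", "gasp", "chuckle", "cough",
    "sniff", "groan", "shush", "clear throat",
}

# Matches any [bracketed] span that contains no nested brackets.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def find_unsupported_tags(text):
    """Return bracketed tags in `text` that are not in SUPPORTED_TAGS."""
    found = TAG_PATTERN.findall(text)
    return [tag for tag in found if tag.strip().lower() not in SUPPORTED_TAGS]


print(find_unsupported_tags("Hello [laugh] there [yawn]"))  # ['yawn']
```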

Models

| Model | Speed | Languages | Notes |
|-------|-------|-----------|-------|
| Standard | Normal | English | Highest quality |
| Turbo | Fast | English | Paralinguistic tags, GPT2-based |
| Multilingual | Normal | 23 languages | Cross-lingual voice cloning |

Models download automatically on first use to ComfyUI/models/chatterbox/.

Parameters

TTS Parameters

| Parameter | Range | Description |
|-----------|-------|-------------|
| exaggeration | 0.25-2.0 | Emotion intensity |
| cfg_weight | 0.2-1.0 | Pace/classifier-free guidance |
| temperature | 0.05-5.0 | Randomness in generation |
| seed | 0-4.29B | Reproducible generation |
| keep_model_loaded | bool | Cache model between runs |
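
When these values are set from a script rather than from the node UI, they can be clamped to the documented ranges first. This is a small sketch; `TTS_RANGES` and `clamp_tts_params` are hypothetical names, and only the three floating-point parameters listed above are covered.

```python
# Ranges copied from the TTS Parameters list above.
TTS_RANGES = {
    "exaggeration": (0.25, 2.0),
    "cfg_weight": (0.2, 1.0),
    "temperature": (0.05, 5.0),
}


def clamp_tts_params(params):
    """Return a copy of `params` with known values clamped to their range."""
    clamped = dict(params)
    for name, (low, high) in TTS_RANGES.items():
        if name in clamped:
            clamped[name] = min(max(clamped[name], low), high)
    return clamped


print(clamp_tts_params({"exaggeration": 3.0, "cfg_weight": 0.1}))
# {'exaggeration': 2.0, 'cfg_weight': 0.2}
```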

Turbo Parameters

| Parameter | Range | Description |
|-----------|-------|-------------|
| temperature | 0.05-2.0 | Randomness in generation |
| top_k | 1-5000 | Top-k sampling |
| top_p | 0.1-1.0 | Nucleus sampling threshold |
| repetition_penalty | 1.0-3.0 | Token repetition penalty |
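
As a rough picture of what `top_k` and `top_p` control, the sketch below filters a toy token distribution in plain Python. It illustrates the general top-k and nucleus sampling idea only; it is not the Turbo model's implementation, the function name is hypothetical, and applying top-k before top-p is an assumption.

```python
def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest leading group of
    them whose cumulative probability reaches top_p, and renormalize.

    Assumes `probs` maps tokens to positive probabilities.
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    ranked = ranked[:top_k]

    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


toy = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(sorted(filter_top_k_top_p(toy, top_k=3, top_p=0.7)))  # ['a', 'b']
```

Lower `top_k` or `top_p` values leave fewer candidate tokens, which generally makes output more predictable; higher values allow more variety.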

Limitations

  • Maximum audio length: ~40 seconds per generation
  • Reference audio: Minimum 5-6 seconds recommended
  • Turbo paralinguistic tags: English only
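
One way to work within the per-generation length limit is to split long scripts into sentence groups and generate each group separately. The helper below is a hypothetical sketch: `max_chars=300` is an arbitrary placeholder, since the source does not say how many characters correspond to ~40 seconds of audio.

```python
import re


def split_into_chunks(text, max_chars=300):
    """Group sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept whole in its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("One. Two. Three.", max_chars=9))
# ['One. Two.', 'Three.']
```

Each chunk can then be run through a TTS node in turn and the resulting clips joined afterwards.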

Requirements

  • Python 3.10+
  • 8GB RAM minimum (16GB+ recommended)
  • NVIDIA GPU with 8GB+ VRAM recommended
  • CPU and Mac MPS supported
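
The backend preference implied by this list (CUDA GPU, then Apple MPS, then CPU) can be sketched as follows. The function names are hypothetical and the ordering is an assumption about sensible defaults, not a description of how the nodes choose a device.

```python
def choose_device(cuda_available, mps_available):
    """Pick a device name, preferring CUDA, then Apple MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device():
    """Query PyTorch for available backends; fall back to CPU without it."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps = getattr(torch.backends, "mps", None)
    return choose_device(
        torch.cuda.is_available(),
        mps is not None and mps.is_available(),
    )
```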

License

MIT License - See Chatterbox repo for model licenses.

Changelog

2025-12-28

  • Added Turbo TTS node (faster, GPT2-based with paralinguistic tags)
  • Added Multilingual TTS node (23 languages)
  • Improved model caching using module-level globals
  • Centralized model downloads to ComfyUI/models/chatterbox/

2025-07-24

  • Added Dialog TTS node for multi-speaker conversations (up to 4 speakers)
  • Extended all nodes with seed parameters for reproducible generation
  • Isolated audio track outputs per speaker

2025-06-24

  • Added seed parameter for reproducible generation
  • Made Perth watermarking optional for Python 3.12+ compatibility

2025-05-31

  • Added persistent model loading and loading bar
  • Added Mac MPS support
  • Native inference code (removed chatterbox-tts library dependency)
