Add HumeAI TADA TTS engine (1B English + 3B Multilingual)#296
Conversation
Integrates HumeAI's TADA (Text-Acoustic Dual Alignment) speech-language model as a new TTS engine. TADA uses a novel 1:1 token-audio alignment that produces coherent speech over long sequences (700s+).

Two model variants:

- tada-1b: English-only, ~4GB, built on Llama 3.2 1B
- tada-3b-ml: 10 languages, ~8GB, built on Llama 3.2 3B

Backend uses the Encoder for voice prompt encoding with caching, and TadaForCausalLM with flow-matching diffusion for generation. Supports bf16 inference on CUDA, forces CPU on macOS (MPS compatibility). Installed with --no-deps due to torch>=2.7 pin conflict; descript-audio-codec and torchaudio added as explicit sub-dependencies.
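The device/dtype policy described above (bf16 on CUDA, CPU forced on macOS) can be sketched as follows. This is a hedged illustration only: `select_device` and its return shape are hypothetical and not the backend's actual API; the system name is passed in explicitly to keep the sketch deterministic.

```python
def select_device(system: str, cuda_available: bool) -> tuple[str, str]:
    """Pick (device, dtype) per the policy described in the PR:
    bf16 inference on CUDA, plain fp32 CPU forced on macOS because
    of MPS compatibility issues. Names here are illustrative."""
    if system == "Darwin":
        # macOS: skip MPS entirely, run on CPU
        return ("cpu", "float32")
    if cuda_available:
        # CUDA path uses bfloat16 inference
        return ("cuda", "bfloat16")
    return ("cpu", "float32")
```

In the real backend this decision would feed into how the model weights are loaded, not just a string pair.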
📝 Walkthrough

Adds first-class TADA TTS support across frontend, API types/schemas, and backend: a new HumeTadaBackend with model/codec download, caching, loading/unloading, generation and a DAC shim; CI/build updates to install and bundle
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client/UI
    participant API as Backend API
    participant Router as Engine Router
    participant TADA as HumeTadaBackend
    participant Cache as Local Cache/FS
    participant HumeAI as HumeAI (snapshot_download)
    Client->>API: POST /generate {engine:"tada", model_size:"1B"|"3B", text, voice_prompt?}
    API->>Router: resolve TTS backend for "tada"
    Router->>TADA: ensure instantiated & loaded
    TADA->>Cache: check codec & model cached?
    alt not cached
        TADA->>HumeAI: snapshot_download(model & codec)
        HumeAI-->>Cache: model & codec files
    end
    TADA->>TADA: load_model(model_size) (async lock, device/dtype select)
    API->>TADA: generate(text, voice_prompt, language, seed?)
    TADA->>TADA: rehydrate encoder prompt, synthesize audio (24kHz)
    TADA-->>API: audio bytes + sample_rate
    API-->>Client: streamed/returned audio
```
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/backends/__init__.py (2)
446-457: ⚠️ Potential issue | 🟠 Major — TADA needs model_size handling in `get_model_load_func`.

The TADA engine supports multiple model sizes (1B, 3B), but the fallback path at line 457 calls `load_model()` without the `model_size` argument. This will always load the default 1B model, even when the config specifies 3B.

🐛 Proposed fix to pass model_size for TADA

```diff
 if config.engine == "qwen":
     return lambda: tts.get_tts_model().load_model(config.model_size)
+if config.engine == "tada":
+    return lambda: get_tts_backend_for_engine(config.engine).load_model(config.model_size)
 return lambda: get_tts_backend_for_engine(config.engine).load_model()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/__init__.py` around lines 446 - 457, get_model_load_func currently omits the model_size when calling load_model for the generic TADA fallback, so TADA always loads the default size; update the fallback lambda in get_model_load_func to pass config.model_size to load_model (i.e., change the final return to call get_tts_backend_for_engine(config.engine).load_model(config.model_size)) so the configured size (e.g., 3B) is respected.
425-443: ⚠️ Potential issue | 🟠 Major — TADA needs model_size check in `check_model_loaded`.

Similar to Qwen, TADA supports multiple model sizes, but this function only checks `backend.is_loaded()` without verifying the correct size is loaded. This could return `True` when tada-1b is loaded but the config is for tada-3b-ml.

🐛 Proposed fix to check model_size for TADA

```diff
 if config.engine == "qwen":
     tts_model = tts.get_tts_model()
     loaded_size = getattr(tts_model, "_current_model_size", None) or getattr(tts_model, "model_size", None)
     return tts_model.is_loaded() and loaded_size == config.model_size
+if config.engine == "tada":
+    backend = get_tts_backend_for_engine(config.engine)
+    loaded_size = getattr(backend, "model_size", None)
+    return backend.is_loaded() and loaded_size == config.model_size
 backend = get_tts_backend_for_engine(config.engine)
 return backend.is_loaded()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/__init__.py` around lines 425 - 443, check_model_loaded currently returns backend.is_loaded() for TADA without verifying the loaded model size; update the TADA branch (where backend = get_tts_backend_for_engine(config.engine) and later return backend.is_loaded()) to also check the model size similar to Qwen: obtain the loaded size via getattr(backend, "_current_model_size", None) or getattr(backend, "model_size", None) and return True only if backend.is_loaded() and the loaded size equals config.model_size. Ensure you keep the existing whisper/qwen checks intact and handle missing attributes gracefully.
🧹 Nitpick comments (4)
backend/backends/hume_backend.py (3)
57-57: Unused class-level lock.

`_load_lock` is declared as a `ClassVar[threading.Lock]` but is never used in the class. The actual locking is done via the instance-level `_model_load_lock` (asyncio.Lock) at line 64. Consider removing this dead code.

🧹 Remove unused lock

```diff
 class HumeTadaBackend:
     """HumeAI TADA TTS backend for high-quality voice cloning."""

-    _load_lock: ClassVar[threading.Lock] = threading.Lock()
-
     def __init__(self):
```

Also remove the `threading` import at line 18 if no longer needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` at line 57, Remove the unused class-level lock declaration `_load_lock: ClassVar[threading.Lock] = threading.Lock()` from the class (it's dead code; locking is handled by the instance `_model_load_lock` which is an `asyncio.Lock`), and delete the now-unnecessary `threading` import from the top of the file so there are no unused imports remaining.
19-19: Consider using modern type hints.

`typing.List` and `typing.Tuple` are deprecated in favor of built-in `list` and `tuple` (PEP 585). This is a minor style improvement.

🧹 Use modern type hints

```diff
-from typing import ClassVar, List, Optional, Tuple
+from typing import ClassVar, Optional
```

Then update usages:

- `List[str]` → `list[str]`
- `Tuple[dict, bool]` → `tuple[dict, bool]`
- `Tuple[np.ndarray, str]` → `tuple[np.ndarray, str]`
- `Tuple[np.ndarray, int]` → `tuple[np.ndarray, int]`

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` at line 19, Update the typing imports and annotations to use PEP 585 built-in generics: remove List and Tuple from the import line (currently "from typing import ClassVar, List, Optional, Tuple") and replace their usages in this module with the built-in forms (e.g., change List[str] → list[str], Tuple[dict, bool] → tuple[dict, bool], Tuple[np.ndarray, str] → tuple[np.ndarray, str], Tuple[np.ndarray, int] → tuple[np.ndarray, int]); keep ClassVar and Optional from typing as-is and adjust any function/method signatures, return annotations, and variable annotations that reference List/Tuple accordingly (look for usages in functions like any top-level helpers, class attributes, and methods within this file).
211-220: Consider simplifying redundant branches.

The static analyzer flagged that lines 215-220 have identical assignments. The `elif` branches for `list`, `int`/`float`, and the `else` clause all assign `val` directly. While the explicit type checks document intent, they could be combined.

🧹 Simplify serialization logic

```diff
 for field_name in prompt.__dataclass_fields__:
     val = getattr(prompt, field_name)
     if isinstance(val, torch.Tensor):
         prompt_dict[field_name] = val.detach().cpu()
-    elif isinstance(val, list):
-        prompt_dict[field_name] = val
-    elif isinstance(val, (int, float)):
-        prompt_dict[field_name] = val
     else:
+        # Lists, scalars, and other values pass through unchanged
         prompt_dict[field_name] = val
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` around lines 211 - 220, the loop over prompt.__dataclass_fields__ has redundant branches that all assign val directly; simplify by handling torch.Tensor specially (detach().cpu()) and for all other types assign val to prompt_dict[field_name] in a single else branch. Update the loop in the function that builds prompt_dict (the code iterating field_name and using getattr(prompt, field_name)) to only check isinstance(val, torch.Tensor) and otherwise set prompt_dict[field_name] = val, removing the duplicate elif checks for list and (int, float).

backend/backends/__init__.py (1)
397-422: Consider adding model_size check in `unload_model_by_config` for TADA.

For consistency with the proposed fixes above, `unload_model_by_config` should also verify the loaded model size matches the config before unloading for the TADA engine. Currently, it would unload any loaded TADA model regardless of which size variant is loaded.

♻️ Proposed fix

```diff
 if config.engine == "qwen":
     tts_model = tts.get_tts_model()
     loaded_size = getattr(tts_model, "_current_model_size", None) or getattr(tts_model, "model_size", None)
     if tts_model.is_loaded() and loaded_size == config.model_size:
         tts.unload_tts_model()
         return True
     return False
+if config.engine == "tada":
+    backend = get_tts_backend_for_engine(config.engine)
+    loaded_size = getattr(backend, "model_size", None)
+    if backend.is_loaded() and loaded_size == config.model_size:
+        backend.unload_model()
+        return True
+    return False
+
+# All other TTS engines
 backend = get_tts_backend_for_engine(config.engine)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/__init__.py` around lines 397 - 422, unload_model_by_config currently unloads the TADA backend without verifying model size; update the function to check the loaded model size matches config.model_size before calling unload for the TADA engine. Inside unload_model_by_config, after obtaining backend = get_tts_backend_for_engine(config.engine), add a branch for config.engine == "tada" (or detect TADA backend) that reads the backend's loaded size (e.g., getattr(backend, "_current_model_size", None) or getattr(backend, "model_size", None)), verify backend.is_loaded() and loaded_size == config.model_size, and only then call backend.unload_model(); otherwise return False. Ensure you reference unload_model_by_config, get_tts_backend_for_engine, backend.is_loaded, and backend.unload_model when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/src/lib/hooks/useGenerationForm.ts`:
- Around line 82-84: The modelName generation for engine/variable model mapping
is incorrect for the TADA 3B variant; update the logic that computes modelName
(the expression using engine, data.modelSize) so that when engine === 'tada' and
data.modelSize (case-insensitive) equals '3b' it returns 'tada-3b-ml', otherwise
preserve the existing fallback (e.g., `tada-${...}` behavior for other sizes);
ensure you reference the same variables (engine, data.modelSize, and the
modelName assignment in useGenerationForm.ts) so the produced string matches the
backend's expected name.
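The mapping the comment above asks for can be sketched as a small helper. This is a hedged illustration in Python of the same rule (the real fix lives inline in the TypeScript hook); `resolve_tada_model_name` is a hypothetical name, not code from the PR.

```python
def resolve_tada_model_name(engine: str, model_size: str) -> str:
    """Map the UI's engine/size selection to the backend model id.

    Per the review comment: the 3B TADA variant is multilingual and
    named 'tada-3b-ml', so a plain f'{engine}-{size}' template would
    produce the wrong name for it.
    """
    if engine == "tada" and model_size.lower() == "3b":
        return "tada-3b-ml"
    # Fallback mirrors the existing template behavior for other sizes
    return f"{engine}-{model_size.lower()}"
```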
In `@backend/backends/hume_backend.py`:
- Around line 200-207: Replace the direct torchaudio.load() usage with the
shared load_audio() helper to ensure consistent preprocessing (mono conversion)
before encoding: call load_audio(audio_path, sample_rate=sr, mono=True) to
obtain the audio tensor and use that when invoking self.encoder(audio,
text=text_arg, sample_rate=sr); also add/import load_audio from
backend.utils.audio. If you intentionally keep torchaudio.load(), explicitly
convert the tensor to mono before calling self.encoder and document that
decision in the code where audio, sr, audio_path, reference_text and
self.encoder are used.
- Around line 110-124: The download code uses snapshot_download with token=None
(in calls near snapshot_download for TADA_CODEC_REPO and repo while logging via
logger.info and referencing model_size), which relies on a cached hf token and
doesn't make it clear users must authenticate and accept the Llama license;
update the backend documentation (e.g., backend/README.md) to document that
users must run huggingface-cli login (or hf auth login) with a token that has
repository access and explicitly accept the Llama license for the TADA 1B and
3B-ML models before running the code so snapshot_download will succeed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ccdeb5eb-4f63-47b6-90da-c038d837b31a
📒 Files selected for processing (13)
- .github/workflows/release.yml
- Dockerfile
- app/src/components/Generation/EngineModelSelector.tsx
- app/src/components/ServerSettings/ModelManagement.tsx
- app/src/lib/api/types.ts
- app/src/lib/constants/languages.ts
- app/src/lib/hooks/useGenerationForm.ts
- backend/backends/__init__.py
- backend/backends/hume_backend.py
- backend/build_binary.py
- backend/models.py
- backend/requirements.txt
- justfile
The real descript-audio-codec package pulls in descript-audiotools, which transitively requires onnx, tensorboard, protobuf, matplotlib, pystoi, and other heavy dependencies. onnx fails to build from source on macOS due to CMake version incompatibility. TADA only uses Snake1d (a 7-line PyTorch module) from DAC. This commit adds a shim in backend/utils/dac_shim.py that registers fake dac.* modules in sys.modules with just the Snake1d class, completely eliminating the DAC/audiotools dependency chain.
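The shim technique described above can be sketched like this. It is a hedged, self-contained illustration of registering fake `dac.*` modules in `sys.modules`; the real `Snake1d` is a `torch.nn.Module` applying `x + (1/alpha) * sin(alpha * x)^2` with `torch.sin`, replaced here by a plain-Python stand-in so the sketch runs without torch, and the module path `dac.nn.layers` is an assumption, not necessarily TADA's actual import path.

```python
import math
import sys
import types

class Snake1d:
    """Stand-in for DAC's Snake1d activation: x + (1/alpha) * sin(alpha*x)^2.
    The real shim class would subclass torch.nn.Module and use torch.sin."""
    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def __call__(self, xs):
        a = self.alpha
        return [x + (1.0 / a) * math.sin(a * x) ** 2 for x in xs]

def install_dac_shim() -> None:
    """Register fake dac.* modules so `from dac.nn.layers import Snake1d`
    succeeds without installing descript-audio-codec and its heavy deps."""
    dac = types.ModuleType("dac")
    dac_nn = types.ModuleType("dac.nn")
    dac_layers = types.ModuleType("dac.nn.layers")
    dac_layers.Snake1d = Snake1d
    dac_nn.layers = dac_layers
    dac.nn = dac_nn
    # setdefault: do nothing if a real dac package is already importable
    for mod in (dac, dac_nn, dac_layers):
        sys.modules.setdefault(mod.__name__, mod)
```

After `install_dac_shim()` runs, `from dac.nn.layers import Snake1d` resolves to the stub, so the TADA import chain never touches onnx, tensorboard, or the rest of the audiotools dependency tree.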
TADA hardcodes 'meta-llama/Llama-3.2-1B' as its tokenizer source in both the Aligner and TadaForCausalLM.from_pretrained(). That repo is gated and requires accepting Meta's license on HuggingFace. Monkey-patch AutoTokenizer.from_pretrained during model loading to redirect Llama tokenizer requests to 'unsloth/Llama-3.2-1B', an ungated mirror with identical tokenizer files. The patch is scoped to model loading only and restored immediately after.
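The scoped-patch pattern described above can be illustrated generically. A later review comment flags the subtle part — the saved and restored attribute must be the classmethod *descriptor*, not its raw `__func__` — so this sketch is built around that. `FakeAutoTokenizer` is a stand-in, not `transformers.AutoTokenizer`.

```python
class FakeAutoTokenizer:
    """Minimal stand-in for transformers.AutoTokenizer."""
    @classmethod
    def from_pretrained(cls, repo: str):
        return f"tokenizer:{repo}"

def load_with_redirect():
    """Temporarily redirect gated Llama repos to an ungated mirror,
    restoring the ORIGINAL classmethod descriptor afterwards."""
    # Grab the descriptor from the class dict, not the bound method
    original = FakeAutoTokenizer.__dict__["from_pretrained"]

    @classmethod
    def redirected(cls, repo: str):
        if "meta-llama/Llama-3.2" in repo:
            repo = "unsloth/Llama-3.2-1B"  # ungated mirror per the PR
        return original.__func__(cls, repo)

    FakeAutoTokenizer.from_pretrained = redirected
    try:
        # ...model loading would happen here...
        return FakeAutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
    finally:
        # Restore the descriptor itself so later callers still get a classmethod
        FakeAutoTokenizer.from_pretrained = original
```

Restoring `original.__func__` instead of `original` here would leave a plain function on the class and break subsequent `FakeAutoTokenizer.from_pretrained(...)` calls, which is exactly the bug the later review round reports.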
🧹 Nitpick comments (2)
backend/backends/hume_backend.py (2)
57-57: Unused class variable `_load_lock`.

The `_load_lock` ClassVar is declared but never used in this class. The instance-level `_model_load_lock` (asyncio.Lock) at line 64 handles load serialization. Consider removing the unused threading.Lock.

🧹 Proposed removal

```diff
 class HumeTadaBackend:
     """HumeAI TADA TTS backend for high-quality voice cloning."""

-    _load_lock: ClassVar[threading.Lock] = threading.Lock()
-
     def __init__(self):
```

Also remove `threading` from imports at line 18 if no longer needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` at line 57, The class declares an unused class variable _load_lock: ClassVar[threading.Lock] = threading.Lock() which is redundant because instance-level _model_load_lock (asyncio.Lock) already serializes model loading; remove the _load_lock declaration and, if threading is no longer used elsewhere in this module, also remove the threading import to avoid an unused import. Ensure you only modify the backend/backends/hume_backend.py definitions of _load_lock and the top-level imports, leaving _model_load_lock and its usage intact.
19-19: Use modern type hints (`list`, `tuple` instead of `typing.List`, `typing.Tuple`).

Python 3.9+ supports built-in generics directly. This is a minor modernization.

🧹 Proposed changes

```diff
-from typing import ClassVar, List, Optional, Tuple
+from typing import ClassVar, Optional
```

Then update usages:

- Line 179: `Tuple[dict, bool]` → `tuple[dict, bool]`
- Line 239: `List[str]` → `list[str]`
- Line 241: `Tuple[np.ndarray, str]` → `tuple[np.ndarray, str]`
- Line 251: `Tuple[np.ndarray, int]` → `tuple[np.ndarray, int]`

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` at line 19, Replace typing.List and typing.Tuple usage with built-in generics: remove List and Tuple from the import and keep ClassVar and Optional (e.g., change "from typing import ClassVar, List, Optional, Tuple" to "from typing import ClassVar, Optional"). Then update the annotated types where used: change Tuple[dict, bool] to tuple[dict, bool], List[str] to list[str], Tuple[np.ndarray, str] to tuple[np.ndarray, str], and Tuple[np.ndarray, int] to tuple[np.ndarray, int]; ensure any import of List/Tuple is removed and the code uses the built-in generics consistently.
📒 Files selected for processing (4)
- backend/backends/hume_backend.py
- backend/build_binary.py
- backend/requirements.txt
- backend/utils/dac_shim.py
torchaudio 2.10+ switched its default audio loading backend to torchcodec, which isn't installed. Replace torchaudio.load() with soundfile.read() in create_voice_prompt(). TADA's internal use of torchaudio.functional.resample() is unaffected (pure PyTorch math, no torchcodec dependency).
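A hedged sketch of the replacement: `soundfile.read` returns a float array shaped `(frames, channels)` for multi-channel audio (torchaudio's convention is `(channels, frames)`), so a mono mixdown must account for the layout. `load_prompt_audio` and `to_mono` are illustrative names, not the backend's actual functions.

```python
import numpy as np

def to_mono(data: np.ndarray) -> np.ndarray:
    """Collapse soundfile's (frames, channels) layout to 1-D mono
    by averaging channels; pass 1-D input through unchanged."""
    if data.ndim == 2:
        return data.mean(axis=1)
    return data

def load_prompt_audio(path: str) -> tuple[np.ndarray, int]:
    """Read a voice-prompt file with soundfile instead of
    torchaudio.load(), avoiding torchaudio 2.10+'s torchcodec
    backend requirement. Returns (mono_samples, sample_rate)."""
    import soundfile as sf  # imported lazily; optional for this sketch
    data, sr = sf.read(path, dtype="float32")
    return to_mono(data), sr
```

Resampling to the model's 24 kHz rate could still go through `torchaudio.functional.resample`, which — as noted above — is pure PyTorch math and unaffected by the backend switch.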
Actionable comments posted: 1
🧹 Nitpick comments (3)
backend/backends/hume_backend.py (3)
57-57: Remove unused class variable `_load_lock`.

`_load_lock` (threading.Lock) is defined but never used in this class. The async coordination uses `_model_load_lock` (asyncio.Lock) instead.

🧹 Suggested fix

```diff
 class HumeTadaBackend:
     """HumeAI TADA TTS backend for high-quality voice cloning."""

-    _load_lock: ClassVar[threading.Lock] = threading.Lock()
-
     def __init__(self):
```

Also remove the unused `threading` import from line 18.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` at line 57, Remove the unused class variable _load_lock and its threading.Lock initialization from the HumeBackend class and delete the now-unneeded threading import; verify that all async coordination uses _model_load_lock (asyncio.Lock) so no other synchronization references to _load_lock remain and run tests/lint to ensure no remaining references to threading or _load_lock exist.
248-260: Simplify serialization logic with unified handling.

The branches at lines 254-259 all perform the same assignment. This can be simplified.

♻️ Suggested simplification

```diff
 # Serialize EncoderOutput to a dict of CPU tensors for caching
 prompt_dict = {}
 for field_name in prompt.__dataclass_fields__:
     val = getattr(prompt, field_name)
     if isinstance(val, torch.Tensor):
         prompt_dict[field_name] = val.detach().cpu()
-    elif isinstance(val, list):
-        prompt_dict[field_name] = val
-    elif isinstance(val, (int, float)):
-        prompt_dict[field_name] = val
     else:
+        # Preserve lists, scalars, and other values as-is
         prompt_dict[field_name] = val
 return prompt_dict
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` around lines 248 - 260, The serialization loop for prompt fields is verbose because multiple branches all do the same assignment; simplify by keeping only the special-case for torch.Tensor (call val.detach().cpu()) and otherwise assign the value directly to prompt_dict[field_name], using the existing loop over prompt.__dataclass_fields__; update the code around prompt_dict creation so only torch.Tensor is transformed and all other types are handled by a single assignment to reduce redundancy.
19-19: Use built-in generics instead of deprecated `typing` imports.

`typing.List` and `typing.Tuple` are deprecated since Python 3.9. Use the lowercase built-in equivalents for consistency with modern Python.

♻️ Suggested fix

```diff
-from typing import ClassVar, List, Optional, Tuple
+from typing import ClassVar, Optional
```

Then update the type hints:

- `List[str]` → `list[str]`
- `Tuple[dict, bool]` → `tuple[dict, bool]`
- `Tuple[np.ndarray, str]` → `tuple[np.ndarray, str]`
- `Tuple[np.ndarray, int]` → `tuple[np.ndarray, int]`

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` at line 19, Replace deprecated typing generics by using built-in generics: remove List and Tuple from the typing import on the top line (keep ClassVar and Optional) and change all type annotations that use typing.List or typing.Tuple to the built-in forms; specifically update occurrences like List[str] → list[str], Tuple[dict, bool] → tuple[dict, bool], Tuple[np.ndarray, str] → tuple[np.ndarray, str], and Tuple[np.ndarray, int] → tuple[np.ndarray, int] in the functions and classes within hume_backend (look for annotations on methods and return types referencing List or Tuple and update them accordingly).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/backends/hume_backend.py`:
- Around line 183-185: The restoration replaces the classmethod descriptor with
its raw function, breaking later AutoTokenizer.from_pretrained calls; preserve
the original descriptor (store the full classmethod object in
_orig_from_pretrained) and only use _orig_from_pretrained.__func__ when invoking
inside the monkeypatch, then restore AutoTokenizer.from_pretrained to the saved
descriptor in the finally block (referencing AutoTokenizer.from_pretrained and
_orig_from_pretrained in your change).
📒 Files selected for processing (1)
backend/backends/hume_backend.py
Actionable comments posted: 1
♻️ Duplicate comments (1)
backend/backends/hume_backend.py (1)
155-156: ⚠️ Potential issue | 🔴 Critical — Restore `AutoTokenizer.from_pretrained` as a classmethod descriptor, not a raw function.

`_orig_from_pretrained` is captured as `.__func__` (Line 155) and then restored directly (Line 185), which replaces the classmethod with a plain function and can break later calls.

🐛 Proposed fix

```diff
-_orig_from_pretrained = AutoTokenizer.from_pretrained.__func__
+_orig_from_pretrained = AutoTokenizer.from_pretrained

 @classmethod  # type: ignore[misc]
 def _patched_from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
     if "meta-llama/Llama-3.2" in str(pretrained_model_name_or_path):
         pretrained_model_name_or_path = "unsloth/Llama-3.2-1B"
         kwargs.setdefault("token", None)
         logger.info("Redirecting Llama tokenizer to ungated mirror: unsloth/Llama-3.2-1B")
-    return _orig_from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs)
+    return _orig_from_pretrained.__func__(cls, pretrained_model_name_or_path, *args, **kwargs)

 AutoTokenizer.from_pretrained = _patched_from_pretrained
 try:
     ...
 finally:
     # Restore original to avoid affecting other code
     AutoTokenizer.from_pretrained = _orig_from_pretrained
```

```bash
#!/bin/bash
# Verify the risky capture/restore pattern still exists in the file
rg -n "from_pretrained\.__func__|AutoTokenizer\.from_pretrained = _orig_from_pretrained" backend/backends/hume_backend.py -C2

# Demonstrate why restoring a classmethod as raw function breaks class calls
python - <<'PY'
class Demo:
    @classmethod
    def f(cls, x):
        return cls.__name__, x

raw = Demo.f.__func__
Demo.f = raw  # mirrors the problematic restore pattern
try:
    Demo.f(1)
except TypeError as e:
    print("TypeError reproduced:", e)
PY
```

Also applies to: 165-166, 183-185
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` around lines 155 - 156, You captured AutoTokenizer.from_pretrained as .__func__ and later assign it back as a plain function, which strips the classmethod descriptor; instead capture the descriptor itself (e.g., _orig_from_pretrained = AutoTokenizer.from_pretrained) or when restoring wrap the raw function with classmethod (e.g., AutoTokenizer.from_pretrained = classmethod(_orig_from_pretrained_func)). Update the save/restore pairs around _orig_from_pretrained (and the other similar captures/restores at the mentioned locations) so the restored attribute is a proper classmethod descriptor rather than a raw function.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/backends/hume_backend.py`:
- Around line 189-197: unload_model currently deletes self.model and
self.encoder while worker threads in _encode_sync() or _generate_sync() may be
using them, causing a race; add a shared synchronization mechanism such as an
integer active_ops counter plus a threading.Lock/Condition (or a single state
Lock) on the backend instance, increment active_ops at the start of _encode_sync
and _generate_sync and decrement it in a finally block, and have unload_model
acquire the lock and wait (or block) until active_ops == 0 before
deleting/setting model and encoder to None. Update the methods named
unload_model, _encode_sync, and _generate_sync to use this counter/lock pattern
so unload is serialized against in-flight encoding/generation (apply same fix to
other similar regions noted in the comment).
---
Duplicate comments:
In `@backend/backends/hume_backend.py`:
- Around line 155-156: You captured AutoTokenizer.from_pretrained as .__func__
and later assign it back as a plain function, which strips the classmethod
descriptor; instead capture the descriptor itself (e.g., _orig_from_pretrained =
AutoTokenizer.from_pretrained) or when restoring wrap the raw function with
classmethod (e.g., AutoTokenizer.from_pretrained =
classmethod(_orig_from_pretrained_func)). Update the save/restore pairs around
_orig_from_pretrained (and the other similar captures/restores at the mentioned
locations) so the restored attribute is a proper classmethod descriptor rather
than a raw function.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 94241c81-4bda-472d-b3d0-8425085cbeef
📒 Files selected for processing (1)
backend/backends/hume_backend.py
Replace the monkey-patch on AutoTokenizer.from_pretrained (which broke the classmethod descriptor and caused 'Tokenizer not loaded' errors when loading Qwen after TADA) with two targeted config patches: - Set AlignerConfig.tokenizer_name to the local ungated tokenizer path - Pre-load TadaConfig, inject tokenizer_name, pass config= to from_pretrained No global state is modified; other engines are unaffected.
🧹 Nitpick comments (4)
backend/backends/hume_backend.py (4)
57-57: Unused class-level lock.
`_load_lock` is defined as a class-level `threading.Lock` but is never used. The instance-level `_model_load_lock` (an `asyncio.Lock`) handles load serialization. Either remove this dead code or clarify its intended purpose.

♻️ Suggested fix if unused:

```diff
 class HumeTadaBackend:
     """HumeAI TADA TTS backend for high-quality voice cloning."""

-    _load_lock: ClassVar[threading.Lock] = threading.Lock()
-
     def __init__(self):
```

Also remove the unused `threading` import from line 18.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` at line 57, Remove the unused class-level threading Lock and its import: delete the class attribute _load_lock and remove the threading import (the class already uses the instance-level asyncio.Lock named _model_load_lock for serialization), or if a class-level lock was intended, replace usages to use _load_lock consistently and ensure it's actually used; reference the symbols _load_lock, _model_load_lock and the threading import when making the change.
252-257: Simplify redundant branches.

These branches all perform the same assignment. Consolidate into a single `else` clause.

♻️ Suggested fix:

```diff
 if isinstance(val, torch.Tensor):
     prompt_dict[field_name] = val.detach().cpu()
-elif isinstance(val, list):
-    prompt_dict[field_name] = val
-elif isinstance(val, (int, float)):
-    prompt_dict[field_name] = val
 else:
     prompt_dict[field_name] = val
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` around lines 252 - 257, The three redundant isinstance branches that each do prompt_dict[field_name] = val should be consolidated: remove the separate elif blocks for isinstance(val, list) and isinstance(val, (int, float)) and replace them with a single else (or simply perform prompt_dict[field_name] = val unconditionally) so that only distinct handling remains and prompt_dict[field_name] is assigned from val in one place; update the block containing prompt_dict, field_name, and val accordingly.
19-19: Use modern type hint syntax.
`List` and `Tuple` from `typing` are deprecated in Python 3.9+. Use built-in `list` and `tuple` instead.

♻️ Suggested fix:

```diff
-from typing import ClassVar, List, Optional, Tuple
+from typing import ClassVar, Optional
```

Then update usages throughout the file:

- `List[str]` → `list[str]`
- `Tuple[dict, bool]` → `tuple[dict, bool]`
- `Tuple[np.ndarray, str]` → `tuple[np.ndarray, str]`
- `Tuple[np.ndarray, int]` → `tuple[np.ndarray, int]`

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` at line 19, The imports and type annotations in hume_backend.py use deprecated typing generics; replace List and Tuple from typing with built-in generics (list, tuple) and update all annotations accordingly: change the import line to remove List and Tuple, keep ClassVar/Optional if needed, and update occurrences such as List[str] → list[str], Tuple[dict, bool] → tuple[dict, bool], Tuple[np.ndarray, str] → tuple[np.ndarray, str], and Tuple[np.ndarray, int] → tuple[np.ndarray, int] (ensure any other List/Tuple usages in functions or class attributes are similarly converted).
309-319: Hoist model dtype lookup outside the loop.
`next(self.model.parameters()).dtype` is called for every floating-point tensor in the dictionary. Cache the dtype once before the loop.

♻️ Suggested fix:

```diff
 # Reconstruct EncoderOutput from the cached dict
 restored = {}
+model_dtype = next(self.model.parameters()).dtype
 for k, v in voice_prompt.items():
     if isinstance(v, torch.Tensor):
         # Move to device and match model dtype for float tensors
         if v.is_floating_point():
-            model_dtype = next(self.model.parameters()).dtype
             restored[k] = v.to(device=device, dtype=model_dtype)
         else:
             restored[k] = v.to(device=device)
     else:
         restored[k] = v
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` around lines 309 - 319, The loop over voice_prompt repeatedly calls next(self.model.parameters()).dtype for each floating-point tensor; compute and cache the model dtype once before the loop (e.g., model_dtype = next(self.model.parameters()).dtype) and then use that cached model_dtype inside the for k, v in voice_prompt.items() loop when calling v.to(device=device, dtype=model_dtype); leave non-floating tensors and non-tensor values handling unchanged and continue assigning into restored.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/backends/hume_backend.py`:
- Line 57: Remove the unused class-level threading Lock and its import: delete
the class attribute _load_lock and remove the threading import (the class
already uses the instance-level asyncio.Lock named _model_load_lock for
serialization), or if a class-level lock was intended, replace usages to use
_load_lock consistently and ensure it's actually used; reference the symbols
_load_lock, _model_load_lock and the threading import when making the change.
- Around line 252-257: The three redundant isinstance branches that each do
prompt_dict[field_name] = val should be consolidated: remove the separate elif
blocks for isinstance(val, list) and isinstance(val, (int, float)) and replace
them with a single else (or simply perform prompt_dict[field_name] = val
unconditionally) so that only distinct handling remains and
prompt_dict[field_name] is assigned from val in one place; update the block
containing prompt_dict, field_name, and val accordingly.
- Line 19: The imports and type annotations in hume_backend.py use deprecated
typing generics; replace List and Tuple from typing with built-in generics
(list, tuple) and update all annotations accordingly: change the import line to
remove List and Tuple, keep ClassVar/Optional if needed, and update occurrences
such as List[str] → list[str], Tuple[dict, bool] → tuple[dict, bool],
Tuple[np.ndarray, str] → tuple[np.ndarray, str], and Tuple[np.ndarray, int] →
tuple[np.ndarray, int] (ensure any other List/Tuple usages in functions or class
attributes are similarly converted).
- Around line 309-319: The loop over voice_prompt repeatedly calls
next(self.model.parameters()).dtype for each floating-point tensor; compute and
cache the model dtype once before the loop (e.g., model_dtype =
next(self.model.parameters()).dtype) and then use that cached model_dtype inside
the for k, v in voice_prompt.items() loop when calling v.to(device=device,
dtype=model_dtype); leave non-floating tensors and non-tensor values handling
unchanged and continue assigning into restored.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8bcd01a2-43b5-487c-9101-68c5df2bb11b
📒 Files selected for processing (1)
backend/backends/hume_backend.py
Remove @torch.jit.script from the DAC shim's snake() function — TorchScript calls inspect.getsource(), which fails in PyInstaller binaries (no .py source files).

Update all user-facing docs: 4 → 5 TTS engines, add TADA row to every engine comparison table, mark TADA as Shipped in the upcoming engines list, update architecture diagrams and tech stack tables.
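Removing the decorator is safe because snake is plain elementwise math that runs fine in eager mode. A scalar sketch (the formula is assumed from DAC-style codecs, x + sin²(αx)/α; the real implementation operates on tensors and adds a small epsilon to α):

```python
import math

def snake(x: float, alpha: float = 1.0) -> float:
    # Snake activation, eager-mode: no TorchScript, so nothing calls
    # inspect.getsource() — which would fail inside a PyInstaller binary
    # where .py source files are not shipped.
    return x + math.sin(alpha * x) ** 2 / alpha
```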
Summary
- `--no-deps` installation pattern (same as Chatterbox) due to `torch>=2.7` pin conflict

What is TADA?
TADA is a speech-language model by HumeAI built on Llama 3.2. It uses a novel 1:1 text-acoustic token alignment with flow-matching diffusion, enabling coherent speech over long sequences (700s+).
Changes
Backend (4 files)
- `backend/backends/hume_backend.py` (new) — TTSBackend implementation with Encoder-based voice prompt encoding, cached prompt serialization, bf16 CUDA / fp32 CPU support
- `backend/backends/__init__.py` — Engine registry, ModelConfig entries, factory branch, multi-size model loading
- `backend/models.py` — Extended engine and model_size validation regexes
- `backend/build_binary.py` — PyInstaller hidden imports for tada, dac, torchaudio; `--collect-all dac`, `--collect-submodules tada`

Frontend (5 files)
- `app/src/lib/api/types.ts` — Extended engine and model_size TypeScript unions
- `app/src/lib/constants/languages.ts` — Added TADA language map (en, ar, zh, de, es, fr, it, ja, pl, pt)
- `app/src/components/Generation/EngineModelSelector.tsx` — TADA 1B/3B options with `tada:SIZE` format handling
- `app/src/lib/hooks/useGenerationForm.ts` — Zod schema, model name/display name mappings, model_size passing
- `app/src/components/ServerSettings/ModelManagement.tsx` — Model descriptions and voice model filter

Dependencies & Infrastructure (4 files)
- `backend/requirements.txt` — Added `descript-audio-codec>=1.0.0` and `torchaudio`
- `justfile` — `--no-deps hume-tada` in Unix and Windows setup targets
- `.github/workflows/release.yml` — `--no-deps hume-tada` in CPU and CUDA build steps
- `Dockerfile` — `--no-deps hume-tada` install line

Testing
Needs local testing via `just dev` (and the packaged binary via `just build`)

Note
TADA models are built on Meta Llama 3.2, which requires accepting the Llama license on HuggingFace. Users will need a HuggingFace token with Llama access for the initial model download.
Summary by CodeRabbit
New Features
Chores