Add HumeAI TADA TTS engine (1B English + 3B Multilingual)#296
Conversation
Integrates HumeAI's TADA (Text-Acoustic Dual Alignment) speech-language model as a new TTS engine. TADA uses a novel 1:1 token-audio alignment that produces coherent speech over long sequences (700s+).

Two model variants:

- tada-1b: English-only, ~4GB, built on Llama 3.2 1B
- tada-3b-ml: 10 languages, ~8GB, built on Llama 3.2 3B

Backend uses the Encoder for voice prompt encoding with caching, and TadaForCausalLM with flow-matching diffusion for generation. Supports bf16 inference on CUDA, forces CPU on macOS (MPS compatibility). Installed with --no-deps due to torch>=2.7 pin conflict; descript-audio-codec and torchaudio added as explicit sub-dependencies.
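The device/dtype policy described above (bf16 on CUDA, CPU forced on macOS) can be sketched as follows. This is a hedged illustration only: `select_device` and its return shape are hypothetical and not the backend's actual API; the system name is passed in explicitly to keep the sketch deterministic.

```python
def select_device(system: str, cuda_available: bool) -> tuple[str, str]:
    """Pick (device, dtype) per the policy described in the PR:
    bf16 inference on CUDA, plain fp32 CPU forced on macOS because
    of MPS compatibility issues. Names here are illustrative."""
    if system == "Darwin":
        # macOS: skip MPS entirely, run on CPU
        return ("cpu", "float32")
    if cuda_available:
        # CUDA path uses bfloat16 inference
        return ("cuda", "bfloat16")
    return ("cpu", "float32")
```

In the real backend this decision would feed into how the model weights are loaded, not just a string pair.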
📝 Walkthrough

Adds first-class TADA TTS support across frontend, API types/schemas, and backend: a new HumeTadaBackend with model/codec download, caching, loading/unloading, generation and a DAC shim; CI/build updates to install and bundle
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client/UI
    participant API as Backend API
    participant Router as Engine Router
    participant TADA as HumeTadaBackend
    participant Cache as Local Cache/FS
    participant HumeAI as HumeAI (snapshot_download)
    Client->>API: POST /generate {engine:"tada", model_size:"1B"|"3B", text, voice_prompt?}
    API->>Router: resolve TTS backend for "tada"
    Router->>TADA: ensure instantiated & loaded
    TADA->>Cache: check codec & model cached?
    alt not cached
        TADA->>HumeAI: snapshot_download(model & codec)
        HumeAI-->>Cache: model & codec files
    end
    TADA->>TADA: load_model(model_size) (async lock, device/dtype select)
    API->>TADA: generate(text, voice_prompt, language, seed?)
    TADA->>TADA: rehydrate encoder prompt, synthesize audio (24kHz)
    TADA-->>API: audio bytes + sample_rate
    API-->>Client: streamed/returned audio
```
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/backends/__init__.py (2)
446-457: ⚠️ Potential issue | 🟠 Major — TADA needs model_size handling in `get_model_load_func`.

The TADA engine supports multiple model sizes (1B, 3B), but the fallback path at line 457 calls `load_model()` without the `model_size` argument. This will always load the default 1B model, even when the config specifies 3B.

🐛 Proposed fix to pass model_size for TADA

```diff
 if config.engine == "qwen":
     return lambda: tts.get_tts_model().load_model(config.model_size)
+if config.engine == "tada":
+    return lambda: get_tts_backend_for_engine(config.engine).load_model(config.model_size)
 return lambda: get_tts_backend_for_engine(config.engine).load_model()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/__init__.py` around lines 446 - 457, get_model_load_func currently omits the model_size when calling load_model for the generic TADA fallback, so TADA always loads the default size; update the fallback lambda in get_model_load_func to pass config.model_size to load_model (i.e., change the final return to call get_tts_backend_for_engine(config.engine).load_model(config.model_size)) so the configured size (e.g., 3B) is respected.
425-443: ⚠️ Potential issue | 🟠 Major — TADA needs model_size check in `check_model_loaded`.

Similar to Qwen, TADA supports multiple model sizes, but this function only checks `backend.is_loaded()` without verifying the correct size is loaded. This could return `True` when tada-1b is loaded but the config is for tada-3b-ml.

🐛 Proposed fix to check model_size for TADA

```diff
 if config.engine == "qwen":
     tts_model = tts.get_tts_model()
     loaded_size = getattr(tts_model, "_current_model_size", None) or getattr(tts_model, "model_size", None)
     return tts_model.is_loaded() and loaded_size == config.model_size
+if config.engine == "tada":
+    backend = get_tts_backend_for_engine(config.engine)
+    loaded_size = getattr(backend, "model_size", None)
+    return backend.is_loaded() and loaded_size == config.model_size
 backend = get_tts_backend_for_engine(config.engine)
 return backend.is_loaded()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/__init__.py` around lines 425 - 443, check_model_loaded currently returns backend.is_loaded() for TADA without verifying the loaded model size; update the TADA branch (where backend = get_tts_backend_for_engine(config.engine) and later return backend.is_loaded()) to also check the model size similar to Qwen: obtain the loaded size via getattr(backend, "_current_model_size", None) or getattr(backend, "model_size", None) and return True only if backend.is_loaded() and the loaded size equals config.model_size. Ensure you keep the existing whisper/qwen checks intact and handle missing attributes gracefully.
🧹 Nitpick comments (4)
backend/backends/hume_backend.py (3)
57-57: Unused class-level lock.

`_load_lock` is declared as a `ClassVar[threading.Lock]` but is never used in the class. The actual locking is done via the instance-level `_model_load_lock` (asyncio.Lock) at line 64. Consider removing this dead code.

🧹 Remove unused lock

```diff
 class HumeTadaBackend:
     """HumeAI TADA TTS backend for high-quality voice cloning."""

-    _load_lock: ClassVar[threading.Lock] = threading.Lock()
-
     def __init__(self):
```

Also remove the `threading` import at line 18 if no longer needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` at line 57, Remove the unused class-level lock declaration `_load_lock: ClassVar[threading.Lock] = threading.Lock()` from the class (it's dead code; locking is handled by the instance `_model_load_lock` which is an `asyncio.Lock`), and delete the now-unnecessary `threading` import from the top of the file so there are no unused imports remaining.
19-19: Consider using modern type hints.

`typing.List` and `typing.Tuple` are deprecated in favor of built-in `list` and `tuple` (PEP 585). This is a minor style improvement.

🧹 Use modern type hints

```diff
-from typing import ClassVar, List, Optional, Tuple
+from typing import ClassVar, Optional
```

Then update usages:

- `List[str]` → `list[str]`
- `Tuple[dict, bool]` → `tuple[dict, bool]`
- `Tuple[np.ndarray, str]` → `tuple[np.ndarray, str]`
- `Tuple[np.ndarray, int]` → `tuple[np.ndarray, int]`

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` at line 19, Update the typing imports and annotations to use PEP 585 built-in generics: remove List and Tuple from the import line (currently "from typing import ClassVar, List, Optional, Tuple") and replace their usages in this module with the built-in forms (e.g., change List[str] → list[str], Tuple[dict, bool] → tuple[dict, bool], Tuple[np.ndarray, str] → tuple[np.ndarray, str], Tuple[np.ndarray, int] → tuple[np.ndarray, int]); keep ClassVar and Optional from typing as-is and adjust any function/method signatures, return annotations, and variable annotations that reference List/Tuple accordingly (look for usages in functions like any top-level helpers, class attributes, and methods within this file).
211-220: Consider simplifying redundant branches.

The static analyzer flagged that lines 215-220 have identical assignments. The `elif` branches for `list`, `int`/`float`, and the `else` clause all assign `val` directly. While the explicit type checks document intent, they could be combined.

🧹 Simplify serialization logic

```diff
 for field_name in prompt.__dataclass_fields__:
     val = getattr(prompt, field_name)
     if isinstance(val, torch.Tensor):
         prompt_dict[field_name] = val.detach().cpu()
-    elif isinstance(val, list):
-        prompt_dict[field_name] = val
-    elif isinstance(val, (int, float)):
-        prompt_dict[field_name] = val
     else:
+        # Lists, scalars, and other values pass through unchanged
         prompt_dict[field_name] = val
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` around lines 211 - 220, the loop over prompt.__dataclass_fields__ has redundant branches that all assign val directly; simplify by handling torch.Tensor specially (detach().cpu()) and for all other types assign val to prompt_dict[field_name] in a single else branch. Update the loop in the function that builds prompt_dict (the code iterating field_name and using getattr(prompt, field_name)) to only check isinstance(val, torch.Tensor) and otherwise set prompt_dict[field_name] = val, removing the duplicate elif checks for list and (int, float).

backend/backends/__init__.py (1)
397-422: Consider adding model_size check in `unload_model_by_config` for TADA.

For consistency with the proposed fixes above, `unload_model_by_config` should also verify the loaded model size matches the config before unloading for the TADA engine. Currently, it would unload any loaded TADA model regardless of which size variant is loaded.

♻️ Proposed fix

```diff
 if config.engine == "qwen":
     tts_model = tts.get_tts_model()
     loaded_size = getattr(tts_model, "_current_model_size", None) or getattr(tts_model, "model_size", None)
     if tts_model.is_loaded() and loaded_size == config.model_size:
         tts.unload_tts_model()
         return True
     return False
+if config.engine == "tada":
+    backend = get_tts_backend_for_engine(config.engine)
+    loaded_size = getattr(backend, "model_size", None)
+    if backend.is_loaded() and loaded_size == config.model_size:
+        backend.unload_model()
+        return True
+    return False
+
+# All other TTS engines
 backend = get_tts_backend_for_engine(config.engine)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/__init__.py` around lines 397 - 422, unload_model_by_config currently unloads the TADA backend without verifying model size; update the function to check the loaded model size matches config.model_size before calling unload for the TADA engine. Inside unload_model_by_config, after obtaining backend = get_tts_backend_for_engine(config.engine), add a branch for config.engine == "tada" (or detect TADA backend) that reads the backend's loaded size (e.g., getattr(backend, "_current_model_size", None) or getattr(backend, "model_size", None)), verify backend.is_loaded() and loaded_size == config.model_size, and only then call backend.unload_model(); otherwise return False. Ensure you reference unload_model_by_config, get_tts_backend_for_engine, backend.is_loaded, and backend.unload_model when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/src/lib/hooks/useGenerationForm.ts`:
- Around line 82-84: The modelName generation for engine/variable model mapping
is incorrect for the TADA 3B variant; update the logic that computes modelName
(the expression using engine, data.modelSize) so that when engine === 'tada' and
data.modelSize (case-insensitive) equals '3b' it returns 'tada-3b-ml', otherwise
preserve the existing fallback (e.g., `tada-${...}` behavior for other sizes);
ensure you reference the same variables (engine, data.modelSize, and the
modelName assignment in useGenerationForm.ts) so the produced string matches the
backend's expected name.
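The mapping the comment above asks for can be sketched as a small helper. This is a hedged illustration in Python of the same rule (the real fix lives inline in the TypeScript hook); `resolve_tada_model_name` is a hypothetical name, not code from the PR.

```python
def resolve_tada_model_name(engine: str, model_size: str) -> str:
    """Map the UI's engine/size selection to the backend model id.

    Per the review comment: the 3B TADA variant is multilingual and
    named 'tada-3b-ml', so a plain f'{engine}-{size}' template would
    produce the wrong name for it.
    """
    if engine == "tada" and model_size.lower() == "3b":
        return "tada-3b-ml"
    # Fallback mirrors the existing template behavior for other sizes
    return f"{engine}-{model_size.lower()}"
```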
In `@backend/backends/hume_backend.py`:
- Around line 200-207: Replace the direct torchaudio.load() usage with the
shared load_audio() helper to ensure consistent preprocessing (mono conversion)
before encoding: call load_audio(audio_path, sample_rate=sr, mono=True) to
obtain the audio tensor and use that when invoking self.encoder(audio,
text=text_arg, sample_rate=sr); also add/import load_audio from
backend.utils.audio. If you intentionally keep torchaudio.load(), explicitly
convert the tensor to mono before calling self.encoder and document that
decision in the code where audio, sr, audio_path, reference_text and
self.encoder are used.
- Around line 110-124: The download code uses snapshot_download with token=None
(in calls near snapshot_download for TADA_CODEC_REPO and repo while logging via
logger.info and referencing model_size), which relies on a cached hf token and
doesn't make it clear users must authenticate and accept the Llama license;
update the backend documentation (e.g., backend/README.md) to document that
users must run huggingface-cli login (or hf auth login) with a token that has
repository access and explicitly accept the Llama license for the TADA 1B and
3B-ML models before running the code so snapshot_download will succeed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ccdeb5eb-4f63-47b6-90da-c038d837b31a
📒 Files selected for processing (13)
- .github/workflows/release.yml
- Dockerfile
- app/src/components/Generation/EngineModelSelector.tsx
- app/src/components/ServerSettings/ModelManagement.tsx
- app/src/lib/api/types.ts
- app/src/lib/constants/languages.ts
- app/src/lib/hooks/useGenerationForm.ts
- backend/backends/__init__.py
- backend/backends/hume_backend.py
- backend/build_binary.py
- backend/models.py
- backend/requirements.txt
- justfile
The real descript-audio-codec package pulls in descript-audiotools, which transitively requires onnx, tensorboard, protobuf, matplotlib, pystoi, and other heavy dependencies. onnx fails to build from source on macOS due to CMake version incompatibility. TADA only uses Snake1d (a 7-line PyTorch module) from DAC. This commit adds a shim in backend/utils/dac_shim.py that registers fake dac.* modules in sys.modules with just the Snake1d class, completely eliminating the DAC/audiotools dependency chain.
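The shim technique described above can be sketched like this. It is a hedged, self-contained illustration of registering fake `dac.*` modules in `sys.modules`; the real `Snake1d` is a `torch.nn.Module` applying `x + (1/alpha) * sin(alpha * x)^2` with `torch.sin`, replaced here by a plain-Python stand-in so the sketch runs without torch, and the module path `dac.nn.layers` is an assumption, not necessarily TADA's actual import path.

```python
import math
import sys
import types

class Snake1d:
    """Stand-in for DAC's Snake1d activation: x + (1/alpha) * sin(alpha*x)^2.
    The real shim class would subclass torch.nn.Module and use torch.sin."""
    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def __call__(self, xs):
        a = self.alpha
        return [x + (1.0 / a) * math.sin(a * x) ** 2 for x in xs]

def install_dac_shim() -> None:
    """Register fake dac.* modules so `from dac.nn.layers import Snake1d`
    succeeds without installing descript-audio-codec and its heavy deps."""
    dac = types.ModuleType("dac")
    dac_nn = types.ModuleType("dac.nn")
    dac_layers = types.ModuleType("dac.nn.layers")
    dac_layers.Snake1d = Snake1d
    dac_nn.layers = dac_layers
    dac.nn = dac_nn
    # setdefault: do nothing if a real dac package is already importable
    for mod in (dac, dac_nn, dac_layers):
        sys.modules.setdefault(mod.__name__, mod)
```

After `install_dac_shim()` runs, `from dac.nn.layers import Snake1d` resolves to the stub, so the TADA import chain never touches onnx, tensorboard, or the rest of the audiotools dependency tree.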
TADA hardcodes 'meta-llama/Llama-3.2-1B' as its tokenizer source in both the Aligner and TadaForCausalLM.from_pretrained(). That repo is gated and requires accepting Meta's license on HuggingFace. Monkey-patch AutoTokenizer.from_pretrained during model loading to redirect Llama tokenizer requests to 'unsloth/Llama-3.2-1B', an ungated mirror with identical tokenizer files. The patch is scoped to model loading only and restored immediately after.
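The scoped-patch pattern described above can be illustrated generically. A later review comment flags the subtle part — the saved and restored attribute must be the classmethod *descriptor*, not its raw `__func__` — so this sketch is built around that. `FakeAutoTokenizer` is a stand-in, not `transformers.AutoTokenizer`.

```python
class FakeAutoTokenizer:
    """Minimal stand-in for transformers.AutoTokenizer."""
    @classmethod
    def from_pretrained(cls, repo: str):
        return f"tokenizer:{repo}"

def load_with_redirect():
    """Temporarily redirect gated Llama repos to an ungated mirror,
    restoring the ORIGINAL classmethod descriptor afterwards."""
    # Grab the descriptor from the class dict, not the bound method
    original = FakeAutoTokenizer.__dict__["from_pretrained"]

    @classmethod
    def redirected(cls, repo: str):
        if "meta-llama/Llama-3.2" in repo:
            repo = "unsloth/Llama-3.2-1B"  # ungated mirror per the PR
        return original.__func__(cls, repo)

    FakeAutoTokenizer.from_pretrained = redirected
    try:
        # ...model loading would happen here...
        return FakeAutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
    finally:
        # Restore the descriptor itself so later callers still get a classmethod
        FakeAutoTokenizer.from_pretrained = original
```

Restoring `original.__func__` instead of `original` here would leave a plain function on the class and break subsequent `FakeAutoTokenizer.from_pretrained(...)` calls, which is exactly the bug the later review round reports.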
🧹 Nitpick comments (2)
backend/backends/hume_backend.py (2)
57-57: Unused class variable `_load_lock`.

The `_load_lock` ClassVar is declared but never used in this class. The instance-level `_model_load_lock` (asyncio.Lock) at line 64 handles load serialization. Consider removing the unused threading.Lock.

🧹 Proposed removal

```diff
 class HumeTadaBackend:
     """HumeAI TADA TTS backend for high-quality voice cloning."""

-    _load_lock: ClassVar[threading.Lock] = threading.Lock()
-
     def __init__(self):
```

Also remove `threading` from imports at line 18 if no longer needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` at line 57, The class declares an unused class variable _load_lock: ClassVar[threading.Lock] = threading.Lock() which is redundant because instance-level _model_load_lock (asyncio.Lock) already serializes model loading; remove the _load_lock declaration and, if threading is no longer used elsewhere in this module, also remove the threading import to avoid an unused import. Ensure you only modify the backend/backends/hume_backend.py definitions of _load_lock and the top-level imports, leaving _model_load_lock and its usage intact.
19-19: Use modern type hints (`list`, `tuple` instead of `typing.List`, `typing.Tuple`).

Python 3.9+ supports built-in generics directly. This is a minor modernization.

🧹 Proposed changes

```diff
-from typing import ClassVar, List, Optional, Tuple
+from typing import ClassVar, Optional
```

Then update usages:

- Line 179: `Tuple[dict, bool]` → `tuple[dict, bool]`
- Line 239: `List[str]` → `list[str]`
- Line 241: `Tuple[np.ndarray, str]` → `tuple[np.ndarray, str]`
- Line 251: `Tuple[np.ndarray, int]` → `tuple[np.ndarray, int]`

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` at line 19, Replace typing.List and typing.Tuple usage with built-in generics: remove List and Tuple from the import and keep ClassVar and Optional (e.g., change "from typing import ClassVar, List, Optional, Tuple" to "from typing import ClassVar, Optional"). Then update the annotated types where used: change Tuple[dict, bool] to tuple[dict, bool], List[str] to list[str], Tuple[np.ndarray, str] to tuple[np.ndarray, str], and Tuple[np.ndarray, int] to tuple[np.ndarray, int]; ensure any import of List/Tuple is removed and the code uses the built-in generics consistently.
📒 Files selected for processing (4)
- backend/backends/hume_backend.py
- backend/build_binary.py
- backend/requirements.txt
- backend/utils/dac_shim.py
torchaudio 2.10+ switched its default audio loading backend to torchcodec, which isn't installed. Replace torchaudio.load() with soundfile.read() in create_voice_prompt(). TADA's internal use of torchaudio.functional.resample() is unaffected (pure PyTorch math, no torchcodec dependency).
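A hedged sketch of the replacement: `soundfile.read` returns a float array shaped `(frames, channels)` for multi-channel audio (torchaudio's convention is `(channels, frames)`), so a mono mixdown must account for the layout. `load_prompt_audio` and `to_mono` are illustrative names, not the backend's actual functions.

```python
import numpy as np

def to_mono(data: np.ndarray) -> np.ndarray:
    """Collapse soundfile's (frames, channels) layout to 1-D mono
    by averaging channels; pass 1-D input through unchanged."""
    if data.ndim == 2:
        return data.mean(axis=1)
    return data

def load_prompt_audio(path: str) -> tuple[np.ndarray, int]:
    """Read a voice-prompt file with soundfile instead of
    torchaudio.load(), avoiding torchaudio 2.10+'s torchcodec
    backend requirement. Returns (mono_samples, sample_rate)."""
    import soundfile as sf  # imported lazily; optional for this sketch
    data, sr = sf.read(path, dtype="float32")
    return to_mono(data), sr
```

Resampling to the model's 24 kHz rate could still go through `torchaudio.functional.resample`, which — as noted above — is pure PyTorch math and unaffected by the backend switch.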
Actionable comments posted: 1
🧹 Nitpick comments (3)
backend/backends/hume_backend.py (3)
57-57: Remove unused class variable `_load_lock`.

`_load_lock` (threading.Lock) is defined but never used in this class. The async coordination uses `_model_load_lock` (asyncio.Lock) instead.

🧹 Suggested fix

```diff
 class HumeTadaBackend:
     """HumeAI TADA TTS backend for high-quality voice cloning."""

-    _load_lock: ClassVar[threading.Lock] = threading.Lock()
-
     def __init__(self):
```

Also remove the unused `threading` import from line 18.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` at line 57, Remove the unused class variable _load_lock and its threading.Lock initialization from the HumeBackend class and delete the now-unneeded threading import; verify that all async coordination uses _model_load_lock (asyncio.Lock) so no other synchronization references to _load_lock remain and run tests/lint to ensure no remaining references to threading or _load_lock exist.
248-260: Simplify serialization logic with unified handling.

The branches at lines 254-259 all perform the same assignment. This can be simplified.

♻️ Suggested simplification

```diff
 # Serialize EncoderOutput to a dict of CPU tensors for caching
 prompt_dict = {}
 for field_name in prompt.__dataclass_fields__:
     val = getattr(prompt, field_name)
     if isinstance(val, torch.Tensor):
         prompt_dict[field_name] = val.detach().cpu()
-    elif isinstance(val, list):
-        prompt_dict[field_name] = val
-    elif isinstance(val, (int, float)):
-        prompt_dict[field_name] = val
     else:
+        # Preserve lists, scalars, and other values as-is
         prompt_dict[field_name] = val
 return prompt_dict
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` around lines 248 - 260, The serialization loop for prompt fields is verbose because multiple branches all do the same assignment; simplify by keeping only the special-case for torch.Tensor (call val.detach().cpu()) and otherwise assign the value directly to prompt_dict[field_name], using the existing loop over prompt.__dataclass_fields__; update the code around prompt_dict creation so only torch.Tensor is transformed and all other types are handled by a single assignment to reduce redundancy.
19-19: Use built-in generics instead of deprecated `typing` imports.

`typing.List` and `typing.Tuple` are deprecated since Python 3.9. Use the lowercase built-in equivalents for consistency with modern Python.

♻️ Suggested fix

```diff
-from typing import ClassVar, List, Optional, Tuple
+from typing import ClassVar, Optional
```

Then update the type hints:

- `List[str]` → `list[str]`
- `Tuple[dict, bool]` → `tuple[dict, bool]`
- `Tuple[np.ndarray, str]` → `tuple[np.ndarray, str]`
- `Tuple[np.ndarray, int]` → `tuple[np.ndarray, int]`

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` at line 19, Replace deprecated typing generics by using built-in generics: remove List and Tuple from the typing import on the top line (keep ClassVar and Optional) and change all type annotations that use typing.List or typing.Tuple to the built-in forms; specifically update occurrences like List[str] → list[str], Tuple[dict, bool] → tuple[dict, bool], Tuple[np.ndarray, str] → tuple[np.ndarray, str], and Tuple[np.ndarray, int] → tuple[np.ndarray, int] in the functions and classes within hume_backend (look for annotations on methods and return types referencing List or Tuple and update them accordingly).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/backends/hume_backend.py`:
- Around line 183-185: The restoration replaces the classmethod descriptor with
its raw function, breaking later AutoTokenizer.from_pretrained calls; preserve
the original descriptor (store the full classmethod object in
_orig_from_pretrained) and only use _orig_from_pretrained.__func__ when invoking
inside the monkeypatch, then restore AutoTokenizer.from_pretrained to the saved
descriptor in the finally block (referencing AutoTokenizer.from_pretrained and
_orig_from_pretrained in your change).
📒 Files selected for processing (1)
backend/backends/hume_backend.py
Actionable comments posted: 1
♻️ Duplicate comments (1)
backend/backends/hume_backend.py (1)
155-156: ⚠️ Potential issue | 🔴 Critical — Restore `AutoTokenizer.from_pretrained` as a classmethod descriptor, not a raw function.

`_orig_from_pretrained` is captured as `.__func__` (Line 155) and then restored directly (Line 185), which replaces the classmethod with a plain function and can break later calls.

🐛 Proposed fix

```diff
-_orig_from_pretrained = AutoTokenizer.from_pretrained.__func__
+_orig_from_pretrained = AutoTokenizer.from_pretrained

 @classmethod  # type: ignore[misc]
 def _patched_from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
     if "meta-llama/Llama-3.2" in str(pretrained_model_name_or_path):
         pretrained_model_name_or_path = "unsloth/Llama-3.2-1B"
         kwargs.setdefault("token", None)
         logger.info("Redirecting Llama tokenizer to ungated mirror: unsloth/Llama-3.2-1B")
-    return _orig_from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs)
+    return _orig_from_pretrained.__func__(cls, pretrained_model_name_or_path, *args, **kwargs)

 AutoTokenizer.from_pretrained = _patched_from_pretrained
 try:
     ...
 finally:
     # Restore original to avoid affecting other code
     AutoTokenizer.from_pretrained = _orig_from_pretrained
```

```bash
#!/bin/bash
# Verify the risky capture/restore pattern still exists in the file
rg -n "from_pretrained\.__func__|AutoTokenizer\.from_pretrained = _orig_from_pretrained" backend/backends/hume_backend.py -C2

# Demonstrate why restoring a classmethod as raw function breaks class calls
python - <<'PY'
class Demo:
    @classmethod
    def f(cls, x):
        return cls.__name__, x

raw = Demo.f.__func__
Demo.f = raw  # mirrors the problematic restore pattern
try:
    Demo.f(1)
except TypeError as e:
    print("TypeError reproduced:", e)
PY
```

Also applies to: 165-166, 183-185
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` around lines 155 - 156, You captured AutoTokenizer.from_pretrained as .__func__ and later assign it back as a plain function, which strips the classmethod descriptor; instead capture the descriptor itself (e.g., _orig_from_pretrained = AutoTokenizer.from_pretrained) or when restoring wrap the raw function with classmethod (e.g., AutoTokenizer.from_pretrained = classmethod(_orig_from_pretrained_func)). Update the save/restore pairs around _orig_from_pretrained (and the other similar captures/restores at the mentioned locations) so the restored attribute is a proper classmethod descriptor rather than a raw function.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/backends/hume_backend.py`:
- Around line 189-197: unload_model currently deletes self.model and
self.encoder while worker threads in _encode_sync() or _generate_sync() may be
using them, causing a race; add a shared synchronization mechanism such as an
integer active_ops counter plus a threading.Lock/Condition (or a single state
Lock) on the backend instance, increment active_ops at the start of _encode_sync
and _generate_sync and decrement it in a finally block, and have unload_model
acquire the lock and wait (or block) until active_ops == 0 before
deleting/setting model and encoder to None. Update the methods named
unload_model, _encode_sync, and _generate_sync to use this counter/lock pattern
so unload is serialized against in-flight encoding/generation (apply same fix to
other similar regions noted in the comment).
---
Duplicate comments:
In `@backend/backends/hume_backend.py`:
- Around line 155-156: You captured AutoTokenizer.from_pretrained as .__func__
and later assign it back as a plain function, which strips the classmethod
descriptor; instead capture the descriptor itself (e.g., _orig_from_pretrained =
AutoTokenizer.from_pretrained) or when restoring wrap the raw function with
classmethod (e.g., AutoTokenizer.from_pretrained =
classmethod(_orig_from_pretrained_func)). Update the save/restore pairs around
_orig_from_pretrained (and the other similar captures/restores at the mentioned
locations) so the restored attribute is a proper classmethod descriptor rather
than a raw function.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 94241c81-4bda-472d-b3d0-8425085cbeef
📒 Files selected for processing (1)
backend/backends/hume_backend.py
Replace the monkey-patch on AutoTokenizer.from_pretrained (which broke the classmethod descriptor and caused 'Tokenizer not loaded' errors when loading Qwen after TADA) with two targeted config patches: - Set AlignerConfig.tokenizer_name to the local ungated tokenizer path - Pre-load TadaConfig, inject tokenizer_name, pass config= to from_pretrained No global state is modified; other engines are unaffected.
🧹 Nitpick comments (4)
backend/backends/hume_backend.py (4)
57-57: Unused class-level lock.
`_load_lock` is defined as a class-level `threading.Lock` but is never used. The instance-level `_model_load_lock` (an `asyncio.Lock`) handles load serialization. Either remove this dead code or clarify its intended purpose.

♻️ Suggested fix if unused:

```diff
 class HumeTadaBackend:
     """HumeAI TADA TTS backend for high-quality voice cloning."""

-    _load_lock: ClassVar[threading.Lock] = threading.Lock()
-
     def __init__(self):
```

Also remove the unused `threading` import from line 18.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` at line 57, Remove the unused class-level threading Lock and its import: delete the class attribute _load_lock and remove the threading import (the class already uses the instance-level asyncio.Lock named _model_load_lock for serialization), or if a class-level lock was intended, replace usages to use _load_lock consistently and ensure it's actually used; reference the symbols _load_lock, _model_load_lock and the threading import when making the change.
252-257: Simplify redundant branches.

These branches all perform the same assignment. Consolidate into a single `else` clause.

♻️ Suggested fix:

```diff
 if isinstance(val, torch.Tensor):
     prompt_dict[field_name] = val.detach().cpu()
-elif isinstance(val, list):
-    prompt_dict[field_name] = val
-elif isinstance(val, (int, float)):
-    prompt_dict[field_name] = val
 else:
     prompt_dict[field_name] = val
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` around lines 252 - 257, The three redundant isinstance branches that each do prompt_dict[field_name] = val should be consolidated: remove the separate elif blocks for isinstance(val, list) and isinstance(val, (int, float)) and replace them with a single else (or simply perform prompt_dict[field_name] = val unconditionally) so that only distinct handling remains and prompt_dict[field_name] is assigned from val in one place; update the block containing prompt_dict, field_name, and val accordingly.
19-19: Use modern type hint syntax.
`List` and `Tuple` from `typing` are deprecated in Python 3.9+. Use built-in `list` and `tuple` instead.

♻️ Suggested fix:

```diff
-from typing import ClassVar, List, Optional, Tuple
+from typing import ClassVar, Optional
```

Then update usages throughout the file:

- `List[str]` → `list[str]`
- `Tuple[dict, bool]` → `tuple[dict, bool]`
- `Tuple[np.ndarray, str]` → `tuple[np.ndarray, str]`
- `Tuple[np.ndarray, int]` → `tuple[np.ndarray, int]`

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` at line 19, The imports and type annotations in hume_backend.py use deprecated typing generics; replace List and Tuple from typing with built-in generics (list, tuple) and update all annotations accordingly: change the import line to remove List and Tuple, keep ClassVar/Optional if needed, and update occurrences such as List[str] → list[str], Tuple[dict, bool] → tuple[dict, bool], Tuple[np.ndarray, str] → tuple[np.ndarray, str], and Tuple[np.ndarray, int] → tuple[np.ndarray, int] (ensure any other List/Tuple usages in functions or class attributes are similarly converted).
309-319: Hoist model dtype lookup outside the loop.
`next(self.model.parameters()).dtype` is called for every floating-point tensor in the dictionary. Cache the dtype once before the loop.

♻️ Suggested fix:

```diff
 # Reconstruct EncoderOutput from the cached dict
 restored = {}
+model_dtype = next(self.model.parameters()).dtype
 for k, v in voice_prompt.items():
     if isinstance(v, torch.Tensor):
         # Move to device and match model dtype for float tensors
         if v.is_floating_point():
-            model_dtype = next(self.model.parameters()).dtype
             restored[k] = v.to(device=device, dtype=model_dtype)
         else:
             restored[k] = v.to(device=device)
     else:
         restored[k] = v
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/hume_backend.py` around lines 309 - 319, The loop over voice_prompt repeatedly calls next(self.model.parameters()).dtype for each floating-point tensor; compute and cache the model dtype once before the loop (e.g., model_dtype = next(self.model.parameters()).dtype) and then use that cached model_dtype inside the for k, v in voice_prompt.items() loop when calling v.to(device=device, dtype=model_dtype); leave non-floating tensors and non-tensor values handling unchanged and continue assigning into restored.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/backends/hume_backend.py`:
- Line 57: Remove the unused class-level threading Lock and its import: delete
the class attribute _load_lock and remove the threading import (the class
already uses the instance-level asyncio.Lock named _model_load_lock for
serialization), or if a class-level lock was intended, replace usages to use
_load_lock consistently and ensure it's actually used; reference the symbols
_load_lock, _model_load_lock and the threading import when making the change.
- Around line 252-257: The three redundant isinstance branches that each do
prompt_dict[field_name] = val should be consolidated: remove the separate elif
blocks for isinstance(val, list) and isinstance(val, (int, float)) and replace
them with a single else (or simply perform prompt_dict[field_name] = val
unconditionally) so that only distinct handling remains and
prompt_dict[field_name] is assigned from val in one place; update the block
containing prompt_dict, field_name, and val accordingly.
- Line 19: The imports and type annotations in hume_backend.py use deprecated
typing generics; replace List and Tuple from typing with built-in generics
(list, tuple) and update all annotations accordingly: change the import line to
remove List and Tuple, keep ClassVar/Optional if needed, and update occurrences
such as List[str] → list[str], Tuple[dict, bool] → tuple[dict, bool],
Tuple[np.ndarray, str] → tuple[np.ndarray, str], and Tuple[np.ndarray, int] →
tuple[np.ndarray, int] (ensure any other List/Tuple usages in functions or class
attributes are similarly converted).
- Around line 309-319: The loop over voice_prompt repeatedly calls
next(self.model.parameters()).dtype for each floating-point tensor; compute and
cache the model dtype once before the loop (e.g., model_dtype =
next(self.model.parameters()).dtype) and then use that cached model_dtype inside
the for k, v in voice_prompt.items() loop when calling v.to(device=device,
dtype=model_dtype); leave non-floating tensors and non-tensor values handling
unchanged and continue assigning into restored.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8bcd01a2-43b5-487c-9101-68c5df2bb11b
📒 Files selected for processing (1)
backend/backends/hume_backend.py
Remove @torch.jit.script from the DAC shim's snake() function — TorchScript calls inspect.getsource(), which fails in PyInstaller binaries (no .py source files).

Update all user-facing docs: 4 → 5 TTS engines, add TADA row to every engine comparison table, mark TADA as Shipped in the upcoming engines list, update architecture diagrams and tech stack tables.
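Removing the decorator is safe because snake is plain elementwise math that runs fine in eager mode. A scalar sketch (the formula is assumed from DAC-style codecs, x + sin²(αx)/α; the real implementation operates on tensors and adds a small epsilon to α):

```python
import math

def snake(x: float, alpha: float = 1.0) -> float:
    # Snake activation, eager-mode: no TorchScript, so nothing calls
    # inspect.getsource() — which would fail inside a PyInstaller binary
    # where .py source files are not shipped.
    return x + math.sin(alpha * x) ** 2 / alpha
```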
Summary
- `--no-deps` installation pattern (same as Chatterbox) due to `torch>=2.7` pin conflict

What is TADA?
TADA is a speech-language model by HumeAI built on Llama 3.2. It uses a novel 1:1 text-acoustic token alignment with flow-matching diffusion, enabling coherent speech over long sequences (700s+).
Changes
Backend (4 files)
- `backend/backends/hume_backend.py` (new) — TTSBackend implementation with Encoder-based voice prompt encoding, cached prompt serialization, bf16 CUDA / fp32 CPU support
- `backend/backends/__init__.py` — Engine registry, ModelConfig entries, factory branch, multi-size model loading
- `backend/models.py` — Extended engine and model_size validation regexes
- `backend/build_binary.py` — PyInstaller hidden imports for tada, dac, torchaudio; `--collect-all dac`, `--collect-submodules tada`

Frontend (5 files)
- `app/src/lib/api/types.ts` — Extended engine and model_size TypeScript unions
- `app/src/lib/constants/languages.ts` — Added TADA language map (en, ar, zh, de, es, fr, it, ja, pl, pt)
- `app/src/components/Generation/EngineModelSelector.tsx` — TADA 1B/3B options with `tada:SIZE` format handling
- `app/src/lib/hooks/useGenerationForm.ts` — Zod schema, model name/display name mappings, model_size passing
- `app/src/components/ServerSettings/ModelManagement.tsx` — Model descriptions and voice model filter

Dependencies & Infrastructure (4 files)
- `backend/requirements.txt` — Added `descript-audio-codec>=1.0.0` and `torchaudio`
- `justfile` — `--no-deps hume-tada` in Unix and Windows setup targets
- `.github/workflows/release.yml` — `--no-deps hume-tada` in CPU and CUDA build steps
- `Dockerfile` — `--no-deps hume-tada` install line

Testing
Needs local testing via `just dev` (and the packaged binary via `just build`)

Note
TADA models are built on Meta Llama 3.2, which requires accepting the Llama license on HuggingFace. Users will need a HuggingFace token with Llama access for the initial model download.
Summary by CodeRabbit
New Features
Chores