
Add HumeAI TADA TTS engine (1B English + 3B Multilingual)#296

Merged
jamiepine merged 7 commits into main from feat/add-tada-tts-engine on Mar 17, 2026

Conversation


@jamiepine jamiepine commented Mar 17, 2026

Summary

  • Integrates HumeAI's TADA (Text-Acoustic Dual Alignment) speech-language model as a new TTS engine with two variants: tada-1b (English, ~4GB) and tada-3b-ml (10 languages, ~8GB)
  • Full end-to-end implementation across backend, frontend, dependencies, CI, Docker, and PyInstaller bundling
  • Uses --no-deps installation pattern (same as Chatterbox) due to torch>=2.7 pin conflict
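
The two-step install pattern can be sketched as follows; the package and dependency names are taken from this PR's requirements changes, and the exact pinned versions in the project may differ:

```shell
# Install hume-tada without its dependency pins (its torch>=2.7 pin
# conflicts with the project's torch version)...
pip install --no-deps hume-tada
# ...then add the sub-dependencies the backend actually uses, at
# versions the project controls.
pip install "descript-audio-codec>=1.0.0" torchaudio
```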

What is TADA?

TADA is a speech-language model by HumeAI built on Llama 3.2. It uses a novel 1:1 text-acoustic token alignment with flow-matching diffusion, enabling:

  • 700s+ of coherent audio generation
  • Dynamic duration synthesis (no fixed frame rate)
  • High-fidelity voice cloning from short reference audio
  • bf16 inference support (~50% memory savings on CUDA)

Changes

Backend (4 files)

  • backend/backends/hume_backend.py (new) — TTSBackend implementation with Encoder-based voice prompt encoding, cached prompt serialization, bf16 CUDA / fp32 CPU support
  • backend/backends/__init__.py — Engine registry, ModelConfig entries, factory branch, multi-size model loading
  • backend/models.py — Extended engine and model_size validation regexes
  • backend/build_binary.py — PyInstaller hidden imports for tada, dac, torchaudio; --collect-all dac, --collect-submodules tada

Frontend (5 files)

  • app/src/lib/api/types.ts — Extended engine and model_size TypeScript unions
  • app/src/lib/constants/languages.ts — Added TADA language map (en, ar, zh, de, es, fr, it, ja, pl, pt)
  • app/src/components/Generation/EngineModelSelector.tsx — TADA 1B/3B options with tada:SIZE format handling
  • app/src/lib/hooks/useGenerationForm.ts — Zod schema, model name/display name mappings, model_size passing
  • app/src/components/ServerSettings/ModelManagement.tsx — Model descriptions and voice model filter

Dependencies & Infrastructure (4 files)

  • backend/requirements.txt — Added descript-audio-codec>=1.0.0 and torchaudio
  • justfile — --no-deps hume-tada in Unix and Windows setup targets
  • .github/workflows/release.yml — --no-deps hume-tada in CPU and CUDA build steps
  • Dockerfile — --no-deps hume-tada install line

Testing

Needs local testing via just dev:

  • Model download (both tada-1b and tada-3b-ml)
  • Model loading on CPU
  • Generation produces valid 24kHz audio
  • Voice cloning from reference audio
  • Frozen binary build (just build)

Note

TADA models are built on Meta Llama 3.2, which requires accepting the Llama license on HuggingFace. Users will need a HuggingFace token with Llama access for the initial model download.
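
After accepting the license, the token can be supplied either interactively or via the environment. These are standard Hugging Face CLI commands; which environment variable the backend actually reads is not specified in this PR:

```shell
# One-time interactive login; the token is cached locally and picked up
# by later snapshot_download calls.
huggingface-cli login
# Or, for CI and containers, export the token for the current process:
export HF_TOKEN=your_token_here
```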

Summary by CodeRabbit

  • New Features

    • Added TADA text-to-speech (TADA 1B English, TADA 3B multilingual) with UI selection, language/model-size options, updated model listings, and full backend support for loading, caching, prompts, and generation.
    • Added a lightweight shim to emulate missing audio codec APIs when the full codec is absent.
  • Chores

    • Updated packaging, build, container, and CI steps to include TADA-related packages and audio runtime support (including torchaudio and hume-tada).

Integrates HumeAI's TADA (Text-Acoustic Dual Alignment) speech-language
model as a new TTS engine. TADA uses a novel 1:1 token-audio alignment
that produces coherent speech over long sequences (700s+).

Two model variants:
- tada-1b: English-only, ~4GB, built on Llama 3.2 1B
- tada-3b-ml: 10 languages, ~8GB, built on Llama 3.2 3B

Backend uses the Encoder for voice prompt encoding with caching, and
TadaForCausalLM with flow-matching diffusion for generation. Supports
bf16 inference on CUDA, forces CPU on macOS (MPS compatibility).

Installed with --no-deps due to torch>=2.7 pin conflict; descript-audio-codec
and torchaudio added as explicit sub-dependencies.

coderabbitai bot commented Mar 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds first-class TADA TTS support across frontend, API types/schemas, and backend: a new HumeTadaBackend with model/codec download, caching, loading/unloading, generation and a DAC shim; CI/build updates to install and bundle hume-tada and related dependencies.

Changes

  • CI / Build / Packaging — .github/workflows/release.yml, justfile, Dockerfile, backend/requirements.txt, backend/build_binary.py — Installs hume-tada (with --no-deps) in CI/just/Docker steps; adds torchaudio to backend requirements; expands PyInstaller hidden-imports / collect-submodules for tada and dac packaging.
  • Backend: TADA integration & shim — backend/backends/hume_backend.py, backend/backends/__init__.py, backend/utils/dac_shim.py, backend/models.py — Adds HumeTadaBackend implementing async-safe load/unload, per-size model switching, snapshot download & cache checks, device/dtype selection, prompt encoding/caching, and generate/create_voice_prompt; registers the tada engine and tada-1b/tada-3b-ml configs; adds a DAC shim; expands model_size/engine validation to accept 1B/3B and tada.
  • Frontend: types, constants, hooks — app/src/lib/api/types.ts, app/src/lib/constants/languages.ts, app/src/lib/hooks/useGenerationForm.ts — Extends GenerationRequest to include engine tada and model sizes 1B/3B; adds tada supported languages; updates form schema, displayName, payload construction, and model-download logic to handle tada and size-specific constraints (1B English-only, 3B multilingual).
  • Frontend: UI components — app/src/components/Generation/EngineModelSelector.tsx, app/src/components/ServerSettings/ModelManagement.tsx — Adds TADA options (tada:1B, tada:3B) and descriptions to the engine selector; treats tada-* models as voice-generation models in management views.
  • Small additions / glue — app/src/lib/api/types.ts (already grouped), other minor edits across the repo — Minor control-flow, validation, and form-reset adjustments to consistently integrate tada and modelSize across the stack.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client/UI
    participant API as Backend API
    participant Router as Engine Router
    participant TADA as HumeTadaBackend
    participant Cache as Local Cache/FS
    participant HumeAI as HumeAI (snapshot_download)

    Client->>API: POST /generate {engine:"tada", model_size:"1B"|"3B", text, voice_prompt?}
    API->>Router: resolve TTS backend for "tada"
    Router->>TADA: ensure instantiated & loaded
    TADA->>Cache: check codec & model cached?
    alt not cached
        TADA->>HumeAI: snapshot_download(model & codec)
        HumeAI-->>Cache: model & codec files
    end
    TADA->>TADA: load_model(model_size) (async lock, device/dtype select)
    API->>TADA: generate(text, voice_prompt, language, seed?)
    TADA->>TADA: rehydrate encoder prompt, synthesize audio (24kHz)
    TADA-->>API: audio bytes + sample_rate
    API-->>Client: streamed/returned audio

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes


Poem

🐰 I hopped through types and backend lanes,

I cached the codec, downloaded the gains,
1B hums in English, 3B sings more,
Prompts and snapshots settled on the floor,
A rabbit cheers — new voices soar!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning — Docstring coverage is 40.74%, below the required 80.00% threshold. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed — Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed — The title accurately summarizes the main change: adding HumeAI TADA TTS engine support with both 1B English and 3B Multilingual variants. It is concise, specific, and clearly conveys the primary feature introduced across the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/backends/__init__.py (2)

446-457: ⚠️ Potential issue | 🟠 Major

TADA needs model_size handling in get_model_load_func.

The TADA engine supports multiple model sizes (1B, 3B), but the fallback path at line 457 calls load_model() without the model_size argument. This will always load the default 1B model, even when the config specifies 3B.

🐛 Proposed fix to pass model_size for TADA
     if config.engine == "qwen":
         return lambda: tts.get_tts_model().load_model(config.model_size)
 
+    if config.engine == "tada":
+        return lambda: get_tts_backend_for_engine(config.engine).load_model(config.model_size)
+
     return lambda: get_tts_backend_for_engine(config.engine).load_model()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/__init__.py` around lines 446 - 457, get_model_load_func
currently omits the model_size when calling load_model for the generic TADA
fallback, so TADA always loads the default size; update the fallback lambda in
get_model_load_func to pass config.model_size to load_model (i.e., change the
final return to call
get_tts_backend_for_engine(config.engine).load_model(config.model_size)) so the
configured size (e.g., 3B) is respected.

425-443: ⚠️ Potential issue | 🟠 Major

TADA needs model_size check in check_model_loaded.

Similar to Qwen, TADA supports multiple model sizes, but this function only checks backend.is_loaded() without verifying the correct size is loaded. This could return True when tada-1b is loaded but the config is for tada-3b-ml.

🐛 Proposed fix to check model_size for TADA
         if config.engine == "qwen":
             tts_model = tts.get_tts_model()
             loaded_size = getattr(tts_model, "_current_model_size", None) or getattr(tts_model, "model_size", None)
             return tts_model.is_loaded() and loaded_size == config.model_size
 
+        if config.engine == "tada":
+            backend = get_tts_backend_for_engine(config.engine)
+            loaded_size = getattr(backend, "model_size", None)
+            return backend.is_loaded() and loaded_size == config.model_size
+
         backend = get_tts_backend_for_engine(config.engine)
         return backend.is_loaded()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/__init__.py` around lines 425 - 443, check_model_loaded
currently returns backend.is_loaded() for TADA without verifying the loaded
model size; update the TADA branch (where backend =
get_tts_backend_for_engine(config.engine) and later return backend.is_loaded())
to also check the model size similar to Qwen: obtain the loaded size via
getattr(backend, "_current_model_size", None) or getattr(backend, "model_size",
None) and return True only if backend.is_loaded() and the loaded size equals
config.model_size. Ensure you keep the existing whisper/qwen checks intact and
handle missing attributes gracefully.
🧹 Nitpick comments (4)
backend/backends/hume_backend.py (3)

57-57: Unused class-level lock.

_load_lock is declared as a ClassVar[threading.Lock] but is never used in the class. The actual locking is done via the instance-level _model_load_lock (asyncio.Lock) at line 64. Consider removing this dead code.

🧹 Remove unused lock
 class HumeTadaBackend:
     """HumeAI TADA TTS backend for high-quality voice cloning."""
 
-    _load_lock: ClassVar[threading.Lock] = threading.Lock()
-
     def __init__(self):

Also remove the threading import at line 18 if no longer needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/hume_backend.py` at line 57, Remove the unused class-level
lock declaration `_load_lock: ClassVar[threading.Lock] = threading.Lock()` from
the class (it's dead code; locking is handled by the instance `_model_load_lock`
which is an `asyncio.Lock`), and delete the now-unnecessary `threading` import
from the top of the file so there are no unused imports remaining.

19-19: Consider using modern type hints.

typing.List and typing.Tuple are deprecated in favor of built-in list and tuple (PEP 585). This is a minor style improvement.

🧹 Use modern type hints
-from typing import ClassVar, List, Optional, Tuple
+from typing import ClassVar, Optional

Then update usages:

  • List[str]list[str]
  • Tuple[dict, bool]tuple[dict, bool]
  • Tuple[np.ndarray, str]tuple[np.ndarray, str]
  • Tuple[np.ndarray, int]tuple[np.ndarray, int]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/hume_backend.py` at line 19, Update the typing imports and
annotations to use PEP 585 built-in generics: remove List and Tuple from the
import line (currently "from typing import ClassVar, List, Optional, Tuple") and
replace their usages in this module with the built-in forms (e.g., change
List[str] → list[str], Tuple[dict, bool] → tuple[dict, bool], Tuple[np.ndarray,
str] → tuple[np.ndarray, str], Tuple[np.ndarray, int] → tuple[np.ndarray, int]);
keep ClassVar and Optional from typing as-is and adjust any function/method
signatures, return annotations, and variable annotations that reference
List/Tuple accordingly (look for usages in functions like any top-level helpers,
class attributes, and methods within this file).

211-220: Consider simplifying redundant branches.

The static analyzer flagged that lines 215-220 have identical assignments. The elif branches for list, int/float, and the else clause all assign val directly. While the explicit type checks document intent, they could be combined.

🧹 Simplify serialization logic
             for field_name in prompt.__dataclass_fields__:
                 val = getattr(prompt, field_name)
                 if isinstance(val, torch.Tensor):
                     prompt_dict[field_name] = val.detach().cpu()
-                elif isinstance(val, list):
-                    prompt_dict[field_name] = val
-                elif isinstance(val, (int, float)):
-                    prompt_dict[field_name] = val
                 else:
+                    # Lists, scalars, and other values pass through unchanged
                     prompt_dict[field_name] = val
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/hume_backend.py` around lines 211 - 220, The loop over
prompt.__dataclass_fields__ has redundant branches that all assign val directly;
simplify by handling torch.Tensor specially (detach().cpu()) and for all other
types assign val to prompt_dict[field_name] in a single else branch. Update the
loop in the function that builds prompt_dict (the code iterating field_name and
using getattr(prompt, field_name)) to only check isinstance(val, torch.Tensor)
and otherwise set prompt_dict[field_name] = val, removing the duplicate elif
checks for list and (int, float).
backend/backends/__init__.py (1)

397-422: Consider adding model_size check in unload_model_by_config for TADA.

For consistency with the proposed fixes above, unload_model_by_config should also verify the loaded model size matches the config before unloading for the TADA engine. Currently, it would unload any loaded TADA model regardless of which size variant is loaded.

♻️ Proposed fix
         if config.engine == "qwen":
             tts_model = tts.get_tts_model()
             loaded_size = getattr(tts_model, "_current_model_size", None) or getattr(tts_model, "model_size", None)
             if tts_model.is_loaded() and loaded_size == config.model_size:
                 tts.unload_tts_model()
                 return True
             return False
 
+        if config.engine == "tada":
+            backend = get_tts_backend_for_engine(config.engine)
+            loaded_size = getattr(backend, "model_size", None)
+            if backend.is_loaded() and loaded_size == config.model_size:
+                backend.unload_model()
+                return True
+            return False
+
         # All other TTS engines
         backend = get_tts_backend_for_engine(config.engine)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/__init__.py` around lines 397 - 422, unload_model_by_config
currently unloads the TADA backend without verifying model size; update the
function to check the loaded model size matches config.model_size before calling
unload for the TADA engine. Inside unload_model_by_config, after obtaining
backend = get_tts_backend_for_engine(config.engine), add a branch for
config.engine == "tada" (or detect TADA backend) that reads the backend's loaded
size (e.g., getattr(backend, "_current_model_size", None) or getattr(backend,
"model_size", None)), verify backend.is_loaded() and loaded_size ==
config.model_size, and only then call backend.unload_model(); otherwise return
False. Ensure you reference unload_model_by_config, get_tts_backend_for_engine,
backend.is_loaded, and backend.unload_model when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/src/lib/hooks/useGenerationForm.ts`:
- Around line 82-84: The modelName generation for engine/variable model mapping
is incorrect for the TADA 3B variant; update the logic that computes modelName
(the expression using engine, data.modelSize) so that when engine === 'tada' and
data.modelSize (case-insensitive) equals '3b' it returns 'tada-3b-ml', otherwise
preserve the existing fallback (e.g., `tada-${...}` behavior for other sizes);
ensure you reference the same variables (engine, data.modelSize, and the
modelName assignment in useGenerationForm.ts) so the produced string matches the
backend's expected name.

In `@backend/backends/hume_backend.py`:
- Around line 200-207: Replace the direct torchaudio.load() usage with the
shared load_audio() helper to ensure consistent preprocessing (mono conversion)
before encoding: call load_audio(audio_path, sample_rate=sr, mono=True) to
obtain the audio tensor and use that when invoking self.encoder(audio,
text=text_arg, sample_rate=sr); also add/import load_audio from
backend.utils.audio. If you intentionally keep torchaudio.load(), explicitly
convert the tensor to mono before calling self.encoder and document that
decision in the code where audio, sr, audio_path, reference_text and
self.encoder are used.
- Around line 110-124: The download code uses snapshot_download with token=None
(in calls near snapshot_download for TADA_CODEC_REPO and repo while logging via
logger.info and referencing model_size), which relies on a cached hf token and
doesn't make it clear users must authenticate and accept the Llama license;
update the backend documentation (e.g., backend/README.md) to document that
users must run huggingface-cli login (or hf auth login) with a token that has
repository access and explicitly accept the Llama license for the TADA 1B and
3B-ML models before running the code so snapshot_download will succeed.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ccdeb5eb-4f63-47b6-90da-c038d837b31a

📥 Commits

Reviewing files that changed from the base of the PR and between 51fb320 and 4e7772a.

📒 Files selected for processing (13)
  • .github/workflows/release.yml
  • Dockerfile
  • app/src/components/Generation/EngineModelSelector.tsx
  • app/src/components/ServerSettings/ModelManagement.tsx
  • app/src/lib/api/types.ts
  • app/src/lib/constants/languages.ts
  • app/src/lib/hooks/useGenerationForm.ts
  • backend/backends/__init__.py
  • backend/backends/hume_backend.py
  • backend/build_binary.py
  • backend/models.py
  • backend/requirements.txt
  • justfile

The real descript-audio-codec package pulls in descript-audiotools,
which transitively requires onnx, tensorboard, protobuf, matplotlib,
pystoi, and other heavy dependencies. onnx fails to build from source
on macOS due to CMake version incompatibility.

TADA only uses Snake1d (a 7-line PyTorch module) from DAC. This commit
adds a shim in backend/utils/dac_shim.py that registers fake dac.*
modules in sys.modules with just the Snake1d class, completely
eliminating the DAC/audiotools dependency chain.
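
The sys.modules trick the commit describes can be illustrated with a dependency-free sketch. In the real shim Snake1d is the 7-line torch.nn.Module; here it is a placeholder class, and the dac.nn.layers module path is an assumption about where TADA imports it from:

```python
import sys
import types


class Snake1d:
    # Placeholder; the real shim implements the small PyTorch module here.
    pass


def install_dac_shim() -> None:
    """Register fake dac modules so `from dac.nn.layers import Snake1d`
    resolves without installing descript-audio-codec and its heavy
    transitive dependency chain (onnx, tensorboard, matplotlib, ...)."""
    dac = types.ModuleType("dac")
    dac_nn = types.ModuleType("dac.nn")
    dac_layers = types.ModuleType("dac.nn.layers")
    dac_layers.Snake1d = Snake1d
    # Link parents so dotted imports traverse correctly.
    dac_nn.layers = dac_layers
    dac.nn = dac_nn
    for name, mod in (("dac", dac), ("dac.nn", dac_nn), ("dac.nn.layers", dac_layers)):
        sys.modules[name] = mod


install_dac_shim()
from dac.nn.layers import Snake1d as ImportedSnake1d  # resolves via the shim
```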
TADA hardcodes 'meta-llama/Llama-3.2-1B' as its tokenizer source in
both the Aligner and TadaForCausalLM.from_pretrained(). That repo is
gated and requires accepting Meta's license on HuggingFace.

Monkey-patch AutoTokenizer.from_pretrained during model loading to
redirect Llama tokenizer requests to 'unsloth/Llama-3.2-1B', an
ungated mirror with identical tokenizer files. The patch is scoped
to model loading only and restored immediately after.
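
The scoped monkey-patch pattern looks roughly like this. FakeTokenizer stands in for transformers.AutoTokenizer so the sketch runs without transformers installed; the repo names are the ones named in the commit:

```python
from contextlib import contextmanager

GATED = "meta-llama/Llama-3.2-1B"
MIRROR = "unsloth/Llama-3.2-1B"


@contextmanager
def redirect_gated_tokenizer(tokenizer_cls):
    """Temporarily rewrite from_pretrained calls for the gated Llama repo
    to an ungated mirror; the original method is restored in `finally`,
    so the patch survives only for the duration of model loading."""
    original = tokenizer_cls.from_pretrained
    def patched(name, *args, **kwargs):
        return original(MIRROR if name == GATED else name, *args, **kwargs)
    tokenizer_cls.from_pretrained = patched
    try:
        yield
    finally:
        tokenizer_cls.from_pretrained = original


class FakeTokenizer:
    # Stand-in for AutoTokenizer; a real tokenizer class would download
    # and return a tokenizer instead of echoing the repo name.
    @classmethod
    def from_pretrained(cls, name):
        return name


with redirect_gated_tokenizer(FakeTokenizer):
    resolved = FakeTokenizer.from_pretrained(GATED)
```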

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
backend/backends/hume_backend.py (2)

57-57: Unused class variable _load_lock.

The _load_lock ClassVar is declared but never used in this class. The instance-level _model_load_lock (asyncio.Lock) at line 64 handles load serialization. Consider removing the unused threading.Lock.

🧹 Proposed removal
 class HumeTadaBackend:
     """HumeAI TADA TTS backend for high-quality voice cloning."""

-    _load_lock: ClassVar[threading.Lock] = threading.Lock()
-
     def __init__(self):

Also remove threading from imports at line 18 if no longer needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/hume_backend.py` at line 57, The class declares an unused
class variable _load_lock: ClassVar[threading.Lock] = threading.Lock() which is
redundant because instance-level _model_load_lock (asyncio.Lock) already
serializes model loading; remove the _load_lock declaration and, if threading is
no longer used elsewhere in this module, also remove the threading import to
avoid an unused import. Ensure you only modify the
backend/backends/hume_backend.py definitions of _load_lock and the top-level
imports, leaving _model_load_lock and its usage intact.

19-19: Use modern type hints (list, tuple instead of typing.List, typing.Tuple).

Python 3.9+ supports built-in generics directly. This is a minor modernization.

🧹 Proposed changes
-from typing import ClassVar, List, Optional, Tuple
+from typing import ClassVar, Optional

Then update usages:

  • Line 179: Tuple[dict, bool]tuple[dict, bool]
  • Line 239: List[str]list[str]
  • Line 241: Tuple[np.ndarray, str]tuple[np.ndarray, str]
  • Line 251: Tuple[np.ndarray, int]tuple[np.ndarray, int]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/hume_backend.py` at line 19, Replace typing.List and
typing.Tuple usage with built-in generics: remove List and Tuple from the import
and keep ClassVar and Optional (e.g., change "from typing import ClassVar, List,
Optional, Tuple" to "from typing import ClassVar, Optional"). Then update the
annotated types where used: change Tuple[dict, bool] to tuple[dict, bool],
List[str] to list[str], Tuple[np.ndarray, str] to tuple[np.ndarray, str], and
Tuple[np.ndarray, int] to tuple[np.ndarray, int]; ensure any import of
List/Tuple is removed and the code uses the built-in generics consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0be16f01-5e28-499d-b2c9-7057b45920d4

📥 Commits

Reviewing files that changed from the base of the PR and between 4e7772a and b02ce8e.

📒 Files selected for processing (4)
  • backend/backends/hume_backend.py
  • backend/build_binary.py
  • backend/requirements.txt
  • backend/utils/dac_shim.py

torchaudio 2.10+ switched its default audio loading backend to
torchcodec, which isn't installed. Replace torchaudio.load() with
soundfile.read() in create_voice_prompt(). TADA's internal use of
torchaudio.functional.resample() is unaffected (pure PyTorch math,
no torchcodec dependency).
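
One wrinkle of the swap: soundfile.read returns float arrays shaped (frames, channels) for multi-channel files, whereas torchaudio.load returns (channels, frames), so mono conversion has to account for the layout. A sketch of that preprocessing, with an illustrative helper name:

```python
import numpy as np


def to_mono(data: np.ndarray) -> np.ndarray:
    """Collapse a (frames, channels) array, as returned by soundfile.read
    for multi-channel files, to mono by averaging channels; 1-D mono
    input passes through unchanged."""
    if data.ndim == 2:
        return data.mean(axis=1)
    return data


# In create_voice_prompt() the load would then look roughly like:
#   data, sr = soundfile.read(audio_path, dtype="float32")
#   audio = to_mono(data)
```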

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
backend/backends/hume_backend.py (3)

57-57: Remove unused class variable _load_lock.

_load_lock (threading.Lock) is defined but never used in this class. The async coordination uses _model_load_lock (asyncio.Lock) instead.

🧹 Suggested fix
 class HumeTadaBackend:
     """HumeAI TADA TTS backend for high-quality voice cloning."""

-    _load_lock: ClassVar[threading.Lock] = threading.Lock()
-
     def __init__(self):

Also remove the unused threading import from line 18.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/hume_backend.py` at line 57, Remove the unused class
variable _load_lock and its threading.Lock initialization from the HumeBackend
class and delete the now-unneeded threading import; verify that all async
coordination uses _model_load_lock (asyncio.Lock) so no other synchronization
references to _load_lock remain and run tests/lint to ensure no remaining
references to threading or _load_lock exist.

248-260: Simplify serialization logic with unified handling.

The branches at lines 254-259 all perform the same assignment. This can be simplified.

♻️ Suggested simplification
             # Serialize EncoderOutput to a dict of CPU tensors for caching
             prompt_dict = {}
             for field_name in prompt.__dataclass_fields__:
                 val = getattr(prompt, field_name)
                 if isinstance(val, torch.Tensor):
                     prompt_dict[field_name] = val.detach().cpu()
-                elif isinstance(val, list):
-                    prompt_dict[field_name] = val
-                elif isinstance(val, (int, float)):
-                    prompt_dict[field_name] = val
                 else:
+                    # Preserve lists, scalars, and other values as-is
                     prompt_dict[field_name] = val
             return prompt_dict
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/hume_backend.py` around lines 248 - 260, The serialization
loop for prompt fields is verbose because multiple branches all do the same
assignment; simplify by keeping only the special-case for torch.Tensor (call
val.detach().cpu()) and otherwise assign the value directly to
prompt_dict[field_name], using the existing loop over
prompt.__dataclass_fields__; update the code around prompt_dict creation so only
torch.Tensor is transformed and all other types are handled by a single
assignment to reduce redundancy.

19-19: Use built-in generics instead of deprecated typing imports.

typing.List and typing.Tuple are deprecated since Python 3.9. Use the lowercase built-in equivalents for consistency with modern Python.

♻️ Suggested fix
-from typing import ClassVar, List, Optional, Tuple
+from typing import ClassVar, Optional

Then update the type hints:

  • List[str] → list[str]
  • Tuple[dict, bool] → tuple[dict, bool]
  • Tuple[np.ndarray, str] → tuple[np.ndarray, str]
  • Tuple[np.ndarray, int] → tuple[np.ndarray, int]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/hume_backend.py` at line 19, Replace deprecated typing
generics by using built-in generics: remove List and Tuple from the typing
import on the top line (keep ClassVar and Optional) and change all type
annotations that use typing.List or typing.Tuple to the built-in forms;
specifically update occurrences like List[str] → list[str], Tuple[dict, bool] →
tuple[dict, bool], Tuple[np.ndarray, str] → tuple[np.ndarray, str], and
Tuple[np.ndarray, int] → tuple[np.ndarray, int] in the functions and classes
within hume_backend (look for annotations on methods and return types
referencing List or Tuple and update them accordingly).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 53b741f1-a705-4b1f-a702-c2bebff4b7a6

📥 Commits

Reviewing files that changed from the base of the PR and between b02ce8e and 7a90290.

📒 Files selected for processing (1)
  • backend/backends/hume_backend.py

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/backends/hume_backend.py (1)

155-156: ⚠️ Potential issue | 🔴 Critical

Restore AutoTokenizer.from_pretrained as a classmethod descriptor, not a raw function.

_orig_from_pretrained is captured as .__func__ (Line 155) and then restored directly (Line 185), which replaces the classmethod with a plain function and can break later calls.

🐛 Proposed fix
-            _orig_from_pretrained = AutoTokenizer.from_pretrained.__func__
+            _orig_from_pretrained = AutoTokenizer.from_pretrained

             @classmethod  # type: ignore[misc]
             def _patched_from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
                 if "meta-llama/Llama-3.2" in str(pretrained_model_name_or_path):
                     pretrained_model_name_or_path = "unsloth/Llama-3.2-1B"
                     kwargs.setdefault("token", None)
                     logger.info("Redirecting Llama tokenizer to ungated mirror: unsloth/Llama-3.2-1B")
-                return _orig_from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs)
+                return _orig_from_pretrained.__func__(cls, pretrained_model_name_or_path, *args, **kwargs)

             AutoTokenizer.from_pretrained = _patched_from_pretrained
             try:
                 ...
             finally:
                 # Restore original to avoid affecting other code
                 AutoTokenizer.from_pretrained = _orig_from_pretrained
#!/bin/bash
# Verify the risky capture/restore pattern still exists in the file
rg -n "from_pretrained\\.__func__|AutoTokenizer\\.from_pretrained = _orig_from_pretrained" backend/backends/hume_backend.py -C2

# Demonstrate why restoring a classmethod as raw function breaks class calls
python - <<'PY'
class Demo:
    @classmethod
    def f(cls, x):
        return cls.__name__, x

raw = Demo.f.__func__
Demo.f = raw  # mirrors the problematic restore pattern
try:
    Demo.f(1)
except TypeError as e:
    print("TypeError reproduced:", e)
PY

Also applies to: 165-166, 183-185

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/hume_backend.py` around lines 155 - 156, You captured
AutoTokenizer.from_pretrained as .__func__ and later assign it back as a plain
function, which strips the classmethod descriptor; instead capture the
descriptor itself (e.g., _orig_from_pretrained = AutoTokenizer.from_pretrained)
or when restoring wrap the raw function with classmethod (e.g.,
AutoTokenizer.from_pretrained = classmethod(_orig_from_pretrained_func)). Update
the save/restore pairs around _orig_from_pretrained (and the other similar
captures/restores at the mentioned locations) so the restored attribute is a
proper classmethod descriptor rather than a raw function.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 94241c81-4bda-472d-b3d0-8425085cbeef

📥 Commits

Reviewing files that changed from the base of the PR and between 7a90290 and 12cda2e.

📒 Files selected for processing (1)
  • backend/backends/hume_backend.py

Replace the monkey-patch on AutoTokenizer.from_pretrained (which broke
the classmethod descriptor and caused 'Tokenizer not loaded' errors
when loading Qwen after TADA) with two targeted config patches:
- Set AlignerConfig.tokenizer_name to the local ungated tokenizer path
- Pre-load TadaConfig, inject tokenizer_name, pass config= to from_pretrained

No global state is modified; other engines are unaffected.
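
The difference between the old monkey-patch and the new config-injection approach can be shown with a self-contained toy. `FakeConfig`/`FakeModel` are stand-ins invented for this sketch; TADA's actual `TadaConfig`/`AlignerConfig` APIs are not used here.

```python
# Toy illustration of the design change: pass the tokenizer override through a
# config object instead of monkey-patching a classmethod. FakeConfig/FakeModel
# are hypothetical stand-ins for TADA's real config and model classes.
class FakeConfig:
    def __init__(self, tokenizer_name: str):
        self.tokenizer_name = tokenizer_name

class FakeModel:
    def __init__(self, tokenizer_name: str):
        self.tokenizer_name = tokenizer_name

    @classmethod
    def from_pretrained(cls, path: str, config: "FakeConfig | None" = None):
        # Without an injected config, fall back to the (gated) default
        name = config.tokenizer_name if config else "meta-llama/Llama-3.2-1B"
        return cls(name)

# Targeted override: only this load is redirected...
cfg = FakeConfig("local/ungated-tokenizer")
patched = FakeModel.from_pretrained("model-dir", config=cfg)
# ...while later loads (e.g. Qwen after TADA) see untouched global state.
default = FakeModel.from_pretrained("model-dir")
```

Because nothing is assigned onto the class, the classmethod descriptor is never replaced, which is exactly the failure mode the earlier monkey-patch ran into.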
Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (4)
backend/backends/hume_backend.py (4)

57-57: Unused class-level lock.

_load_lock is defined as a class-level threading.Lock but is never used. The instance-level _model_load_lock (asyncio.Lock) handles load serialization. Either remove this dead code or clarify its intended purpose.

♻️ Suggested fix if unused
 class HumeTadaBackend:
     """HumeAI TADA TTS backend for high-quality voice cloning."""

-    _load_lock: ClassVar[threading.Lock] = threading.Lock()
-
     def __init__(self):

Also remove the unused threading import from line 18.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/hume_backend.py` at line 57, Remove the unused class-level
threading Lock and its import: delete the class attribute _load_lock and remove
the threading import (the class already uses the instance-level asyncio.Lock
named _model_load_lock for serialization), or if a class-level lock was
intended, replace usages to use _load_lock consistently and ensure it's actually
used; reference the symbols _load_lock, _model_load_lock and the threading
import when making the change.

252-257: Simplify redundant branches.

These branches all perform the same assignment. Consolidate into a single else clause.

♻️ Suggested fix
             if isinstance(val, torch.Tensor):
                 prompt_dict[field_name] = val.detach().cpu()
-            elif isinstance(val, list):
-                prompt_dict[field_name] = val
-            elif isinstance(val, (int, float)):
-                prompt_dict[field_name] = val
             else:
                 prompt_dict[field_name] = val
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/hume_backend.py` around lines 252 - 257, The three redundant
isinstance branches that each do prompt_dict[field_name] = val should be
consolidated: remove the separate elif blocks for isinstance(val, list) and
isinstance(val, (int, float)) and replace them with a single else (or simply
perform prompt_dict[field_name] = val unconditionally) so that only distinct
handling remains and prompt_dict[field_name] is assigned from val in one place;
update the block containing prompt_dict, field_name, and val accordingly.

19-19: Use modern type hint syntax.

List and Tuple from typing are deprecated in Python 3.9+. Use built-in list and tuple instead.

♻️ Suggested fix
-from typing import ClassVar, List, Optional, Tuple
+from typing import ClassVar, Optional

Then update usages throughout the file:

  • List[str] → list[str]
  • Tuple[dict, bool] → tuple[dict, bool]
  • Tuple[np.ndarray, str] → tuple[np.ndarray, str]
  • Tuple[np.ndarray, int] → tuple[np.ndarray, int]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/hume_backend.py` at line 19, The imports and type
annotations in hume_backend.py use deprecated typing generics; replace List and
Tuple from typing with built-in generics (list, tuple) and update all
annotations accordingly: change the import line to remove List and Tuple, keep
ClassVar/Optional if needed, and update occurrences such as List[str] →
list[str], Tuple[dict, bool] → tuple[dict, bool], Tuple[np.ndarray, str] →
tuple[np.ndarray, str], and Tuple[np.ndarray, int] → tuple[np.ndarray, int]
(ensure any other List/Tuple usages in functions or class attributes are
similarly converted).

309-319: Hoist model dtype lookup outside the loop.

next(self.model.parameters()).dtype is called for every floating-point tensor in the dictionary. Cache the dtype once before the loop.

♻️ Suggested fix
             # Reconstruct EncoderOutput from the cached dict
             restored = {}
+            model_dtype = next(self.model.parameters()).dtype
             for k, v in voice_prompt.items():
                 if isinstance(v, torch.Tensor):
                     # Move to device and match model dtype for float tensors
                     if v.is_floating_point():
-                        model_dtype = next(self.model.parameters()).dtype
                         restored[k] = v.to(device=device, dtype=model_dtype)
                     else:
                         restored[k] = v.to(device=device)
                 else:
                     restored[k] = v
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/hume_backend.py` around lines 309 - 319, The loop over
voice_prompt repeatedly calls next(self.model.parameters()).dtype for each
floating-point tensor; compute and cache the model dtype once before the loop
(e.g., model_dtype = next(self.model.parameters()).dtype) and then use that
cached model_dtype inside the for k, v in voice_prompt.items() loop when calling
v.to(device=device, dtype=model_dtype); leave non-floating tensors and
non-tensor values handling unchanged and continue assigning into restored.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8bcd01a2-43b5-487c-9101-68c5df2bb11b

📥 Commits

Reviewing files that changed from the base of the PR and between 12cda2e and 6bf40bd.

📒 Files selected for processing (1)
  • backend/backends/hume_backend.py

Remove @torch.jit.script from the DAC shim's snake() function —
TorchScript calls inspect.getsource() which fails in PyInstaller
binaries (no .py source files).
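
With the decorator gone, snake() is plain Python and nothing calls inspect.getsource() at import time. A hedged sketch, using NumPy for illustration (the real shim operates on torch tensors, and the standard snake formulation is assumed):

```python
# Plain-Python snake activation with no @torch.jit.script decorator, so it
# imports cleanly inside a PyInstaller bundle where .py sources are absent.
# NumPy stands in for torch; the usual snake form is assumed.
import numpy as np

def snake(x: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Snake activation: x + (1/alpha) * sin(alpha * x)**2."""
    return x + (1.0 / alpha) * np.sin(alpha * x) ** 2
```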

Update all user-facing docs: 4 → 5 TTS engines, add TADA row to
every engine comparison table, mark TADA as Shipped in the upcoming
engines list, update architecture diagrams and tech stack tables.
@jamiepine jamiepine merged commit e789c93 into main Mar 17, 2026
1 check passed