fix: force offline mode when loading cached models (Qwen TTS & Whisper) by jamiepine · Pull Request #318 · jamiepine/voicebox

jamiepine · 2026-03-18T17:31:48Z

Summary

Qwen TTS and Whisper Base were making network calls to HuggingFace even when model weights were fully cached locally, making the app unusable offline
Adds a reusable force_offline_if_cached() context manager that sets HF_HUB_OFFLINE=1 during model loading when the cache is warm
Applied to all four affected backends: PyTorch TTS (Qwen), PyTorch STT (Whisper), MLX TTS, and MLX STT

Problem

from_pretrained() in both transformers and qwen_tts defaults to local_files_only=False, triggering multiple HTTP requests to HuggingFace Hub to check for model updates — even when the full model is already on disk. This causes timeouts or failures when offline.

Chatterbox and LuxTTS don't have this problem because they use snapshot_download() → from_local(), which returns the cached path without network access.

The MLX TTS backend already had an inline HF_HUB_OFFLINE guard, but the PyTorch backends and MLX Whisper had none.

Changes

File	Change
`backend/utils/hf_offline_patch.py`	Added `force_offline_if_cached()` context manager
`backend/backends/pytorch_backend.py`	Wrapped Qwen TTS and Whisper `from_pretrained()` calls with offline guard
`backend/backends/mlx_backend.py`	Refactored MLX TTS to use shared context manager; added guard to MLX Whisper

Fixes #82

Summary by CodeRabbit

Refactor
- Improved model caching and loading mechanisms for text-to-speech and speech-to-text features.
- Enhanced offline mode support to ensure consistent performance when accessing previously downloaded models.
- Simplified model loading infrastructure for better reliability and system stability.

Qwen TTS and Whisper Base make network calls to HuggingFace even when model weights are fully cached locally, because from_pretrained() defaults to local_files_only=False. This causes failures for offline users. Add a reusable force_offline_if_cached() context manager that sets HF_HUB_OFFLINE=1 during model loading when is_model_cached() is True. Applied to all four affected load paths: - PyTorchTTSBackend (Qwen TTS) - PyTorchSTTBackend (Whisper) - MLXTTSBackend (refactored from inline implementation) - MLXSTTBackend (previously unprotected) Closes #82

coderabbitai · 2026-03-18T17:32:12Z

📝 Walkthrough

Walkthrough

These changes introduce a centralized force_offline_if_cached context manager to streamline offline model loading across backends. Manual environment variable manipulation and complex fallback logic in MLX and PyTorch backends are replaced with consistent wrapper calls around model loading operations.

Changes

Cohort / File(s)	Summary
Offline Loading Utility `backend/utils/hf_offline_patch.py`	New context manager `force_offline_if_cached(is_cached, model_label)` that conditionally forces `HF_HUB_OFFLINE=1` during cached model loads, with exception handling and logging for offline-related errors.
MLX Backend Integration `backend/backends/mlx_backend.py`	Replaced manual `HF_HUB_OFFLINE` environment toggling with `force_offline_if_cached` wrapper around TTS and Whisper model loading; removed try/finally fallback blocks and unused `os` import.
PyTorch Backend Integration `backend/backends/pytorch_backend.py`	Applied `force_offline_if_cached` wrapper around `Qwen3TTSModel.from_pretrained` and Whisper processor/model loading calls for consistent offline behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A hop, a context, a twist of care,
No more env vars floating in the air!
Cached models load with offline grace,
One wrapper rules them all in place. ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix: force offline mode when loading cached models (Qwen TTS & Whisper)' directly summarizes the main change: implementing offline mode forcing for cached model loading across TTS and Whisper backends.
Linked Issues check	✅ Passed	The PR addresses issue `#82` by implementing offline mode forcing during cached model loading, preventing unnecessary network requests that were causing generation failures on Windows systems.
Out of Scope Changes check	✅ Passed	All changes are directly related to the objective of fixing offline model loading: creating the context manager, applying it to TTS and Whisper backends in both PyTorch and MLX implementations, and removing old fallback logic.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/offline-model-loading

📝 Coding Plan

Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/utils/hf_offline_patch.py`:
- Around line 16-53: The docstring for force_offline_if_cached incorrectly
claims it "automatically retry[s] with network access"; update the docstring to
accurately describe current behavior: when is_cached is True the context sets
HF_HUB_OFFLINE=1 and if an "offline" error occurs it logs a warning and
re-raises the exception (it does NOT perform a retry), and callers (e.g., code
in backend/services/generation.py) must catch the exception and perform any
retry with network access themselves; mention that the original HF_HUB_OFFLINE
env var is restored in the finally block.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c9733f26-c689-4e2f-9d95-dd269784a6ea

📥 Commits

Reviewing files that changed from the base of the PR and between ffc1b54 and 2e95b7c.

📒 Files selected for processing (3)

backend/backends/mlx_backend.py
backend/backends/pytorch_backend.py
backend/utils/hf_offline_patch.py

coderabbitai · 2026-03-18T17:38:28Z

backend/utils/hf_offline_patch.py

+@contextmanager
+def force_offline_if_cached(is_cached: bool, model_label: str = ""):
+    """Context manager that sets ``HF_HUB_OFFLINE=1`` while loading a cached model.
+
+    If *is_cached* is ``False`` the block runs normally (network allowed).
+    If the offline load raises an error containing "offline" we automatically
+    retry with network access so a partially-cached model still works.
+
+    Args:
+        is_cached: Whether the model weights are already on disk.
+        model_label: Human-readable name used in log messages.
+    """
+    if not is_cached:
+        yield
+        return
+
+    original_value = os.environ.get("HF_HUB_OFFLINE")
+    os.environ["HF_HUB_OFFLINE"] = "1"
+    logger.info(
+        "[offline-guard] %s is cached — forcing HF_HUB_OFFLINE=1",
+        model_label or "model",
+    )
+
+    try:
+        yield
+    except Exception as exc:
+        if "offline" in str(exc).lower():
+            logger.warning(
+                "[offline-guard] Offline load failed for %s, retrying with network: %s",
+                model_label or "model",
+                exc,
+            )
+            # Restore original env and retry — caller must wrap the load
+            # inside force_offline_if_cached so retrying here isn't possible.
+            # Instead, propagate a flag via the exception so the caller can
+            # decide.  For simplicity we just let it fall through to the
+            # finally block and re-raise.
+            raise


⚠️ Potential issue | 🟡 Minor

Docstring overpromises retry behavior that isn't implemented.

The docstring at lines 21-22 states "we automatically retry with network access so a partially-cached model still works," but the actual implementation (lines 48-53) only logs a warning and re-raises the exception. The inline comment at lines 48-52 explicitly acknowledges that retrying within the context manager isn't possible.

Looking at backend/services/generation.py (lines 129-135 in context snippet), the caller catches exceptions and marks generation as "failed" without any retry attempt.

Consider updating the docstring to accurately describe the behavior:

📝 Suggested docstring fix

`@contextmanager` def force_offline_if_cached(is_cached: bool, model_label: str = ""): """Context manager that sets ``HF_HUB_OFFLINE=1`` while loading a cached model. If *is_cached* is ``False`` the block runs normally (network allowed). - If the offline load raises an error containing "offline" we automatically - retry with network access so a partially-cached model still works. + If the offline load raises an error containing "offline", it is logged + and re-raised to allow the caller to implement retry logic if desired. Args: is_cached: Whether the model weights are already on disk. model_label: Human-readable name used in log messages. """

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/utils/hf_offline_patch.py` around lines 16 - 53, The docstring for force_offline_if_cached incorrectly claims it "automatically retry[s] with network access"; update the docstring to accurately describe current behavior: when is_cached is True the context sets HF_HUB_OFFLINE=1 and if an "offline" error occurs it logs a warning and re-raises the exception (it does NOT perform a retry), and callers (e.g., code in backend/services/generation.py) must catch the exception and perform any retry with network access themselves; mention that the original HF_HUB_OFFLINE env var is restored in the finally block.

coderabbitai bot reviewed Mar 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: force offline mode when loading cached models (Qwen TTS & Whisper)#318

fix: force offline mode when loading cached models (Qwen TTS & Whisper)#318
jamiepine wants to merge 1 commit intomainfrom
fix/offline-model-loading

jamiepine commented Mar 18, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Mar 18, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jamiepine commented Mar 18, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Changes

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jamiepine commented Mar 18, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 18, 2026 •

edited

Loading