Skip to content

fix: force offline mode when loading cached models (Qwen TTS & Whisper)#318

Open
jamiepine wants to merge 1 commit intomainfrom
fix/offline-model-loading
Open

fix: force offline mode when loading cached models (Qwen TTS & Whisper)#318
jamiepine wants to merge 1 commit intomainfrom
fix/offline-model-loading

Conversation

@jamiepine
Copy link
Owner

@jamiepine jamiepine commented Mar 18, 2026

Summary

  • Qwen TTS and Whisper Base were making network calls to HuggingFace even when model weights were fully cached locally, making the app unusable offline
  • Adds a reusable force_offline_if_cached() context manager that sets HF_HUB_OFFLINE=1 during model loading when the cache is warm
  • Applied to all four affected backends: PyTorch TTS (Qwen), PyTorch STT (Whisper), MLX TTS, and MLX STT

Problem

from_pretrained() in both transformers and qwen_tts defaults to local_files_only=False, triggering multiple HTTP requests to HuggingFace Hub to check for model updates — even when the full model is already on disk. This causes timeouts or failures when offline.

Chatterbox and LuxTTS don't have this problem because they use snapshot_download()from_local(), which returns the cached path without network access.

The MLX TTS backend already had an inline HF_HUB_OFFLINE guard, but the PyTorch backends and MLX Whisper had none.

Changes

File Change
backend/utils/hf_offline_patch.py Added force_offline_if_cached() context manager
backend/backends/pytorch_backend.py Wrapped Qwen TTS and Whisper from_pretrained() calls with offline guard
backend/backends/mlx_backend.py Refactored MLX TTS to use shared context manager; added guard to MLX Whisper

Fixes #82

Summary by CodeRabbit

  • Refactor
    • Improved model caching and loading mechanisms for text-to-speech and speech-to-text features.
    • Enhanced offline mode support to ensure consistent performance when accessing previously downloaded models.
    • Simplified model loading infrastructure for better reliability and system stability.

Qwen TTS and Whisper Base make network calls to HuggingFace even when
model weights are fully cached locally, because from_pretrained()
defaults to local_files_only=False. This causes failures for offline
users.

Add a reusable force_offline_if_cached() context manager that sets
HF_HUB_OFFLINE=1 during model loading when is_model_cached() is True.
Applied to all four affected load paths:

- PyTorchTTSBackend (Qwen TTS)
- PyTorchSTTBackend (Whisper)
- MLXTTSBackend (refactored from inline implementation)
- MLXSTTBackend (previously unprotected)

Closes #82
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 18, 2026

📝 Walkthrough

Walkthrough

These changes introduce a centralized force_offline_if_cached context manager to streamline offline model loading across backends. Manual environment variable manipulation and complex fallback logic in MLX and PyTorch backends are replaced with consistent wrapper calls around model loading operations.

Changes

Cohort / File(s) Summary
Offline Loading Utility
backend/utils/hf_offline_patch.py
New context manager force_offline_if_cached(is_cached, model_label) that conditionally forces HF_HUB_OFFLINE=1 during cached model loads, with exception handling and logging for offline-related errors.
MLX Backend Integration
backend/backends/mlx_backend.py
Replaced manual HF_HUB_OFFLINE environment toggling with force_offline_if_cached wrapper around TTS and Whisper model loading; removed try/finally fallback blocks and unused os import.
PyTorch Backend Integration
backend/backends/pytorch_backend.py
Applied force_offline_if_cached wrapper around Qwen3TTSModel.from_pretrained and Whisper processor/model loading calls for consistent offline behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A hop, a context, a twist of care,
No more env vars floating in the air!
Cached models load with offline grace,
One wrapper rules them all in place.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: force offline mode when loading cached models (Qwen TTS & Whisper)' directly summarizes the main change: implementing offline mode forcing for cached model loading across TTS and Whisper backends.
Linked Issues check ✅ Passed The PR addresses issue #82 by implementing offline mode forcing during cached model loading, preventing unnecessary network requests that were causing generation failures on Windows systems.
Out of Scope Changes check ✅ Passed All changes are directly related to the objective of fixing offline model loading: creating the context manager, applying it to TTS and Whisper backends in both PyTorch and MLX implementations, and removing old fallback logic.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/offline-model-loading
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/utils/hf_offline_patch.py`:
- Around line 16-53: The docstring for force_offline_if_cached incorrectly
claims it "automatically retry[s] with network access"; update the docstring to
accurately describe current behavior: when is_cached is True the context sets
HF_HUB_OFFLINE=1 and if an "offline" error occurs it logs a warning and
re-raises the exception (it does NOT perform a retry), and callers (e.g., code
in backend/services/generation.py) must catch the exception and perform any
retry with network access themselves; mention that the original HF_HUB_OFFLINE
env var is restored in the finally block.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c9733f26-c689-4e2f-9d95-dd269784a6ea

📥 Commits

Reviewing files that changed from the base of the PR and between ffc1b54 and 2e95b7c.

📒 Files selected for processing (3)
  • backend/backends/mlx_backend.py
  • backend/backends/pytorch_backend.py
  • backend/utils/hf_offline_patch.py

Comment on lines +16 to +53
@contextmanager
def force_offline_if_cached(is_cached: bool, model_label: str = ""):
"""Context manager that sets ``HF_HUB_OFFLINE=1`` while loading a cached model.

If *is_cached* is ``False`` the block runs normally (network allowed).
If the offline load raises an error containing "offline" we automatically
retry with network access so a partially-cached model still works.

Args:
is_cached: Whether the model weights are already on disk.
model_label: Human-readable name used in log messages.
"""
if not is_cached:
yield
return

original_value = os.environ.get("HF_HUB_OFFLINE")
os.environ["HF_HUB_OFFLINE"] = "1"
logger.info(
"[offline-guard] %s is cached — forcing HF_HUB_OFFLINE=1",
model_label or "model",
)

try:
yield
except Exception as exc:
if "offline" in str(exc).lower():
logger.warning(
"[offline-guard] Offline load failed for %s, retrying with network: %s",
model_label or "model",
exc,
)
# Restore original env and retry — caller must wrap the load
# inside force_offline_if_cached so retrying here isn't possible.
# Instead, propagate a flag via the exception so the caller can
# decide. For simplicity we just let it fall through to the
# finally block and re-raise.
raise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Docstring overpromises retry behavior that isn't implemented.

The docstring at lines 21-22 states "we automatically retry with network access so a partially-cached model still works," but the actual implementation (lines 48-53) only logs a warning and re-raises the exception. The inline comment at lines 48-52 explicitly acknowledges that retrying within the context manager isn't possible.

Looking at backend/services/generation.py (lines 129-135 in context snippet), the caller catches exceptions and marks generation as "failed" without any retry attempt.

Consider updating the docstring to accurately describe the behavior:

📝 Suggested docstring fix
 `@contextmanager`
 def force_offline_if_cached(is_cached: bool, model_label: str = ""):
     """Context manager that sets ``HF_HUB_OFFLINE=1`` while loading a cached model.

     If *is_cached* is ``False`` the block runs normally (network allowed).
-    If the offline load raises an error containing "offline" we automatically
-    retry with network access so a partially-cached model still works.
+    If the offline load raises an error containing "offline", it is logged
+    and re-raised to allow the caller to implement retry logic if desired.

     Args:
         is_cached: Whether the model weights are already on disk.
         model_label: Human-readable name used in log messages.
     """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/utils/hf_offline_patch.py` around lines 16 - 53, The docstring for
force_offline_if_cached incorrectly claims it "automatically retry[s] with
network access"; update the docstring to accurately describe current behavior:
when is_cached is True the context sets HF_HUB_OFFLINE=1 and if an "offline"
error occurs it logs a warning and re-raises the exception (it does NOT perform
a retry), and callers (e.g., code in backend/services/generation.py) must catch
the exception and perform any retry with network access themselves; mention that
the original HF_HUB_OFFLINE env var is restored in the finally block.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Stopped working at all

1 participant