fix: force offline mode when loading cached models (Qwen TTS & Whisper)#318
fix: force offline mode when loading cached models (Qwen TTS & Whisper)#318
Conversation
Qwen TTS and Whisper Base make network calls to HuggingFace even when model weights are fully cached locally, because from_pretrained() defaults to local_files_only=False. This causes failures for offline users. Add a reusable force_offline_if_cached() context manager that sets HF_HUB_OFFLINE=1 during model loading when is_model_cached() is True. Applied to all four affected load paths: - PyTorchTTSBackend (Qwen TTS) - PyTorchSTTBackend (Whisper) - MLXTTSBackend (refactored from inline implementation) - MLXSTTBackend (previously unprotected) Closes #82
📝 WalkthroughWalkthroughThese changes introduce a centralized Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/utils/hf_offline_patch.py`:
- Around line 16-53: The docstring for force_offline_if_cached incorrectly
claims it "automatically retry[s] with network access"; update the docstring to
accurately describe current behavior: when is_cached is True the context sets
HF_HUB_OFFLINE=1 and if an "offline" error occurs it logs a warning and
re-raises the exception (it does NOT perform a retry), and callers (e.g., code
in backend/services/generation.py) must catch the exception and perform any
retry with network access themselves; mention that the original HF_HUB_OFFLINE
env var is restored in the finally block.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c9733f26-c689-4e2f-9d95-dd269784a6ea
📒 Files selected for processing (3)
backend/backends/mlx_backend.pybackend/backends/pytorch_backend.pybackend/utils/hf_offline_patch.py
| @contextmanager | ||
| def force_offline_if_cached(is_cached: bool, model_label: str = ""): | ||
| """Context manager that sets ``HF_HUB_OFFLINE=1`` while loading a cached model. | ||
|
|
||
| If *is_cached* is ``False`` the block runs normally (network allowed). | ||
| If the offline load raises an error containing "offline" we automatically | ||
| retry with network access so a partially-cached model still works. | ||
|
|
||
| Args: | ||
| is_cached: Whether the model weights are already on disk. | ||
| model_label: Human-readable name used in log messages. | ||
| """ | ||
| if not is_cached: | ||
| yield | ||
| return | ||
|
|
||
| original_value = os.environ.get("HF_HUB_OFFLINE") | ||
| os.environ["HF_HUB_OFFLINE"] = "1" | ||
| logger.info( | ||
| "[offline-guard] %s is cached — forcing HF_HUB_OFFLINE=1", | ||
| model_label or "model", | ||
| ) | ||
|
|
||
| try: | ||
| yield | ||
| except Exception as exc: | ||
| if "offline" in str(exc).lower(): | ||
| logger.warning( | ||
| "[offline-guard] Offline load failed for %s, retrying with network: %s", | ||
| model_label or "model", | ||
| exc, | ||
| ) | ||
| # Restore original env and retry — caller must wrap the load | ||
| # inside force_offline_if_cached so retrying here isn't possible. | ||
| # Instead, propagate a flag via the exception so the caller can | ||
| # decide. For simplicity we just let it fall through to the | ||
| # finally block and re-raise. | ||
| raise |
There was a problem hiding this comment.
Docstring overpromises retry behavior that isn't implemented.
The docstring at lines 21-22 states "we automatically retry with network access so a partially-cached model still works," but the actual implementation (lines 48-53) only logs a warning and re-raises the exception. The inline comment at lines 48-52 explicitly acknowledges that retrying within the context manager isn't possible.
Looking at backend/services/generation.py (lines 129-135 in context snippet), the caller catches exceptions and marks generation as "failed" without any retry attempt.
Consider updating the docstring to accurately describe the behavior:
📝 Suggested docstring fix
`@contextmanager`
def force_offline_if_cached(is_cached: bool, model_label: str = ""):
"""Context manager that sets ``HF_HUB_OFFLINE=1`` while loading a cached model.
If *is_cached* is ``False`` the block runs normally (network allowed).
- If the offline load raises an error containing "offline" we automatically
- retry with network access so a partially-cached model still works.
+ If the offline load raises an error containing "offline", it is logged
+ and re-raised to allow the caller to implement retry logic if desired.
Args:
is_cached: Whether the model weights are already on disk.
model_label: Human-readable name used in log messages.
"""🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/utils/hf_offline_patch.py` around lines 16 - 53, The docstring for
force_offline_if_cached incorrectly claims it "automatically retry[s] with
network access"; update the docstring to accurately describe current behavior:
when is_cached is True the context sets HF_HUB_OFFLINE=1 and if an "offline"
error occurs it logs a warning and re-raises the exception (it does NOT perform
a retry), and callers (e.g., code in backend/services/generation.py) must catch
the exception and perform any retry with network access themselves; mention that
the original HF_HUB_OFFLINE env var is restored in the finally block.
Summary
force_offline_if_cached()context manager that setsHF_HUB_OFFLINE=1during model loading when the cache is warmProblem
from_pretrained()in bothtransformersandqwen_ttsdefaults tolocal_files_only=False, triggering multiple HTTP requests to HuggingFace Hub to check for model updates — even when the full model is already on disk. This causes timeouts or failures when offline.Chatterbox and LuxTTS don't have this problem because they use
snapshot_download()→from_local(), which returns the cached path without network access.The MLX TTS backend already had an inline
HF_HUB_OFFLINEguard, but the PyTorch backends and MLX Whisper had none.Changes
backend/utils/hf_offline_patch.pyforce_offline_if_cached()context managerbackend/backends/pytorch_backend.pyfrom_pretrained()calls with offline guardbackend/backends/mlx_backend.pyFixes #82
Summary by CodeRabbit