Fix generation freezes and silent-latent corruption for base model + 5Hz LM #855
Conversation
Co-authored-by: ChuxiJ <30956809+ChuxiJ@users.noreply.github.com>
I'm testing a fix.
ChuxiJ left a comment
Code Review
Overall: correct root-cause fix for the `silence_latent` aliasing. Well-targeted changes with proper cleanup logic.
Key Recommendations
- **Medium**: Missing `.clone()` in the padding path: `conditioning_target.py` line 91 still uses a bare view: `latent = torch.cat([latent, self.silence_latent[0, :pad_length, :]], dim=0)`. While `torch.cat` allocates new memory, adding `.clone()` here would be consistent with the safety pattern applied elsewhere in this PR.
- **Medium**: Silent exception swallowing (`vae_decode.py`): `except Exception: pass  # VRAM check is best-effort` catches all exceptions, including `ImportError`, `AttributeError`, etc. At minimum, use `logger.debug()` instead of `pass` to aid debugging.
- **Low**: Magic number for VRAM threshold: the 2.0 GB threshold in `generate_music_decode.py` should be a named constant (like `_TILED_DECODE_MIN_FREE_VRAM_GB` elsewhere), with a comment explaining why 2.0 GB was chosen.
- **Low**: Redundant test: `test_silence_latent_tiled_is_not_a_view_of_silence_latent` clones the returned tensor before mutating it, so it would pass even with aliased views. The `data_ptr()` test (`test_silence_latent_tiled_data_does_not_alias_silence_latent`) is the meaningful one.
- **Low**: Inconsistent inline imports: `memory_utils` helpers are imported inline inside `tiled_decode()`, while `get_effective_free_vram_gb` is imported at module level. A comment explaining why would help.
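The aliasing concern behind the `.clone()` recommendation and the redundant-test note can be illustrated with NumPy, whose slicing has the same view semantics as PyTorch; `np.shares_memory` plays the role of comparing `data_ptr()` ranges, and `.copy()` the role of `.clone()`. This is a standalone sketch, not the PR's code:

```python
import numpy as np

# A cached buffer, analogous to self.silence_latent.
silence_latent = np.zeros((1, 8, 4))

# Slicing returns a view: it aliases the cached buffer's storage.
view = silence_latent[0, :4, :]
print(np.shares_memory(view, silence_latent))    # → True

# Any downstream in-place write now corrupts the shared cache.
view += 1.0
print(silence_latent.sum())                      # → 16.0 (was 0.0)

# .copy() (NumPy's analogue of torch .clone()) detaches the storage.
cloned = silence_latent[0, :4, :].copy()
print(np.shares_memory(cloned, silence_latent))  # → False
```

This is also why the `data_ptr()`-style test is the meaningful one: a test that clones before mutating never touches the shared storage, so it passes even when the production code returns a view.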
Good Practices
- `.clone()` additions correctly address the root cause of progressive audio degradation
- `finally` block ensures VAE device restoration even on exceptions
- Debug print removal is clean
No blocking issues.
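The logging suggestion from the review can be sketched as follows. This is a hypothetical standalone version: `get_effective_free_vram_gb` is stubbed here so the snippet runs anywhere, and the function name `check_free_vram` is illustrative, not the PR's actual identifier:

```python
import logging

logger = logging.getLogger(__name__)

def get_effective_free_vram_gb() -> float:
    # Stub standing in for the PR's memory_utils helper; the real one
    # queries CUDA and may raise on CPU-only machines.
    raise RuntimeError("CUDA not available")

def check_free_vram():
    """Best-effort VRAM check: swallows failures, but logs them first."""
    try:
        return get_effective_free_vram_gb()
    except Exception as exc:  # still broad, but no longer silent
        logger.debug("VRAM check failed (best-effort): %r", exc)
        return None

print(check_free_vram())  # → None
```

The behavior is unchanged (the check stays best-effort and never raises), but a failed check now leaves a trace at debug level instead of disappearing.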
After updating, users running the base model with the 5Hz LM experienced audio corruption (random sounds overlaid, unpredictable intro) after 5–10 generations, and hard freezes during VAE decode on tight-VRAM systems.
Changes
**Remove stale debug `print` statements**

`print("Using precomputed LM hints")` removed from `prepare_condition()` in all three model variants (`base`, `sft`, `turbo`); it was left in from a recent feature addition.

**Fix `silence_latent` aliasing across generations (`conditioning_target.py`)**

Both `silence_latent_tiled` and the per-item `target_latent` for silent audio were returned as views of `self.silence_latent`. Any downstream in-place write silently corrupted the shared buffer, degrading generated audio over successive calls.

**Prevent VAE decode hang under tight VRAM (`vae_decode.py`, `generate_music_decode.py`)**

Two complementary fixes:
- Raised the auto-CPU-VAE threshold (`generate_music_decode.py`) from 0.5 GB to 2.0 GB. With a 1.7B LM occupying ~8.4 GB of a 16 GB GPU, effective free VRAM hovers around 2–4 GB, so the old threshold never fired.
- Added a proactive VRAM guard in `tiled_decode()`: when `chunk_size` hits its minimum (128 frames, selected when free VRAM < 12 GB) and remaining VRAM drops below 1.5 GB, the VAE is moved to CPU before tiling begins and restored afterward. This prevents CUDA deadlocks on Windows (WDDM), where the allocator stalls silently rather than raising `OutOfMemoryError`.

**Tests**

- `conditioning_target_test.py`: verifies `silence_latent_tiled` and `target_latents` do not alias `self.silence_latent`.
- `vae_decode_mixin_test.py`: the VRAM guard fires when VRAM is low, skips when it is sufficient, skips for large chunk sizes, and the VAE is restored after CPU decode.
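The raised threshold can be sketched as a named constant with a rationale comment, which also addresses the review's magic-number point. Constant and function names here are illustrative, not the PR's actual identifiers:

```python
# Raised from 0.5 GB: with a 1.7B LM resident (~8.4 GB of a 16 GB GPU),
# effective free VRAM is typically 2-4 GB, so the old cutoff never fired.
_AUTO_CPU_VAE_MIN_FREE_VRAM_GB = 2.0

def should_decode_vae_on_cpu(free_vram_gb: float) -> bool:
    """Decide whether to move the VAE to CPU before decoding."""
    return free_vram_gb < _AUTO_CPU_VAE_MIN_FREE_VRAM_GB

print(should_decode_vae_on_cpu(1.7))  # → True  (old 0.5 GB cutoff: False)
print(should_decode_vae_on_cpu(3.5))  # → False
</code>
```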
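The shape of the proactive guard and its `finally`-based restoration can be sketched like this. Everything here is a hedged stand-in: `FakeVAE` mimics `torch.nn.Module.to()`, the constants mirror the 128-frame / 1.5 GB conditions described above, and none of the names are the PR's actual identifiers:

```python
class FakeVAE:
    """Stand-in mimicking torch.nn.Module's .to() device movement."""
    def __init__(self):
        self.device = "cuda"

    def to(self, device):
        self.device = device
        return self

_MIN_CHUNK_FRAMES = 128   # minimum chunk size, per the description above
_GUARD_FREE_VRAM_GB = 1.5 # guard threshold, per the description above

def tiled_decode(vae, chunk_size, free_vram_gb):
    """Returns True if the guard fired (VAE was offloaded to CPU)."""
    offloaded = (chunk_size <= _MIN_CHUNK_FRAMES
                 and free_vram_gb < _GUARD_FREE_VRAM_GB)
    if offloaded:
        vae.to("cpu")  # decode on CPU instead of risking a WDDM stall
    try:
        pass           # ... tiled decode loop would run here ...
    finally:
        if offloaded:
            vae.to("cuda")  # restored even if decode raises
    return offloaded

vae = FakeVAE()
print(tiled_decode(vae, chunk_size=128, free_vram_gb=1.2))  # → True
print(vae.device)                                           # → cuda
print(tiled_decode(vae, chunk_size=512, free_vram_gb=1.2))  # → False
```

The key detail the tests check for is visible here: restoration happens in `finally`, so the VAE ends up back on the GPU even when the decode body raises.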