
Fix generation freezes and silent-latent corruption for base model + 5Hz LM #855

Draft
Copilot wants to merge 2 commits into main from copilot/fix-bugs-after-update

Conversation

Copilot AI commented Mar 17, 2026

After updating, users running the base model with the 5Hz LM experienced audio corruption (random sounds overlaid, unpredictable intro) after 5–10 generations, and hard freezes during VAE decode on tight-VRAM systems.

Changes

Remove stale debug print statements

  • Removed print("Using precomputed LM hints") from prepare_condition() in all three model variants (base, sft, turbo) — left in from a recent feature addition.

Fix silence_latent aliasing across generations (conditioning_target.py)

Both silence_latent_tiled and the per-item target_latent for silent audio were returned as views of self.silence_latent. Any downstream in-place write silently corrupted the shared buffer, degrading generated audio over successive calls.

# Before — returns a view; in-place ops on caller side corrupt self.silence_latent
silence_latent_tiled = self.silence_latent[0, :max_latent_length, :]

# After — independent copy
silence_latent_tiled = self.silence_latent[0, :max_latent_length, :].clone()
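The failure mode behind this fix can be reproduced outside the project. The sketch below uses NumPy, whose basic slicing has the same view semantics as PyTorch's (`.copy()` playing the role of `.clone()`); the variable names mirror the PR but the code is illustrative, not the project's actual implementation.

```python
import numpy as np

# A stand-in for self.silence_latent: basic slicing returns a VIEW,
# so in-place ops on the slice write through to the shared buffer.
silence_latent = np.zeros((1, 8, 4), dtype=np.float32)

view = silence_latent[0, :4, :]        # pre-fix behaviour: a view
view += 1.0                            # in-place write on the "tiled" tensor
assert silence_latent[0, 0, 0] == 1.0  # shared buffer silently corrupted

silence_latent[:] = 0.0                # reset the buffer
copy = silence_latent[0, :4, :].copy() # post-fix behaviour (~ .clone())
copy += 1.0
assert silence_latent[0, 0, 0] == 0.0  # shared buffer untouched
assert not np.shares_memory(copy, silence_latent)
```

Because the corrupted buffer is reused as the silence target on every call, each generation inherits the previous one's writes, which matches the reported progressive degradation over 5–10 generations.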

Prevent VAE decode hang under tight VRAM (vae_decode.py, generate_music_decode.py)

Two complementary fixes:

  1. Raised auto-CPU-VAE threshold (generate_music_decode.py) from 0.5 GB → 2.0 GB. With a 1.7B LM occupying ~8.4 GB of a 16 GB GPU, effective free VRAM hovers around 2–4 GB — the old threshold never fired.

  2. Added proactive VRAM guard in tiled_decode(): when chunk_size hits its minimum (128 frames, selected when free VRAM < 12 GB) and remaining VRAM drops below 1.5 GB, the VAE is moved to CPU before tiling begins and restored afterward. This prevents CUDA deadlocks on Windows (WDDM) where the allocator stalls silently rather than raising OutOfMemoryError.
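The threshold change in fix 1 reduces to a simple comparison. The sketch below uses hypothetical names (`AUTO_CPU_VAE_THRESHOLD_GB`, `should_decode_vae_on_cpu`) to show why 0.5 GB could never fire on the reporter's setup while 2.0 GB can:

```python
# Hypothetical names illustrating the fix-1 decision only.
AUTO_CPU_VAE_THRESHOLD_GB = 2.0  # raised from 0.5 in this PR

def should_decode_vae_on_cpu(free_vram_gb: float,
                             threshold_gb: float = AUTO_CPU_VAE_THRESHOLD_GB) -> bool:
    """Fall back to CPU VAE decode when free VRAM is below the threshold."""
    return free_vram_gb < threshold_gb

# 16 GB GPU minus ~8.4 GB for the 1.7B LM leaves roughly 2-4 GB free.
assert not should_decode_vae_on_cpu(3.0, threshold_gb=0.5)  # old: never fired
assert not should_decode_vae_on_cpu(1.8, threshold_gb=0.5)  # old: still silent
assert should_decode_vae_on_cpu(1.8)                        # new: fires in time
```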

# New guard in tiled_decode()
if chunk_size <= self.VAE_DECODE_MAX_CHUNK_SIZE // 4 and not _is_mps:
    if _is_cuda_device(self.device):
        free_gb = get_effective_free_vram_gb(...)
        if free_gb < _TILED_DECODE_MIN_FREE_VRAM_GB:  # 1.5 GB
            # Move VAE + latents to CPU; restore after decode

Tests

  • New conditioning_target_test.py: verifies silence_latent_tiled and target_latents do not alias self.silence_latent.
  • Four new cases in vae_decode_mixin_test.py: VRAM guard fires when low, skips when sufficient, skips for large chunk sizes, VAE is restored after CPU decode.
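The aliasing tests can be sketched in NumPy terms, where `np.shares_memory` plays the role of comparing `data_ptr()` on torch tensors. `make_silence_latent_tiled` below is a hypothetical helper standing in for the real `prepare_condition` path; the `fixed` flag contrasts the pre- and post-fix behaviour:

```python
import numpy as np

def make_silence_latent_tiled(silence_latent, max_latent_length, fixed=True):
    """Hypothetical helper: return the tiled silence latent."""
    tiled = silence_latent[0, :max_latent_length, :]
    return tiled.copy() if fixed else tiled  # .copy() ~ torch's .clone()

silence_latent = np.zeros((1, 16, 4), dtype=np.float32)

# Post-fix: the returned tensor must not alias the shared buffer.
tiled = make_silence_latent_tiled(silence_latent, 8, fixed=True)
assert not np.shares_memory(tiled, silence_latent)

# Pre-fix: the aliasing bug is directly detectable the same way.
leaky = make_silence_latent_tiled(silence_latent, 8, fixed=False)
assert np.shares_memory(leaky, silence_latent)
```

Note that a mutation-based test must mutate the returned tensor itself; cloning it first (as the review points out below for one of the new tests) would pass even against an aliased view.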
Original prompt

This section details the original issue you should resolve.

<issue_title>Bugs after updating during generation</issue_title>
<issue_description>

Discussed in #850

Originally posted by tornado73 March 16, 2026
Hello, yesterday I updated Ace Step and here's what I noticed.
I'm generating on Acestep-v15-base + Acestep-5Hz-lm-1.7B, with the Gradio interface. After 5-10 generations, it starts overlaying random sounds onto the composition. The intro is completely unpredictable and plays nonsense 99% of the time. I've tried disabling the intro in various ways, but it still keeps popping it everywhere. Sometimes reinitializing the models helps, but it doesn't last long. After 1-2 generations, it freezes dead at the stage. 2026-03-16 11:53:57.952 | WARNING | acestep.core.generation.handler.vae_decode_chunks:_tiled_decode_inner:35 - [tiled_decode] Reduced overlap from 64 to 32 for chunk_size=128, , This didn't happen in the previous version.
A clean install didn't help, it's still the same.
Startup log:

Model: D:\ace\ACE-Step-1.5\checkpoints\acestep-5Hz-lm-1.7B
Device: NVIDIA GeForce RTX 4070 Ti SUPER
GPU Memory Utilization: 0.527
Low GPU Memory Mode: False
5Hz LM initialized successfully
Service initialization completed successfully!
Creating Gradio interface with language: en...
Enabling queue for multi-user support...

p.s. Acestep-v15-sft works fine

Comments on the Issue (you are @copilot in this section)



Co-authored-by: ChuxiJ <30956809+ChuxiJ@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix bugs after updating during generation Fix generation freezes and silent-latent corruption for base model + 5Hz LM Mar 17, 2026
Copilot AI requested a review from ChuxiJ March 17, 2026 03:37
@tornado73

I'm testing a fix.
There are no freezes yet.


@ChuxiJ left a comment


Code Review

Overall: Correct root cause fix for silence_latent aliasing. Well-targeted changes with proper cleanup logic.

Key Recommendations

  1. Medium - Missing .clone() in padding path: conditioning_target.py line 91 still uses a bare view:

    latent = torch.cat([latent, self.silence_latent[0, :pad_length, :]], dim=0)

    While torch.cat allocates new memory, adding .clone() here would be consistent with the safety pattern applied elsewhere in this PR.

  2. Medium - Silent exception swallowing (vae_decode.py):

    except Exception:
        pass  # VRAM check is best-effort

    This catches all exceptions including ImportError, AttributeError, etc. At minimum, use logger.debug() instead of pass to aid debugging.

  3. Low - Magic number for VRAM threshold: The 2.0 GB threshold in generate_music_decode.py should be a named constant (like _TILED_DECODE_MIN_FREE_VRAM_GB elsewhere), with a comment explaining why 2.0 GB was chosen.

  4. Low - Redundant test: test_silence_latent_tiled_is_not_a_view_of_silence_latent clones the returned tensor before mutating it, so it would pass even with aliased views. The data_ptr() test (test_silence_latent_tiled_data_does_not_alias_silence_latent) is the actually meaningful test.

  5. Low - Inconsistent inline imports: memory_utils helpers are imported inline inside tiled_decode() while get_effective_free_vram_gb is at module level. A comment explaining why would help.

Good Practices

  • .clone() additions correctly address the root cause of progressive audio degradation
  • finally block ensures VAE device restoration even on exceptions
  • Debug print removal is clean

No blocking issues.

Development

Successfully merging this pull request may close these issues.

Bugs after updating during generation

3 participants