Fix generation freezes and silent-latent corruption for base model + 5Hz LM by Copilot · Pull Request #855 · ace-step/ACE-Step-1.5

Copilot · 2026-03-17T02:45:34Z

After updating, users running the base model with the 5Hz LM experienced audio corruption (random sounds overlaid, unpredictable intro) after 5–10 generations, and hard freezes during VAE decode on tight-VRAM systems.

Changes

Remove stale debug `print` statements

Removed print("Using precomputed LM hints") from prepare_condition() in all three model variants (base, sft, turbo) — left in from a recent feature addition.

Fix `silence_latent` aliasing across generations (`conditioning_target.py`)

Both silence_latent_tiled and the per-item target_latent for silent audio were returned as views of self.silence_latent. Any downstream in-place write silently corrupted the shared buffer, degrading generated audio over successive calls.

# Before — returns a view; in-place ops on caller side corrupt self.silence_latent
silence_latent_tiled = self.silence_latent[0, :max_latent_length, :]

# After — independent copy
silence_latent_tiled = self.silence_latent[0, :max_latent_length, :].clone()

Prevent VAE decode hang under tight VRAM (`vae_decode.py`, `generate_music_decode.py`)

Two complementary fixes:

Raised auto-CPU-VAE threshold (generate_music_decode.py) from 0.5 GB → 2.0 GB. With a 1.7B LM occupying ~8.4 GB of a 16 GB GPU, effective free VRAM hovers around 2–4 GB — the old threshold never fired.
Added proactive VRAM guard in tiled_decode(): when chunk_size hits its minimum (128 frames, selected when free VRAM < 12 GB) and remaining VRAM drops below 1.5 GB, the VAE is moved to CPU before tiling begins and restored afterward. This prevents CUDA deadlocks on Windows (WDDM) where the allocator stalls silently rather than raising OutOfMemoryError.

# New guard in tiled_decode()
if chunk_size <= self.VAE_DECODE_MAX_CHUNK_SIZE // 4 and not _is_mps:
    if _is_cuda_device(self.device):
        free_gb = get_effective_free_vram_gb(...)
        if free_gb < _TILED_DECODE_MIN_FREE_VRAM_GB:  # 1.5 GB
            # Move VAE + latents to CPU; restore after decode

Tests

New conditioning_target_test.py: verifies silence_latent_tiled and target_latents do not alias self.silence_latent.
Four new cases in vae_decode_mixin_test.py: VRAM guard fires when low, skips when sufficient, skips for large chunk sizes, VAE is restored after CPU decode.

Original prompt

This section details on the original issue you should resolve

<issue_title>Bugs after updating during generation</issue_title>
<issue_description>

Discussed in #850

^{Originally posted by tornado73 March 16, 2026}
Hello, yesterday I updated Ace Step and here's what I noticed.
I'm generating on Acestep-v15-base + Acestep-5Hz-lm-1.7B, with the Gradio interface. After 5-10 generations, it starts overlaying random sounds onto the composition. The intro is completely unpredictable and plays nonsense 99% of the time. I've tried disabling the intro in various ways, but it still keeps popping it everywhere. Sometimes reinitializing the models helps, but it doesn't last long. After 1-2 generations, it freezes dead at the stage. 2026-03-16 11:53:57.952 | WARNING | acestep.core.generation.handler.vae_decode_chunks:_tiled_decode_inner:35 - [tiled_decode] Reduced overlap from 64 to 32 for chunk_size=128, , This didn't happen in the previous version.
A clean install didn't help, it's still the same.
start data

Model: D:\ace\ACE-Step-1.5\checkpoints\acestep-5Hz-lm-1.7B
Device: NVIDIA GeForce RTX 4070 Ti SUPER
GPU Memory Utilization: 0.527
Low GPU Memory Mode: False
5Hz LM initialized successfully
Service initialization completed successfully!
Creating Gradio interface with language: en...
Enabling queue for multi-user support...

p.s. Acestep-v15-sft works fine

Comments on the Issue (you are @copilot in this section)

Fixes Bugs after updating during generation #854

📍 Connect Copilot coding agent with Jira, Azure Boards or Linear to delegate work to Copilot in one click without leaving your project management tool.

Co-authored-by: ChuxiJ <30956809+ChuxiJ@users.noreply.github.com>

tornado73 · 2026-03-17T18:39:20Z

I'm testing a fix.
There are no freezes yet.

ChuxiJ

Code Review

Overall: Correct root cause fix for silence_latent aliasing. Well-targeted changes with proper cleanup logic.

Key Recommendations

Medium - Missing .clone() in padding path: conditioning_target.py line 91 still uses a bare view:
```
latent = torch.cat([latent, self.silence_latent[0, :pad_length, :]], dim=0)
```
While torch.cat allocates new memory, adding .clone() here would be consistent with the safety pattern applied elsewhere in this PR.
Medium - Silent exception swallowing (vae_decode.py):
```
except Exception:
    pass  # VRAM check is best-effort
```
This catches all exceptions including ImportError, AttributeError, etc. At minimum, use logger.debug() instead of pass to aid debugging.
Low - Magic number for VRAM threshold: The 2.0 GB threshold in generate_music_decode.py should be a named constant (like _TILED_DECODE_MIN_FREE_VRAM_GB elsewhere), with a comment explaining why 2.0 GB was chosen.
Low - Redundant test: test_silence_latent_tiled_is_not_a_view_of_silence_latent clones the returned tensor before mutating it, so it would pass even with aliased views. The data_ptr() test (test_silence_latent_tiled_data_does_not_alias_silence_latent) is the actually meaningful test.
Low - Inconsistent inline imports: memory_utils helpers are imported inline inside tiled_decode() while get_effective_free_vram_gb is at module level. A comment explaining why would help.

Good Practices

.clone() additions correctly address the root cause of progressive audio degradation
finally block ensures VAE device restoration even on exceptions
Debug print removal is clean

No blocking issues.

Initial plan

280f8b8

Copilot AI assigned Copilot and ChuxiJ Mar 17, 2026

Copilot started work on behalf of ChuxiJ March 17, 2026 02:45 View session

Fix bugs after updating during generation (base model + 5Hz LM)

73fdd72

Co-authored-by: ChuxiJ <30956809+ChuxiJ@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix bugs after updating during generation~~ Fix generation freezes and silent-latent corruption for base model + 5Hz LM Mar 17, 2026

Copilot AI requested a review from ChuxiJ March 17, 2026 03:37

Copilot finished work on behalf of ChuxiJ March 17, 2026 03:37

ChuxiJ reviewed Mar 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix generation freezes and silent-latent corruption for base model + 5Hz LM#855

Fix generation freezes and silent-latent corruption for base model + 5Hz LM#855
Copilot wants to merge 2 commits intomainfrom
copilot/fix-bugs-after-update

Copilot AI commented Mar 17, 2026 •

edited

Loading

Uh oh!

tornado73 commented Mar 17, 2026

Uh oh!

ChuxiJ left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Copilot AI commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Remove stale debug print statements

Fix silence_latent aliasing across generations (conditioning_target.py)

Prevent VAE decode hang under tight VRAM (vae_decode.py, generate_music_decode.py)

Tests

Discussed in #850

Comments on the Issue (you are @copilot in this section)

Uh oh!

tornado73 commented Mar 17, 2026

Uh oh!

ChuxiJ left a comment

Choose a reason for hiding this comment

Code Review

Key Recommendations

Good Practices

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Mar 17, 2026 •

edited

Loading

Remove stale debug `print` statements

Fix `silence_latent` aliasing across generations (`conditioning_target.py`)

Prevent VAE decode hang under tight VRAM (`vae_decode.py`, `generate_music_decode.py`)