XTTS - Both VRAM and RAM reach 100% after many generations, audio stops playing despite console reporting success #661

@gugan555

Describe the bug
After an unpredictable number of TTS generations (never the first or second), VRAM usage spikes to 100% and stays there. RAM usage shows the same pattern.
When AllTalk's console is closed, both RAM and VRAM usage return to normal.
Audio stops playing on the SillyTavern side (I was using the standalone version with XTTS), but the AllTalk console reports that the generation completed successfully.

I could not find any subtle fixes for the issue.

Steps to reproduce

  1. Start AllTalk standalone with the XTTS model, then generate audio repeatedly.
  2. After a few generations (I believe the number depends on hardware performance), VRAM hits 100% and audio stops playing.
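
To put numbers on the growth described in the steps above, a small helper like the following could record memory figures between generations. This is only a sketch under assumptions: `memory_snapshot` is a hypothetical name, not part of AllTalk, and `psutil` is an optional third-party package. The PyTorch counters are per-process, so the function would need to be called from inside the AllTalk process to say anything about AllTalk's own VRAM.

```python
def memory_snapshot():
    """Return current memory figures in MiB; an entry is None when unavailable."""
    snap = {
        "cuda_allocated_mib": None,  # memory held by live tensors
        "cuda_reserved_mib": None,   # memory held by PyTorch's caching allocator
        "process_rss_mib": None,     # resident RAM of this process
    }

    # torch is treated as optional so the sketch also runs without it.
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        snap["cuda_allocated_mib"] = torch.cuda.memory_allocated() / 2**20
        snap["cuda_reserved_mib"] = torch.cuda.memory_reserved() / 2**20

    # psutil is treated as optional as well.
    try:
        import psutil
    except ImportError:
        psutil = None
    if psutil is not None:
        snap["process_rss_mib"] = psutil.Process().memory_info().rss / 2**20

    return snap


if __name__ == "__main__":
    print(memory_snapshot())
```

If `cuda_reserved_mib` keeps climbing while `cuda_allocated_mib` stays flat, the allocator cache is the likely culprit. If both climb, something is probably still holding references to old tensors.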

Expected behavior
VRAM should be released after each generation and audio should play correctly. Audio that is no longer needed should also be removed from RAM, since RAM keeps filling up with old audio.
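
As an illustration of the behavior I would expect, here is a minimal sketch of per-generation cleanup. It is not AllTalk's actual code: `release_memory` and `RecentClips` are hypothetical names, and I am assuming that generated audio is kept in some in-memory collection. `torch.cuda.empty_cache()` only returns cached blocks that are no longer in use, so it cannot free tensors that are still referenced somewhere.

```python
import gc
from collections import deque


def release_memory():
    """Run garbage collection, then release PyTorch's unused cached CUDA blocks."""
    collected = gc.collect()  # drop unreachable Python objects first
    cuda_cache_emptied = False

    # torch is treated as optional so the sketch also runs without it.
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
        cuda_cache_emptied = True

    return {"gc_collected": collected, "cuda_cache_emptied": cuda_cache_emptied}


class RecentClips:
    """Keep only the most recent `max_clips` audio buffers (hypothetical store)."""

    def __init__(self, max_clips=3):
        # A deque with maxlen discards the oldest entry automatically.
        self._clips = deque(maxlen=max_clips)

    def add(self, clip_bytes):
        self._clips.append(clip_bytes)

    def __len__(self):
        return len(self._clips)

    def oldest(self):
        return self._clips[0]


if __name__ == "__main__":
    clips = RecentClips(max_clips=3)
    for i in range(5):
        clips.add(f"clip-{i}".encode())  # stand-in for generated audio
        release_memory()                 # cleanup after each generation
    print(len(clips), clips.oldest())
```

Bounding the store means RAM use stops growing once `max_clips` buffers are held, at the cost of not being able to replay older audio.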

What I have tried

  • Low VRAM mode enabled: problem persists
  • PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True: not supported on this platform (PyTorch 2.2.1, Windows)

Environment

  • AllTalk version: 9th January 2026, branch: alltalkbeta
  • TTS Engine: XTTS xttsv2_2.0.3
  • PyTorch: 2.2.1
  • CUDA: 12.1
  • Python: 3.11.9
  • DeepSpeed: 0.14.0+ce78a63
  • OS: Windows
  • Running mode: Standalone
