
GPU usage spikes then drops to 0%, inconsistent generation speed, and missing delete option for failed generations #324

@hariOneb

Description


Environment

  • Voicebox version: v0.2.3

  • OS: Windows (Laptop)

  • GPU: NVIDIA RTX 3050 Ti (4GB VRAM) + AMD Integrated GPU

  • CUDA backend: Installed (voicebox-server-cuda.exe running)

  • Models tested:

    • Qwen TTS 0.6B
    • Qwen TTS 1.7B
    • Chatterbox Turbo

Issues

1. GPU usage spikes then drops to 0%

  • When clicking Generate, GPU usage briefly spikes (sometimes 30–100%)
  • Immediately after, it drops to 0%, even while generation is still ongoing
  • This happens consistently across models

Expected:

GPU should be utilized more consistently during inference

Actual:

GPU usage occurs only in short bursts, after which the work appears to fall back to the CPU or to shared GPU memory


2. Inconsistent generation speed

  • Sometimes generation is fast (a few seconds)
  • Other times it takes minutes or appears stuck ("infinite generating")
  • This happens with the same model and similar input lengths

Observations:

  • VRAM is often near full (~4GB)
  • Shared GPU memory is used
  • Performance varies unpredictably
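For scale, here is my own back-of-envelope estimate (not from the Voicebox docs) of why the 1.7B model sits right at the 4GB limit: fp16 weights alone come to roughly 3.2GB, so activations and audio buffers would plausibly spill into shared GPU memory, which would explain the unpredictable speed.

```python
def model_vram_gb(n_params_billion, bytes_per_param=2):
    """Rough VRAM footprint of model weights alone, assuming fp16
    (2 bytes/param). Activations and buffers add more on top."""
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

for name, billions in [("Qwen TTS 0.6B", 0.6), ("Qwen TTS 1.7B", 1.7)]:
    print(f"{name}: ~{model_vram_gb(billions):.1f} GB weights")
```

On a 4GB card, ~3.2GB of weights leaves well under 1GB for everything else, so spillover into shared memory seems likely.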

3. Model reload / repeated loading behavior

  • Occasionally the model appears to reload between generations
  • Causes long delays before generation starts

Expected:

Model should stay in memory for faster repeated generations
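A minimal sketch of the kind of caching I mean (purely illustrative Python; I don't know how the Voicebox server is actually structured, and the loader here is a hypothetical stand-in for loading weights from disk):

```python
class ModelCache:
    """Keep the most recently used model resident in memory instead of
    reloading it from disk for every generation."""

    def __init__(self, loader):
        self._loader = loader   # hypothetical load-from-disk function
        self._name = None
        self._model = None

    def get(self, name):
        # Only reload on an actual model switch, not on every request.
        if name != self._name:
            self._model = self._loader(name)
            self._name = name
        return self._model


# Usage: the second get() with the same name triggers no reload.
cache = ModelCache(loader=lambda name: f"<{name} weights>")
cache.get("qwen-tts-1.7b")
cache.get("qwen-tts-1.7b")
```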


4. Failed generations cannot be deleted individually

  • If a generation fails or gets stuck, there is no option to delete it individually
  • The only workaround is deleting the entire voice profile, which removes all generations

Expected:

  • Ability to delete individual failed or unwanted generations
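To illustrate the data-model change I'm asking for (illustrative only; names like `VoiceProfile` are mine, not from the codebase): each generation needs its own ID so a single record can be removed without touching the rest of the profile.

```python
import uuid


class VoiceProfile:
    def __init__(self, name):
        self.name = name
        self.generations = {}   # generation id -> record

    def add_generation(self, status, audio=None):
        gen_id = str(uuid.uuid4())
        self.generations[gen_id] = {"status": status, "audio": audio}
        return gen_id

    def delete_generation(self, gen_id):
        """Remove one (e.g. failed) generation, leaving the others intact."""
        self.generations.pop(gen_id, None)


profile = VoiceProfile("demo")
ok = profile.add_generation("done", audio=b"...")
bad = profile.add_generation("failed")
profile.delete_generation(bad)   # only the failed one is gone
```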

5. GPU monitoring confusion (minor UX issue)

  • Windows Task Manager shows 0% GPU usage, even though CUDA backend is active
  • This may confuse users into thinking GPU is not being used
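Part of the confusion is that Task Manager's default GPU graphs track the 3D engine, which can sit near 0% during CUDA compute; switching a graph to "Cuda"/"Compute" or querying `nvidia-smi` shows the real utilization. A small helper I use to check (my own sketch, not part of Voicebox):

```python
import shutil
import subprocess


def gpu_utilization():
    """Return per-GPU 'utilization, memory used' lines from nvidia-smi,
    or None if the tool is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    proc = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    if proc.returncode != 0:
        return None
    return proc.stdout.strip().splitlines()


print(gpu_utilization())
```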

Additional Notes

  • Models are stored on SSD (not HDD)
  • Only one model is loaded at a time during testing
  • CUDA backend is confirmed active (voicebox-server-cuda.exe)
  • VRAM usage suggests GPU is partially utilized

Summary

  • GPU usage is inconsistent (spikes then idle)
  • Generation speed is unstable
  • Model loading behavior may not be persistent
  • No way to delete failed generations individually

Suggested Improvements

  • More consistent GPU utilization (reduce CPU fallback / memory spill)
  • Better handling of VRAM limits (especially for 4GB GPUs)
  • Persistent model loading between generations
  • Add delete option for individual generations
  • Optional: clearer GPU usage indicator in UI

Thanks for the great project — looking forward to future updates 🚀
