Environment
- Voicebox version: v0.2.3
- OS: Windows (laptop)
- GPU: NVIDIA RTX 3050 Ti (4 GB VRAM) + AMD integrated GPU
- CUDA backend: installed (`voicebox-server-cuda.exe` running)
- Models tested:
  - Qwen TTS 0.6B
  - Qwen TTS 1.7B
  - Chatterbox Turbo
Issues
1. GPU usage spikes then drops to 0%
- When clicking Generate, GPU usage briefly spikes (sometimes 30–100%)
- Immediately after, it drops to 0%, even while generation is still ongoing
- This happens consistently across models
Expected:
GPU is utilized consistently for the duration of inference
Actual:
GPU usage occurs only in short bursts and then drops to 0%, suggesting generation falls back to the CPU or spills into shared memory
2. Inconsistent generation speed
- Sometimes generation finishes in a few seconds
- Other times it takes minutes or appears stuck (“infinite generating”)
- This happens with the same model and similar input lengths
Observations:
- VRAM is often nearly full (~4 GB)
- Shared GPU memory is used
- Performance varies unpredictably
3. Model reload / repeated loading behavior
- Occasionally the model appears to reload between generations
- Causes long delays before generation starts
Expected:
Model should stay in memory for faster repeated generations
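To illustrate the behavior I would expect (hypothetical names, not Voicebox's actual internals), here is a minimal sketch of a keep-alive cache that loads each model once and reuses it across generations:

```python
# Hypothetical sketch of persistent model loading -- not Voicebox's real code.
# load_fn stands in for whatever actually loads a TTS model into VRAM.

_model_cache = {}

def get_model(name, load_fn):
    """Return a cached model, calling load_fn only on first use."""
    if name not in _model_cache:
        _model_cache[name] = load_fn(name)  # expensive step: happens once per model
    return _model_cache[name]
```

With a cache along these lines, back-to-back generations using the same model would skip the reload delay entirely.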
4. Failed generations cannot be deleted individually
- If a generation fails or gets stuck, there is no option to delete it individually
- The only workaround is deleting the entire voice profile, which removes all generations
Expected:
- Ability to delete individual failed or unwanted generations
5. GPU monitoring confusion (minor UX issue)
- Windows Task Manager shows 0% GPU usage, even though CUDA backend is active
- This may confuse users into thinking GPU is not being used
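For what it's worth, Task Manager's default GPU graphs track the 3D engine, so CUDA compute work can legitimately show as 0% there unless an engine dropdown is switched to "Cuda"/"Compute"; logging `nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader,nounits -l 1` gives a truer picture. A small sketch for summarizing such a log (the sample lines below are made up, not real measurements):

```python
# Summarize GPU utilization samples logged once per second with:
#   nvidia-smi --query-gpu=utilization.gpu,memory.used \
#              --format=csv,noheader,nounits -l 1
# The sample below is illustrative only.
sample_log = [
    "97, 3890",  # brief spike at generation start
    "12, 3901",
    "0, 3899",   # long idle stretch while audio is still "generating"
    "0, 3899",
    "0, 3898",
]

def summarize(lines):
    """Return peak/average utilization and the fraction of idle samples."""
    util = [int(line.split(",")[0]) for line in lines]
    return {
        "peak_pct": max(util),
        "avg_pct": sum(util) / len(util),
        "idle_fraction": sum(1 for u in util if u == 0) / len(util),
    }
```

A log like this would make the spike-then-idle pattern from issue 1 easy to demonstrate when reporting or debugging.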
Additional Notes
- Models are stored on SSD (not HDD)
- Only one model is loaded at a time during testing
- CUDA backend is confirmed active (`voicebox-server-cuda.exe`)
- VRAM usage suggests the GPU is at least partially utilized
Summary
- GPU usage is inconsistent (spikes then idle)
- Generation speed is unstable
- Model loading behavior may not be persistent
- No way to delete failed generations individually
Suggested Improvements
- More consistent GPU utilization (reduce CPU fallback / memory spill)
- Better handling of VRAM limits (especially for 4GB GPUs)
- Persistent model loading between generations
- Add delete option for individual generations
- Optional: clearer GPU usage indicator in UI
Thanks for the great project — looking forward to future updates 🚀