Environment
- Voicebox version: v0.2.3
- OS: Windows (laptop)
- GPU: NVIDIA RTX 3050 Ti (4 GB VRAM) + AMD integrated GPU
- CUDA backend: installed (`voicebox-server-cuda.exe` running)
- Models tested:
  - Qwen TTS 0.6B
  - Qwen TTS 1.7B
  - Chatterbox Turbo
Issues
1. GPU usage spikes then drops to 0%
- When clicking Generate, GPU usage briefly spikes (sometimes 30–100%)
- Immediately after, it drops to 0%, even while generation is still ongoing
- This happens consistently across models
Expected:
GPU is utilized consistently for the duration of inference
Actual:
GPU usage occurs only in short bursts and then drops to 0%, suggesting generation falls back to the CPU or spills into shared memory
2. Inconsistent generation speed
- Sometimes generation finishes in a few seconds
- Other times it takes minutes or appears stuck (“infinite generating”)
- This happens with the same model and similar input lengths
Observations:
- VRAM is often nearly full (~4 GB)
- Shared GPU memory is used
- Performance varies unpredictably
3. Model reload / repeated loading behavior
- Occasionally the model appears to reload between generations
- Causes long delays before generation starts
Expected:
Model should stay in memory for faster repeated generations
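To illustrate the behavior I would expect (hypothetical names, not Voicebox's actual internals), here is a minimal sketch of a keep-alive cache that loads each model once and reuses it across generations:

```python
# Hypothetical sketch of persistent model loading -- not Voicebox's real code.
# load_fn stands in for whatever actually loads a TTS model into VRAM.

_model_cache = {}

def get_model(name, load_fn):
    """Return a cached model, calling load_fn only on first use."""
    if name not in _model_cache:
        _model_cache[name] = load_fn(name)  # expensive step: happens once per model
    return _model_cache[name]
```

With a cache along these lines, back-to-back generations using the same model would skip the reload delay entirely.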
4. Failed generations cannot be deleted individually
- If a generation fails or gets stuck, there is no option to delete it individually
- The only workaround is deleting the entire voice profile, which removes all generations
Expected:
- Ability to delete individual failed or unwanted generations
5. GPU monitoring confusion (minor UX issue)
- Windows Task Manager shows 0% GPU usage, even though CUDA backend is active
- This may confuse users into thinking GPU is not being used
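For what it's worth, Task Manager's default GPU graphs track the 3D engine, so CUDA compute work can legitimately show as 0% there unless an engine dropdown is switched to "Cuda"/"Compute"; logging `nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader,nounits -l 1` gives a truer picture. A small sketch for summarizing such a log (the sample lines below are made up, not real measurements):

```python
# Summarize GPU utilization samples logged once per second with:
#   nvidia-smi --query-gpu=utilization.gpu,memory.used \
#              --format=csv,noheader,nounits -l 1
# The sample below is illustrative only.
sample_log = [
    "97, 3890",  # brief spike at generation start
    "12, 3901",
    "0, 3899",   # long idle stretch while audio is still "generating"
    "0, 3899",
    "0, 3898",
]

def summarize(lines):
    """Return peak/average utilization and the fraction of idle samples."""
    util = [int(line.split(",")[0]) for line in lines]
    return {
        "peak_pct": max(util),
        "avg_pct": sum(util) / len(util),
        "idle_fraction": sum(1 for u in util if u == 0) / len(util),
    }
```

A log like this would make the spike-then-idle pattern from issue 1 easy to demonstrate when reporting or debugging.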
Additional Notes
- Models are stored on SSD (not HDD)
- Only one model is loaded at a time during testing
- CUDA backend is confirmed active (`voicebox-server-cuda.exe`)
- VRAM usage suggests the GPU is at least partially utilized
Summary
- GPU usage is inconsistent (spikes then idle)
- Generation speed is unstable
- Model loading behavior may not be persistent
- No way to delete failed generations individually
Suggested Improvements
- More consistent GPU utilization (reduce CPU fallback / memory spill)
- Better handling of VRAM limits (especially for 4GB GPUs)
- Persistent model loading between generations
- Add delete option for individual generations
- Optional: clearer GPU usage indicator in UI
Thanks for the great project — looking forward to future updates 🚀