Description
[FEATURE] Automate VLM Unloading for VRAM Management
User Story: As a user with a VRAM-limited GPU (<= 16GB), I want the application to automatically unload the Vision-Language Model (VLM) from memory after it has finished generating the caption and tags. This way, I have enough free VRAM to load my local Large Language Model (LLM) for the character card generation without needing to restart the application.
Problem: The current workflow for users with limited VRAM is clunky and inefficient. The VLM remains loaded in VRAM even when it's no longer needed, preventing the user from loading a local LLM (like via KoboldCPP) due to insufficient memory. The current workaround is to manually click the "Unload Model" button or restart the entire application, which disrupts the creative workflow.
Proposed Solution:
- Implement an optional setting, likely a checkbox in the Settings or Caption tab, labeled something like: "Automatically unload VLM after captioning to save VRAM".
- When this option is enabled, the application should automatically trigger the unload process once the caption and tags have been successfully generated and passed to the next tab.
- Provide clear feedback in the status bar (e.g., "Caption generated. VLM unloaded.") so the user knows what's happening.
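The steps above could be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the class and method names (`VLMCaptioner`, `unload`, `caption`) are hypothetical, and the inference call is a placeholder.

```python
import gc

class VLMCaptioner:
    """Hypothetical wrapper around a loaded VLM (all names are illustrative)."""

    def __init__(self, auto_unload: bool = False):
        # auto_unload mirrors the proposed checkbox setting
        self.auto_unload = auto_unload
        self.model = object()  # stand-in for the loaded model weights

    def unload(self) -> str:
        """Release the model and reclaim memory; returns a status-bar message."""
        self.model = None
        gc.collect()
        # In a PyTorch-based app, one would also call torch.cuda.empty_cache()
        # here to return the freed VRAM to the driver.
        return "Caption generated. VLM unloaded."

    def caption(self, image_path: str) -> str:
        if self.model is None:
            raise RuntimeError("VLM is not loaded")
        result = f"caption for {image_path}"  # placeholder for real inference
        if self.auto_unload:
            # Unload only after the caption has been produced and handed off
            self.unload()
        return result
```

With `auto_unload=True`, the first `caption()` call frees the model immediately after returning its result, so a local LLM can be loaded next without restarting the application.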
Goal: To completely solve the "VRAM juggling act" for users on mainstream hardware. This will create a seamless, single-session workflow and dramatically improve the user experience.
Source: This feature was suggested by a user running a local KoboldCPP instance who identified this VRAM bottleneck.