This repository was archived by the owner on Jan 4, 2026. It is now read-only.
Hi,
thanks for the hard work!
Would it be possible to unload the model from VRAM after a certain amount of time?
For testing, and because of VRAM constraints when running multiple services, that would be really helpful.
Something like the "keep_alive" option in Ollama:
"0" to unload immediately after the request,
"-1" to never unload,
"5m" to unload 5 minutes after the last request.
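For illustration, the requested behaviour could look roughly like the sketch below. It is not taken from this project or from Ollama: `parse_keep_alive`, `IdleUnloader` and the `load`/`unload` callbacks are hypothetical names, and accepting `s`/`h` suffixes and plain seconds in addition to the three forms listed above is an assumption. Each request cancels any pending unload and, unless keep_alive is negative, schedules a new one.

```python
import re
import threading
from typing import Callable, Optional


def parse_keep_alive(value: str) -> Optional[float]:
    """Hypothetical parser: seconds until unload, or None to never unload."""
    match = re.fullmatch(r"(-?\d+(?:\.\d+)?)([smh]?)", value.strip())
    if match is None:
        raise ValueError(f"invalid keep_alive value: {value!r}")
    number = float(match.group(1))
    if number < 0:
        return None  # negative values: keep the model loaded indefinitely
    unit = match.group(2) or "s"  # assumption: a bare number means seconds
    return number * {"s": 1, "m": 60, "h": 3600}[unit]


class IdleUnloader:
    """Hypothetical helper that unloads a model after keep_alive of idleness."""

    def __init__(self, load: Callable[[], object], unload: Callable[[object], None]):
        self._load = load
        self._unload = unload
        self._model: Optional[object] = None
        self._timer: Optional[threading.Timer] = None
        self._generation = 0  # invalidates timers that fire after a newer request
        self._lock = threading.Lock()

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def run(self, request: Callable[[object], object], keep_alive: str = "5m") -> object:
        delay = parse_keep_alive(keep_alive)
        with self._lock:  # requests are serialized in this simple sketch
            self._generation += 1
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
            if self._model is None:
                self._model = self._load()  # (re)load into VRAM on demand
            result = request(self._model)
            if delay == 0:
                self._unload_locked()  # "0": unload right after the request
            elif delay is not None:
                # e.g. "5m": unload once no request arrives for `delay` seconds
                self._timer = threading.Timer(delay, self._on_timeout, args=(self._generation,))
                self._timer.daemon = True
                self._timer.start()
            # delay is None ("-1"): leave the model loaded
        return result

    def _on_timeout(self, generation: int) -> None:
        with self._lock:
            if generation != self._generation:
                return  # a newer request already rescheduled the unload
            self._timer = None
            self._unload_locked()

    def _unload_locked(self) -> None:
        if self._model is not None:
            self._unload(self._model)
            self._model = None
```

The generation counter covers the case where a timer has already fired but is still waiting for the lock when a new request arrives; without it, the model could be unloaded right after that request rescheduled the timeout.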
Thank you!