Description
Hello! Is there any way to predict or know how much VRAM will be needed? I can see that it varies with the length of the audio or video, the resolution and fps, and, most notably, the mode you choose to run the inference in. My use case: I have an RTX 3090 and I'm trying to run inference with an image and an 8-second audio clip, at 25 fps, image size 512, and 40 steps; only mode 0 avoids an OOM error.
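For what it's worth, here is the kind of back-of-envelope estimate I have been using to reason about it. All numbers (latent channels, downscale factor, fp16 elements) are illustrative assumptions on my part, not this model's actual architecture, and this only covers the video latents; real usage is much higher (weights, attention buffers, etc.):

```python
def latent_vram_gb(seconds, fps, height, width, channels=4,
                   downscale=8, bytes_per_elem=2):
    """Rough estimate of the memory held by the video latents alone.

    Assumes (hypothetically) a VAE downscale factor of 8, 4 latent
    channels, and fp16 storage. Not the model's real architecture.
    """
    frames = seconds * fps
    elems = frames * (height // downscale) * (width // downscale) * channels
    return elems * bytes_per_elem / 1024**3

# Example: my case, 8 s of audio at 25 fps with a 512x512 image.
est = latent_vram_gb(8, 25, 512, 512)
```

The latents themselves come out tiny, which suggests to me the OOM is dominated by something else (attention over all frames, intermediate activations), so a simple formula like this probably cannot predict peak usage; that is exactly why I am asking.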
Also, as of now, in my tests the model generates the video with the face blurry and malformed. It's a shame, because everything else looks promising and I would really like this framework to work well. Any ideas? I can share more info if needed.
Also, in inference.py I added an if condition so that when you choose mode 0 it does not attempt VASA, since it throws an error there because no face is detected, which is expected.
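The guard I added looks roughly like this; the function and variable names below are placeholders from my local edit, not the repo's actual API:

```python
def apply_vasa(frames):
    # Placeholder standing in for the real VASA post-processing call
    # in inference.py; the actual function name differs.
    return [f + "_vasa" for f in frames]

def maybe_apply_vasa(mode, frames):
    """Skip the VASA stage when mode 0 is selected, since face
    detection fails there and raises an error."""
    if mode == 0:
        return frames  # bypass VASA entirely in mode 0
    return apply_vasa(frames)
```

Happy to open a PR with the real change if that's useful.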
Thank you!