Out of curiosity, why is the model loaded twice?
Once in 8-bit: https://github.com/sumo43/loopvlm/blob/e15de9bbbcc3eb4019e56b701dc7fc8669564f89/generate.py#L302
And once in bfloat16: https://github.com/sumo43/loopvlm/blob/e15de9bbbcc3eb4019e56b701dc7fc8669564f89/generate.py#L321