Please modify the LLMWrapper classes (local.py) to support quantized model variants. For testing, verify that Llama-405B and DeepSeek-V3/R1 can be run with FP8 or INT4 quantization.
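
A minimal sketch of what the change could look like, assuming the wrappers load models through Hugging Face transformers. The `quantization` parameter and the `_build_quant_config` helper are hypothetical names for illustration, not existing local.py API:

```python
# Sketch only: assumes LLMWrapper wraps transformers' AutoModelForCausalLM.
# `quantization` and `_build_quant_config` are hypothetical additions.
from typing import Optional

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig


class LLMWrapper:
    def __init__(self, model_name: str, quantization: Optional[str] = None):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name,
            device_map="auto",        # shard across available GPUs
            torch_dtype="auto",       # keep checkpoint dtypes as stored
            quantization_config=self._build_quant_config(quantization),
        )

    @staticmethod
    def _build_quant_config(quantization: Optional[str]):
        if quantization is None:
            return None
        if quantization == "int4":
            # 4-bit weight-only quantization via bitsandbytes (NF4),
            # with bf16 compute for the dequantized matmuls.
            return BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_quant_type="nf4",
                bnb_4bit_compute_dtype=torch.bfloat16,
            )
        if quantization == "fp8":
            # FP8 is usually a property of the checkpoint itself
            # (e.g. DeepSeek-V3/R1 ship FP8 weights natively);
            # torch_dtype="auto" above loads those weights as stored,
            # so no extra config is built here.
            return None
        raise ValueError(f"Unsupported quantization mode: {quantization}")
```

Note the memory budget for the test: 405B parameters at 4 bits is roughly 200 GB of weights alone (about 405 GB at FP8), so both models will still need a multi-GPU node; `device_map="auto"` shards across whatever devices are visible.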