Hi,
thanks for the amazing work you did with this method.
I am trying to save the quantized model for further analysis, but the checkpoint written via the `save_qmodel_path` parameter is the same size as the original model: the weights are still stored in bf16.
I would like to reproduce the inference speedup and memory savings you report, for example, in Table 9 of the paper.
Could you please explain how to save the model with real (packed) INT4 weights?
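For reference, here is the back-of-envelope size arithmetic behind my expectation (illustrative numbers only, assuming a 7B-parameter model and ignoring the overhead of quantization scales and zero-points): a checkpoint with truly packed INT4 weights should be roughly 4x smaller than the bf16 one, which is not what I observe.

```python
# Rough size estimate: bf16 stores 2 bytes per weight, packed INT4 stores
# 4 bits = 0.5 bytes per weight, so real quantization should shrink the
# checkpoint by about 4x (plus a small overhead for scales/zero-points).
num_params = 7_000_000_000          # e.g. a 7B model (illustrative)
bf16_gib = num_params * 2 / 1024**3    # ~13.0 GiB
int4_gib = num_params * 0.5 / 1024**3  # ~3.3 GiB
print(f"bf16: ~{bf16_gib:.1f} GiB, int4: ~{int4_gib:.1f} GiB")
```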