
How to save a real quantized model to reproduce the inference speedup and memory saving? #8

@LauraKrone24

Description


Hi,
thanks for the amazing work on this method.

I am trying to save the quantized model for further analysis, but the model I save using the "save_qmodel_path" parameter is the same size as the original model, since it is still stored in bf16.

I want to reproduce the inference speedup and memory saving you report, for example, in Table 9 of the paper.

Could you please explain how to save the real quantized model in INT4?
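For context on the size discrepancy: many quantization codebases export "fake-quantized" weights, i.e. the quantized values dequantized back to bf16, which keeps accuracy measurements easy but saves nothing on disk. The memory saving only appears once the 4-bit integers are actually packed, typically two per byte. A minimal sketch of such nibble packing with numpy (the function names here are hypothetical illustrations, not this repo's API):

```python
import numpy as np

def pack_int4(q):
    """Pack an even-length array of 4-bit values (0..15) into bytes,
    two values per byte, low nibble first."""
    q = np.asarray(q, dtype=np.uint8)
    assert q.size % 2 == 0, "pad to an even length before packing"
    return (q[0::2] | (q[1::2] << 4)).astype(np.uint8)

def unpack_int4(packed):
    """Inverse of pack_int4: recover the original 4-bit values."""
    packed = np.asarray(packed, dtype=np.uint8)
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = packed & 0x0F   # low nibbles
    out[1::2] = packed >> 4     # high nibbles
    return out
```

With this kind of packing, N weights occupy N/2 bytes instead of the 2·N bytes of a bf16 checkpoint (plus a small overhead for per-group scales and zero points), which is what an on-disk ~4x reduction requires.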
