Hi, thanks for your great work!
I've tried your unstructured pruning and it turns out to be really efficient.
However, the structured pruning should reduce the parameter count. When I load the pruned model with the code below, the memory usage after loading remains the same as the original unpruned model (and without ignore_mismatched_sizes=True the code below raises an error instead).
Is there a way to load the pruned model so that it actually uses less memory?
from diffusers import SD3Transformer2DModel

# Load the structured-pruned checkpoint; without ignore_mismatched_sizes=True this raises a size-mismatch error
transformer = SD3Transformer2DModel.from_pretrained(
    args.model_path,
    ignore_mismatched_sizes=True,
    low_cpu_mem_usage=False,
)
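For reference, here is a minimal sketch of how I'm comparing the two checkpoints; the paths and the helper function are just placeholders for illustration, and I'm simply summing parameter sizes after loading:

import torch
from diffusers import SD3Transformer2DModel

def param_footprint(model: torch.nn.Module) -> float:
    """Return the total parameter memory of a model in GiB."""
    total_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    return total_bytes / (1024 ** 3)

# Placeholder paths for the original and the structured-pruned checkpoints
original = SD3Transformer2DModel.from_pretrained("path/to/original")
pruned = SD3Transformer2DModel.from_pretrained(
    "path/to/pruned",
    ignore_mismatched_sizes=True,
    low_cpu_mem_usage=False,
)

print(f"original: {param_footprint(original):.2f} GiB")
print(f"pruned:   {param_footprint(pruned):.2f} GiB")  # reports the same size as the original for me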