Repository for the paper
Qua2SeDiMo: Quantifiable Quantization Sensitivity of Diffusion Models
Keith G. Mills, Mohammad Salameh, Ruichen Chen, Negar Hassanpour, Wei Lu, Di Niu
AAAI-25
The code for this paper is split into two top-level repos:
- This repo, a fork of [q-diffusion](https://github.com/Xiuyu-Li/q-diffusion) (ICCV'23), which handles actually using the diffusion models (DMs), e.g., sampling and labeling random quantization configurations and performing inference on quantized models.
- The AutoBuild folder, which can be found here. This is a branch of the original AutoBuild repository and is used for training predictors that generate subgraph scores and quantization configurations; quantization sensitivity insight generation and analysis is handled there.
Additional directories with content:
- `dag_plots` contains PNG images of what the PixArt, Hunyuan, DiT, SDv1.5 and SDXL denoiser architectures look like as Directed Acyclic Graphs. The PixArt and Hunyuan images cover a subset of the transformer layers due to a rendering issue with graphviz.
- `sample_images` contains the generated images we use in our paper's figures.
The rest of this README concerns quantized diffusion model inference and sampling quantization configurations.
Clone this repository, then create and activate a suitable conda environment named `qua2sedimo` using the following commands:
conda env create -f environment.yml
conda activate qua2sedimo

Inference of quantized DM denoisers requires two files:
- The multiquantizer `*_mq.pt` checkpoint file for a given denoiser network. These files describe the quantization methods and bit precisions for each weight layer in a given denoiser network, and as such are usually several gigabytes in size. For simplicity, download them from the Google Drive and place them in a new subdirectory labeled `/multiquantizers/`. Alternatively, generate your own checkpoints by following the Generating Multiquantizer Checkpoints section below.
- The `*.pkl` file for the quantization configuration of a specific denoiser network. These configurations are found using Qua2SeDiMo predictors in the AutoBuild repo. We provide several key quantization configurations from our paper: download them from the Google Drive and place them in a new subdirectory labeled `/quant_configs/`. For further information on how they are created, see the AutoBuild repository.
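As a quick sanity check, both files should open with standard PyTorch/pickle serialization. This is a hedged sketch only: the file names below are examples, and unpickling the multiquantizer may require this repo's classes to be importable (the inference scripts handle all loading for you):

```python
# Sketch only: peek inside the two files needed for quantized inference.
import pickle
import torch

mq = torch.load("multiquantizers/alpha_mq.pt", map_location="cpu")  # example name
with open("quant_configs/alpha_40.pkl", "rb") as f:
    q_config = pickle.load(f)

print(type(mq), type(q_config))
```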
# PixArt-Alpha
python scripts/pixart_alpha_infer.py --outdir alpha_sample_w4a8 --q_config quant_configs/alpha_40.pkl --quant_act --act_bit 8 --prompt 'a lion riding a bike in Paris' --n_imgs 4
# PixArt-Sigma
python scripts/pixart_sigma_infer.py --outdir sigma_sample_w39a8 --q_config quant_configs/sigma_39.pkl --quant_act --act_bit 8 --prompt 'a lion riding a bike in Paris' --n_imgs 4
# Hunyuan-DiT
python scripts/hunyuan_infer.py --outdir hunyuan_sample_w4a8 --q_config quant_configs/hunyuan_40.pkl --quant_act --act_bit 8 --prompt 'a lion riding a bike in Paris' --n_imgs 4
- `--outdir` is the folder where results will be stored.
- `--q_config` is the quantization configuration `.pkl` file.
- `--quant_act` enables token-wise online activation quantization. If not set, the code defaults to A16 (16-bit activations).
- `--act_bit` sets the activation quantization bit precision; can be 8 or 6. Ignored if `--quant_act` is not set.
- `--prompt` is the textual prompt you want to make an image of.
- `--n_imgs` is the number of images to make for the given prompt.
- `--res` is the image resolution; 1024 for PixArt-Sigma and Hunyuan, 512 for all other models.
- `--seed` is the random seed.
# SDXL
python scripts/sdxl_infer.py --outdir sdxl_sample_w37a16 --q_config quant_configs/sdxl_37.pkl --prompt 'a lion riding a bike in Paris' --n_imgs 4
# SDv1.5
python scripts/sdv15_infer.py --outdir sdv15_sample_w4a16 --q_config quant_configs/sdv15_40.pkl --prompt 'a lion riding a bike in Paris' --n_imgs 4
For SDv1.5, you will need to download the checkpoint file `v1-5-pruned-emaonly.ckpt` from the Google Drive and place it in `/models/ldm/stable-diffusion-v1/`.
python scripts/dit_infer.py --outdir dit_sample_w4a16 --q_config quant_configs/dit_40.pkl --cls_id 833 --n_imgs 4
- `--cls_id` refers to the ImageNet class you wish to generate images of. See this link for a list.
I get the error `AttributeError: 'HunyuanDiTPipeline' object has no attribute '_execution_device'`.

To solve this, go into the Python definition of that class and replace all instances of `self._execution_device` with `torch.device("cuda:0")`. Is it hacky? Yes. Does it work? Yes.
(Note: the affected pipeline doesn't have to be Hunyuan; the same fix applies to other pipeline classes.)
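A less invasive alternative is to monkey-patch the attribute at runtime instead of editing the installed library source. This is a hedged sketch assuming the pipeline class comes from diffusers; swap in whichever pipeline class raises the error:

```python
# Override _execution_device on the class so every instance reports cuda:0,
# without touching the diffusers source tree.
import torch
from diffusers import HunyuanDiTPipeline

HunyuanDiTPipeline._execution_device = property(lambda self: torch.device("cuda:0"))
```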
We cast denoiser neural networks as mixed-precision search spaces where we vary the bit-precision {3, 4} and quantization method {K-Means-C, K-Means-A and UAQ} of each weight layer. We sample and evaluate random quantization configurations for each denoiser neural network in order to train the Qua2SeDiMo predictors.
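Conceptually, sampling a configuration just draws one (method, bit-precision) pair per weight layer. Here is a minimal sketch; the layer names and helper below are hypothetical, and the real sampling logic lives in the `*_eval_random.py` scripts:

```python
import random

BITS = [3, 4]
METHODS = ["K-Means-C", "K-Means-A", "UAQ"]

def sample_quant_config(weight_layer_names):
    """Randomly assign a quantization method and bit precision to each layer."""
    return {name: (random.choice(METHODS), random.choice(BITS))
            for name in weight_layer_names}

# e.g., sample_quant_config(["blocks.0.attn.qkv", "blocks.0.mlp.fc1"])
```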
To facilitate this, we implement multiquantizer checkpoints that store the quantization information (e.g., UAQ delta, K-Means centroids) for each bit precision/method for each weight layer in the neural network.
These checkpoint files can be found on the Google Drive and are several gigabytes in size. We provide instructions on how to re-create them should you so desire.
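As a rough mental model only (everything named here is hypothetical; inspect a real checkpoint to confirm its layout), a multiquantizer checkpoint maps each weight layer to the stored parameters for every (method, bit-precision) pair:

```python
# Hypothetical *_mq.pt layout -- illustrative, not the actual schema.
mq_checkpoint = {
    "blocks.0.attn.qkv": {
        ("UAQ", 4): {"delta": ...},            # uniform quantizer step size
        ("K-Means-C", 3): {"centroids": ...},  # K-Means codebook
    },
    # ... one entry per quantizable weight layer
}
```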
Note: Running these scripts requires GPUs with a large amount of VRAM (24-32GB) running for several days at a time. We use 32GB Tesla V100 GPUs.
python scripts/pixart_alpha_multiquantizer.py
python scripts/pixart_sigma_multiquantizer.py
python scripts/hunyuan_multiquantizer.py
python scripts/sdxl_multiquantizer.py --prefix {'conv_in', 'time_embedding', 'add_embedding', 'down_blocks', 'mid_block', 'up_blocks', 'conv_out'}
python scripts/sdv15_multiquantizer.py
python scripts/dit_multiquantizer.py
This will save `*_mq.pt` files in the `/multiquantizers/` subdirectory.
- Note: For SDXL, the U-Net is too large to process all at once. As such, we use the `--prefix` flag to split the U-Net into different parts (see the file/command for more info) and make a separate multiquantizer checkpoint per section, e.g., `sdxl_conv_in_mq.pt` for the `conv_in` operation. These can then be stitched together manually, e.g., in ipython, into one single `sdxl_mq.pt` file; see the sketch after this list.
- Note: For SDv1.5, you will need to download the checkpoint file `v1-5-pruned-emaonly.ckpt` from the Google Drive and place it in `/models/ldm/stable-diffusion-v1/`.
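For the SDXL stitching step, a minimal sketch, assuming each per-prefix checkpoint is a plain dict keyed by weight-layer name (verify against the actual checkpoint structure first):

```python
import torch

prefixes = ["conv_in", "time_embedding", "add_embedding",
            "down_blocks", "mid_block", "up_blocks", "conv_out"]

merged = {}
for p in prefixes:
    # Each sdxl_{prefix}_mq.pt covers one section of the U-Net.
    merged.update(torch.load(f"multiquantizers/sdxl_{p}_mq.pt", map_location="cpu"))

torch.save(merged, "multiquantizers/sdxl_mq.pt")
```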
Now that we have the multiquantizer checkpoints, we can start sampling and evaluating quantization configurations to build a dataset. For text-to-image (T2I) models, this involves generating 1k images using COCO 2017 validation set prompts, then computing the FID against the COCO 2017 validation set. For DiT-XL/2, this involves generating one image per ImageNet class.
Note: Before running these scripts, you need to edit `constants.py` to point towards your directories for the validation sets of COCO 2017 and ImageNet.
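For illustration only, the edit amounts to something like the following (the variable names here are hypothetical; use the actual names in `constants.py`):

```python
# constants.py -- hypothetical names; check the real file.
COCO2017_VAL_DIR = "/path/to/coco2017/val2017"
IMAGENET_VAL_DIR = "/path/to/imagenet/val"
```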
CUDA_VISIBLE_DEVICES=0 python scripts/pixart_alpha_eval_random.py --outdir alpha_random_seed42
CUDA_VISIBLE_DEVICES=0 python scripts/pixart_sigma_eval_random.py --outdir sigma_random_seed42
CUDA_VISIBLE_DEVICES=0 python scripts/hunyuan_eval_random.py --outdir hunyuan_random_seed42
CUDA_VISIBLE_DEVICES=0 python scripts/sdxl_eval_random.py --outdir sdxl_random_seed42
CUDA_VISIBLE_DEVICES=0 python scripts/sdv15_eval_random.py --outdir sdv15_random_seed42
CUDA_VISIBLE_DEVICES=0 python scripts/dit_eval_random.py --outdir dit_random_seed42
- Note: the `CUDA_VISIBLE_DEVICES=0` prefix is recommended, as these scripts use the clean-fid package to evaluate FID, which will (unnecessarily) try to make use of multiple GPUs if it can.
- Note: For SDv1.5, you will need to download the checkpoint file `v1-5-pruned-emaonly.ckpt` from the Google Drive and place it in `/models/ldm/stable-diffusion-v1/`.
Each script will sample and evaluate 500 (adjustable via the `--num_archs` flag) random quantization configurations for its respective denoiser neural network. The results will be placed in the `quant_cache.pkl` file in the directory specified by the `--outdir` flag. These `.pkl` files can then be used to train Qua2SeDiMo predictors in the AutoBuild repo.
Specifically, `quant_cache.pkl` contains a list of dictionaries. Each dictionary corresponds to one quantization configuration and has the following keys:
- `quantized_size`: total size of the quantized weights across the entire configuration.
- `avg_bits`: average bit precision, calculated using `quantized_size` and the variable `FP_SIZE` from the respective `*_eval_random.py` script.
- `quantized_error`: sum of the quantization error over all weights in the denoiser neural network.
- `FID-GT-1k`: FID score.
- `bops`: Tera Bit-Ops (BOPs) (see PTQD Sec. 5.1). See the files in `/bops/` to understand how we estimate this metric. It is not used in our experiments.
- `config`: the quantization configuration, itself a dictionary. Each key is the name of a quantizable weight layer; values are lists indicating 1) the selected quantization method and bit precision, 2) the FP weight size, 3) the quantized weight size including overhead, and 4) the quantization error for the given quantization method and bit precision.
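A minimal sketch of loading a `quant_cache.pkl` and inspecting it using the schema above (the directory name is just an example):

```python
import pickle

with open("sdv15_random_seed42/quant_cache.pkl", "rb") as f:
    results = pickle.load(f)  # list of dicts, one per sampled configuration

# Rank configurations by image quality.
best = min(results, key=lambda r: r["FID-GT-1k"])
print(best["avg_bits"], best["quantized_size"], best["FID-GT-1k"])

# Per-layer records: method/bits, FP size, quantized size, quantization error.
for layer, record in list(best["config"].items())[:5]:
    print(layer, record)
```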
If you find our framework useful, we kindly ask that you cite our paper:
@inproceedings{mills2025qua2sedimo,
  title     = {Qua$^{2}$SeDiMo: Quantifiable Quantization Sensitivity of Diffusion Models},
  author    = {Mills, Keith G. and Salameh, Mohammad and Chen, Ruichen and Hassanpour, Negar and Lu, Wei and Niu, Di},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {6153-6163}
}