
QZO: Fine-tuning Quantized Neural Networks with Zeroth-order Optimization

QZO is a novel memory-efficient training method that reduces the total VRAM cost of fine-tuning by more than 18x compared with regular fine-tuning.

🤗 Hugging Face   |    📑 Paper    |    📖 Blog   

This is the official code implementation of the paper 'Fine-tuning Quantized Neural Networks with Zeroth-order Optimization'.

📰News

  • [2025/06/17]:🔥We have released our code.
  • [2025/05/20]:🔥We have released our paper on arXiv.

✈️Introduction

Fine-tuning large language models (LLMs) unlocks their potential for various downstream tasks. However, as the parameter size of LLMs continues to grow, GPU memory becomes a major bottleneck. In this work, we propose Quantized Zeroth-order Optimization (QZO), a novel memory-efficient training method that minimizes the VRAM cost of model weights, gradients, and optimizer states within a unified framework. Notably, QZO reduces total VRAM usage by more than 18x compared with regular fine-tuning in our memory profiling (see the figure below).
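
For intuition, QZO builds on zeroth-order optimization in the spirit of MeZO: the gradient is estimated from two forward passes on randomly perturbed weights instead of backpropagation, so neither gradients nor optimizer states have to be kept in VRAM. The sketch below illustrates that generic estimator on ordinary floating-point parameters; it is not the QZO code in this repository, the quantization-specific handling described in the paper is omitted, and model, batch, and loss_fn are placeholders.

import torch

def zo_step(model, batch, loss_fn, eps=1e-3, lr=1e-6):
    """One MeZO-style zeroth-order SGD step using two forward passes."""
    seed = torch.randint(0, 2**31 - 1, (1,)).item()
    params = [p for p in model.parameters() if p.requires_grad]

    def perturb(scale):
        # Re-seeding lets the same Gaussian noise be regenerated later,
        # so the perturbation itself never has to be stored.
        torch.manual_seed(seed)
        for p in params:
            p.data.add_(scale * eps * torch.randn_like(p))

    with torch.no_grad():
        perturb(+1.0)                      # theta + eps * z
        loss_plus = loss_fn(model, batch)
        perturb(-2.0)                      # theta - eps * z
        loss_minus = loss_fn(model, batch)
        perturb(+1.0)                      # restore theta

        # Projected gradient estimate: (L(theta + eps*z) - L(theta - eps*z)) / (2*eps)
        grad_proj = (loss_plus - loss_minus).item() / (2 * eps)

        # SGD step along the same noise direction, regenerated from the seed
        torch.manual_seed(seed)
        for p in params:
            p.data.add_(-lr * grad_proj * torch.randn_like(p))

    return 0.5 * (loss_plus + loss_minus).item()

The memory saving comes from regenerating the perturbation from a single random seed rather than storing it, which keeps the optimizer state negligible; QZO extends this idea to models whose weights are already quantized.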

🗒️Installation

  1. Clone the repository and enter its root folder

    git clone https://github.com/maifoundations/QZO.git
    cd QZO
  2. Create a conda environment and install the required packages

    conda create -n qzo python=3.12.0
    conda activate qzo
    pip install -r requirements.txt

    Note that gptqmodel==1.7.2 with the Triton inference kernel is required to reproduce the GPTQ results. Otherwise, unexpected behaviour may occur, such as divergence when using the Marlin kernel (see this issue for more details).

  3. To start training, you may refer to the example scripts (scripts/examples.sh) located in both the large_language_models and stable_diffusion folders.

    You may also need to comment out part of the sanity-check code in transformers/trainer.py to support direct fine-tuning of a quantized language model. For example, if you are using transformers==4.48.0, comment out the following code starting at line 553 of trainer.py and append a pass statement at the end:

    if _is_quantized_and_base_model and not _is_peft_model(model) and not _is_model_quantized_and_qat_trainable:
        # raise ValueError(
        #     "You cannot perform fine-tuning on purely quantized models. Please attach trainable adapters on top of"
        #     " the quantized model to correctly perform fine-tuning. Please see: https://huggingface.co/docs/transformers/peft"
        #     " for more details"
        # )
        pass

🎓Citation

@article{shang2025fine,
  title={Fine-tuning Quantized Neural Networks with Zeroth-order Optimization},
  author={Shang, Sifeng and Zhou, Jiayi and Lin, Chenyu and Li, Minxian and Zhou, Kaiyang},
  journal={arXiv preprint arXiv:2505.13430},
  year={2025}
}

Acknowledgment

Our code is built upon the following projects: MeZO, AQLM.
