A novel memory-efficient training method that reduces the total VRAM cost by more than 18x.
🤗 Hugging Face | 📑 Paper | 📖 Blog
This is the official code implementation of the paper 'Fine-tuning Quantized Neural Networks with Zeroth-order Optimization'.
- [2025/06/17] 🔥 We have released our code.
- [2025/05/20] 🔥 We have released our paper on arXiv.
Fine-tuning large language models (LLMs) unlocks their potential for various downstream tasks. However, as the parameter count of LLMs continues to grow, GPU memory becomes a major bottleneck. In this work, we propose a novel memory-efficient training method, Quantized Zeroth-order Optimization (QZO), which minimizes the VRAM cost of model weights, gradients, and optimizer states within a unified framework. Notably, QZO achieves a total VRAM reduction of more than 18x compared with regular fine-tuning in our memory profiling (see the figure below).
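For context, QZO belongs to the family of zeroth-order methods that estimate gradients from forward passes only. The sketch below illustrates the generic MeZO-style estimator that such methods build on, using placeholder function names of our own; it is not the QZO implementation in this repository, which additionally handles quantized weights (see the paper for the exact formulation):

```python
import torch

def zo_sgd_step(model, compute_loss, batch, eps=1e-3, lr=1e-6, seed=0):
    """One SPSA/MeZO-style zeroth-order update (illustrative sketch only):
    two forward passes along a shared random direction replace backprop."""
    params = [p for p in model.parameters() if p.requires_grad]

    def perturb(scale):
        # Re-seeding regenerates the same random direction z in-place,
        # so the direction never needs to be stored.
        gen = torch.Generator().manual_seed(seed)
        for p in params:
            z = torch.randn(p.shape, generator=gen).to(device=p.device, dtype=p.dtype)
            p.data.add_(scale * eps * z)

    with torch.no_grad():
        perturb(+1)                              # theta + eps * z
        loss_plus = compute_loss(model, batch)
        perturb(-2)                              # theta - eps * z
        loss_minus = compute_loss(model, batch)
        perturb(+1)                              # restore theta

        # Scalar estimate of the directional derivative along z.
        grad_proj = (loss_plus - loss_minus) / (2 * eps)

        gen = torch.Generator().manual_seed(seed)
        for p in params:
            z = torch.randn(p.shape, generator=gen).to(device=p.device, dtype=p.dtype)
            p.data.add_(-lr * grad_proj * z)     # plain SGD step along z

    return float(loss_plus)
```

Because only the loss values and a random seed are kept, no gradients or optimizer states need to live in VRAM, which is where the memory savings come from.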
- Clone the repository and enter its root folder:

  ```bash
  git clone https://github.com/maifoundations/QZO.git
  cd QZO
  ```
- Create a conda environment and install the required packages:

  ```bash
  conda create -n qzo python==3.12.0
  conda activate qzo
  pip install -r requirements.txt
  ```
Note that `gptqmodel==1.7.2` with the Triton inference kernel is required to reproduce the results with GPTQ. Otherwise, unexpected behaviour will be observed, such as divergence when using the Marlin kernel (see this issue for more details).
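If you want to make sure the Triton kernel is used, the snippet below is a minimal loading sketch, assuming the `GPTQModel.load` / `BACKEND` interface of `gptqmodel` 1.7.x (the checkpoint path is a placeholder); consult the pinned version's documentation for the exact API:

```python
# Illustrative sketch only: assumes gptqmodel 1.7.x exposes GPTQModel.load and
# a BACKEND enum with a TRITON member; verify against the pinned version.
from gptqmodel import GPTQModel, BACKEND

model_id = "path/to/a-gptq-quantized-checkpoint"  # placeholder path
# Request the Triton kernel explicitly instead of the Marlin kernel.
model = GPTQModel.load(model_id, backend=BACKEND.TRITON)
```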
- To start training, you may refer to the example scripts (`scripts/examples.sh`) located in both the `large_language_models` and `stable_diffusion` folders.
- You may also need to comment out part of the sanity-check code in `transformers/trainer.py` to support direct fine-tuning of a quantized language model. For example, if you are using `transformers==4.48.0`, comment out the following code starting from line 553 of the trainer script and append a `pass` at the end:

  ```python
  if _is_quantized_and_base_model and not _is_peft_model(model) and not _is_model_quantized_and_qat_trainable:
      # raise ValueError(
      #     "You cannot perform fine-tuning on purely quantized models. Please attach trainable adapters on top of"
      #     " the quantized model to correctly perform fine-tuning. Please see: https://huggingface.co/docs/transformers/peft"
      #     " for more details"
      # )
      pass
  ```
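After patching, a quick sanity check (a sketch with placeholder paths, not a script from this repository) is to construct a `Trainer` directly on a quantized checkpoint and confirm that the "purely quantized models" error above is no longer raised:

```python
# Sketch only: the checkpoint path and output dir are placeholders, and loading
# a GPTQ checkpoint via transformers requires the quantization backend from
# requirements.txt to be installed.
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model_id = "path/to/a-gptq-quantized-checkpoint"  # placeholder
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# With the sanity check commented out, Trainer construction succeeds instead of
# raising "You cannot perform fine-tuning on purely quantized models ...".
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="outputs", per_device_train_batch_size=1),
    # train_dataset=...  # supply your dataset before calling trainer.train()
)
print("Trainer constructed on a quantized model.")
```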
@article{shang2025fine,
title={Fine-tuning Quantized Neural Networks with Zeroth-order Optimization},
author={Shang, Sifeng and Zhou, Jiayi and Lin, Chenyu and Li, Minxian and Zhou, Kaiyang},
journal={arXiv preprint arXiv:2505.13430},
year={2025}
}
