This is the official implementation of the ICLR'25 paper "QERA: an Analytical Framework for Quantization Error Reconstruction".
```bash
git clone git@github.com:ChengZhang-98/QERA.git
cd QERA
git submodule update --init
conda env create -f environment.yml
conda activate qera
pip install -r requirements.txt
pip install -e .
```

In the source code and scripts, we use the following abbreviations for the low-rank term types:
- If `--disable-qera` is set, no low-rank terms are used, i.e., weight-only quantization.
- Otherwise, the low-rank term type is one of:
  - `identity`: truncated SVD on the quantized weight matrix, i.e., ZeroQuant-V2.
  - `lqer`: the heuristic method proposed in the LQER paper.
  - `diag`: QERA-approx in our paper.
  - `exact`: QERA-exact in our paper.
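All four variants share the same idea: augment the quantized weight with a low-rank term that reconstructs part of the quantization error. Here is a minimal NumPy sketch of that idea using a plain rank-k truncated SVD; the uniform quantizer, matrix shapes, and rank are toy placeholders for illustration, not the repository's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)).astype(np.float32)

def quantize(w, n_levels=16):
    """Toy symmetric uniform quantizer (illustrative only)."""
    scale = np.abs(w).max() / (n_levels // 2)
    return np.clip(np.round(w / scale), -n_levels // 2, n_levels // 2 - 1) * scale

W_q = quantize(W)
E = W - W_q  # quantization error to be reconstructed

# Rank-k truncated SVD of the error gives the low-rank term A @ B.
k = 8
U, S, Vt = np.linalg.svd(E, full_matrices=False)
A = U[:, :k] * S[:k]  # (64, k)
B = Vt[:k, :]         # (k, 64)

err_plain = np.linalg.norm(W - W_q)             # weight-only quantization
err_lowrank = np.linalg.norm(W - (W_q + A @ B))  # with low-rank reconstruction
```

Since the rank-k SVD is the best rank-k approximation of the error in the Frobenius norm, `err_lowrank` is strictly smaller than `err_plain`; the variants above differ in how (and with which calibration statistics) the low-rank factors are computed.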
- `ptq_bf16_baseline.py` evaluates the BF16 baseline.
- `ptq_q_baseline.py` evaluates the PTQ baseline.
- `ptq_pipeline.py` runs data calibration (if needed), computes the low-rank terms, and evaluates the quantized model.
- `ptq_pipeline_chunked.py` runs data calibration (if needed) and computes the low-rank terms for a chunk of layers, which is useful for large models. Once all chunks (layers) are computed, this script also triggers the evaluation of the quantized model.
- `chunk_checker.py` checks the completion of the chunks (optional).
- `adapt_and_save.py` runs data calibration, quantizes the model, computes the initial values of the low-rank terms, and saves the quantized model + low-rank terms.
- `glue_train.py` fine-tunes the qLoRA-adapted model with low-rank terms on GLUE tasks.
- `clm_train.py` fine-tunes the qLoRA-adapted model with low-rank terms on WikiText2.
- `gsm8k_train.py` fine-tunes the qLoRA-adapted model with low-rank terms on GSM8K.
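To see why the initial value of the low-rank terms matters before fine-tuning, here is a hedged NumPy sketch (not the repository's code; the quantizer, shapes, and rank are made up for illustration): adapters initialized from an SVD of the quantization error reproduce the original layer's outputs more closely at step zero than zero-initialized LoRA-style adapters.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((32, 32)).astype(np.float32)
X = rng.standard_normal((128, 32)).astype(np.float32)  # toy calibration inputs

# Toy symmetric uniform quantizer (illustrative only).
scale = np.abs(W).max() / 8
W_q = np.clip(np.round(W / scale), -8, 7) * scale

k = 4
U, S, Vt = np.linalg.svd(W - W_q, full_matrices=False)
A_svd, B_svd = U[:, :k] * S[:k], Vt[:k, :]              # error-SVD initialization
A_zero, B_zero = np.zeros((32, k)), np.zeros((k, 32))   # plain zero initialization

y_ref = X @ W.T  # original (unquantized) layer output
err_svd = np.linalg.norm(y_ref - X @ (W_q + A_svd @ B_svd).T)
err_zero = np.linalg.norm(y_ref - X @ (W_q + A_zero @ B_zero).T)
```

With zero-initialized adapters the layer starts from the raw quantized weights, whereas the error-SVD initialization removes the dominant error directions, so `err_svd < err_zero` and fine-tuning starts closer to the original model.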
See `experiments/ptq` and `experiments/qpeft` for the PTQ and qLoRA fine-tuning experiments, respectively.
```bibtex
@article{zhang2024qera,
  title={QERA: an Analytical Framework for Quantization Error Reconstruction},
  author={Zhang, Cheng and Wong, Jeffrey TH and Xiao, Can and Constantinides, George A and Zhao, Yiren},
  journal={arXiv preprint arXiv:2410.06040},
  year={2024}
}
```