
DSQ-LoRA: Quantization-Aware LoRA Fine-tuning Framework with Dynamic Sequence Adjustment

🎯 Project Overview

DSQ-LoRA is an efficient quantization and fine-tuning framework for large language models. It integrates the following techniques:

  • LoRA Fine-tuning: Parameter-efficient adaptation of large models
  • Quantization Technology: Model compression and inference acceleration (2-8 bit)
  • Dynamic Sequence Adjustment (DSQ): Gradient-based weight adjustment mechanism
  • Multi-GPU Parallel Training: Support for distributed training and inference

The framework is optimized for Qwen and Llama series models and supports benchmarks including PPL evaluation, HumanEval, and Huatuo medical QA.
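
The central idea is to train low-rank LoRA factors on top of a quantized, frozen base weight. Below is a minimal sketch of that combination in plain PyTorch; the function names, shapes, and hyperparameters are illustrative and are not the framework's actual classes.

# Illustrative only: a LoRA update applied on top of a fake-quantized base weight.
import torch

def fake_quantize(w: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Symmetric uniform fake-quantization (quantize, then dequantize)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

def qlora_forward(x, w, lora_a, lora_b, alpha=16.0, r=64, bits=8):
    """y = x @ Wq^T + (alpha / r) * x @ A^T @ B^T, with W frozen and A, B trainable."""
    wq = fake_quantize(w, bits)
    return x @ wq.T + (alpha / r) * (x @ lora_a.T) @ lora_b.T

x = torch.randn(2, 1024)
w = torch.randn(4096, 1024)              # frozen base weight (out_features, in_features)
lora_a = torch.randn(64, 1024) * 0.01    # trainable rank-64 factors
lora_b = torch.zeros(4096, 64)           # B starts at zero, so the adapter is a no-op at init
print(qlora_forward(x, w, lora_a, lora_b).shape)   # torch.Size([2, 4096])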


🔧 System Architecture

Project Structure

dsqlora/
├── dsqlora.py              # Main training script
├── modelutils.py           # Model loading and utility functions
├── quant.py                # Quantization-related functions and classes
├── LoraPTQquantize.py      # LoRA quantization conversion module
├── gptq.py                 # GPTQ quantization algorithm
├── gradutils.py            # Gradient estimation tools
├── datautils.py            # Data loading utilities
├── alphaanlyse.py          # Alpha analysis tool
├── chechpoint.py           # Checkpoint management
├── peft/                   # PEFT library (Parameter-Efficient Fine-tuning)
│   ├── tuners/             # Various tuner implementations
│   │   ├── lora/           # LoRA implementation
│   │   ├── adalora/        # AdaLoRA implementation
│   │   ├── lokr/           # LoKr implementation
│   │   └── ...
│   └── utils/              # Utility functions
├── benchmark/              # Evaluation benchmarks
│   ├── runbenchmark.py     # Main benchmark script
│   ├── human_eval/         # HumanEval benchmark
│   ├── grade_school_math/  # Mathematical problem benchmark
│   └── huatuo.py           # Medical QA benchmark
└── datasetsutils/          # Dataset utilities
    ├── code_alpaca/        # Code Alpaca dataset
    ├── disclawsft/         # DISC-Law-SFT legal dataset
    └── orca_math_word/     # Orca math dataset

Core Module Description

dsqlora.py - Main Training Entry

  • Model loading for Qwen and Llama architectures
  • Complete quantization-aware fine-tuning pipeline
  • Command-line parameter configuration
  • SwanLab monitoring integration (see the sketch below)
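
For orientation, SwanLab tracking typically hooks into a training loop as in the sketch below; the project name, config keys, and metric names are assumptions, and the actual hook points in dsqlora.py may differ.

# Hedged sketch of SwanLab tracking; project/metric names are illustrative.
import swanlab

swanlab.init(
    project="dsqlora",                                   # assumed project name
    config={"wbits": 8, "lorar": 64, "learningrate": 5e-4},
)

for step in range(100):
    loss = 1.0 / (step + 1)                              # stand-in for the training loss
    swanlab.log({"train/loss": loss})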

quant.py - Quantization Core

  • Quantizer: Base quantizer supporting symmetric and asymmetric quantization (see the sketch after this list)
  • Quant3Linear / Quant8Linear: Quantized Linear layers at different bit widths
  • Quantization functions and parameter search algorithms
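
The difference between the two modes the Quantizer supports can be summarized in a few lines. This is an illustrative sketch, not the actual class:

# Illustrative symmetric vs. asymmetric uniform quantization (not the real Quantizer).
import torch

def quant_dequant(w: torch.Tensor, bits: int, symmetric: bool = True) -> torch.Tensor:
    if symmetric:
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax     # zero point is implicitly 0
        q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
        return q * scale
    qmax = 2 ** bits - 1
    w_min, w_max = w.min(), w.max()
    scale = (w_max - w_min).clamp(min=1e-8) / qmax       # range maps onto [0, 2^bits - 1]
    zero = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(w / scale) + zero, 0, qmax)
    return (q - zero) * scale

w = torch.randn(4096, 4096)
for bits in (2, 4, 8):
    err = (w - quant_dequant(w, bits, symmetric=False)).abs().mean()
    print(f"{bits}-bit asymmetric mean abs error: {err:.4f}")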

LoraPTQquantize.py - Quantization Conversion

  • Conversion of trained LoRA weights into quantized weights (see the sketch below)
  • Support for multiple quantization configurations
  • Quantized model saving and loading
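
Conceptually, the conversion folds the trained LoRA factors back into the base weight and then quantizes the merged matrix. A hedged sketch of that flow (function and variable names are illustrative):

# Illustrative merge-then-quantize of a LoRA-adapted layer (not the real module).
import torch

def merge_and_quantize(w, lora_a, lora_b, alpha=16.0, r=64, bits=4):
    """Fold delta_W = (alpha / r) * B @ A into W, then quantize the merged weight."""
    merged = w + (alpha / r) * lora_b @ lora_a           # (out, in) + (out, r) @ (r, in)
    qmax = 2 ** (bits - 1) - 1
    scale = merged.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax   # per-row scales
    q = torch.clamp(torch.round(merged / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale                       # integer weights + scales to save

w = torch.randn(4096, 1024)
lora_a = torch.randn(64, 1024) * 0.01
lora_b = torch.randn(4096, 64) * 0.01
q, scale = merge_and_quantize(w, lora_a, lora_b)
w_hat = q.float() * scale                                # dequantize to check the error
print((w - w_hat).abs().mean())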

peft/ - Parameter-Efficient Fine-tuning

  • Multiple fine-tuning methods, including LoRA, AdaLoRA, and LoKr (typical usage sketched below)
  • Dynamic configuration and model loading
  • Integration with quantization framework
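
The bundled peft package follows the upstream library's API, so a typical LoRA setup looks roughly like the snippet below; the target module names are an assumption and depend on the base model's architecture.

# Typical peft usage; target_modules are assumed names for Qwen/Llama-style attention layers.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("/path/to/model")
config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()    # only the LoRA factors are trainable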

benchmark/ - Evaluation Framework

  • Unified interface for multiple benchmarks (a minimal PPL example follows this list)
  • Support for distributed inference evaluation
  • Result recording and comparison
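
As a concrete example of what the PPL benchmark measures, the snippet below computes perplexity over fixed-length chunks of a tokenized text. It is a minimal illustration, not runbenchmark.py itself.

# Minimal chunked perplexity evaluation (illustrative; not the project's benchmark code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/model"                            # placeholder path
tok = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path).eval()

text = "The quick brown fox jumps over the lazy dog. " * 300
ids = tok(text, return_tensors="pt").input_ids
seqlen, nlls = 512, []
with torch.no_grad():
    for i in range(0, ids.size(1) - seqlen + 1, seqlen):
        chunk = ids[:, i : i + seqlen]
        out = model(chunk, labels=chunk)                 # HF shifts the labels internally
        nlls.append(out.loss * chunk.size(1))
ppl = torch.exp(torch.stack(nlls).sum() / (len(nlls) * seqlen))
print(f"PPL: {ppl.item():.2f}")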

📦 Dependency Requirements

Core Dependencies

torch>=1.13.0
transformers==4.52.4
peft==0.10.0

GPU Acceleration Libraries

flash-attn==2.5.8
flash-mla==1.0.0.dev0
bitsandbytes==0.46.0
xformers==0.0.30

Training and Monitoring

pytorch-lightning==2.5.2
deepspeed==0.17.1
accelerate==1.6.0
swanlab

Evaluation Tools

rouge-score==0.1.2
sacrebleu==2.5.1
torchmetrics==1.7.3
deepeval==3.1.6

Other Utilities

numpy==1.24.1
pandas==2.3.0
tqdm
matplotlib

See requirements.txt for the complete list of dependencies.


🚀 Quick Start

1. Environment Setup

# Clone or copy the project
git clone <project-url>
cd dsqlora

# Create a Python virtual environment (recommended)
conda create -n dsqlora python=3.10
conda activate dsqlora

# Install dependencies
pip install -r requirements.txt

# Install CUDA extension (optional, but recommended for best performance)
# Requires CUDA toolkit and cuDNN
python setup.py build_ext --inplace

2. Basic Usage Example

# Basic training command
python dsqlora.py \
    /path/to/model \
    c4 \
    --model-type qwen \
    --act-order \
    --new-eval \
    --batchsize 8 \
    --benchmark ppl \
    --evaltype full \
    --learningrate 0.0005 \
    --lorar 64 \
    --wbits 8 \
    --evalbatchsize 128 \
    --epochs 3 \
    --lorainit loraqat \
    --nsamples 2048

3. Supported Datasets

The following fine-tuning datasets are supported:

  • huatuo - Medical QA dataset
  • code_alpaca - Code fine-tuning dataset
  • orca_math_word - Mathematical problem dataset
  • grade-school-math - Elementary school math dataset

💡 Advanced Usage

1. Multi-GPU Distributed Training

# Use all available GPUs
python dsqlora.py \
    /path/to/model \
    c4 \
    --model-type qwen \
    --wbits 8 \
    --batchsize 4 \
    --usedpa  # Enable BalanceDataParallel

2. Resume Training from Checkpoint

python dsqlora.py \
    /path/to/model \
    c4 \
    --model-type qwen \
    --resume_from ./outputs/checkpoint-500

3. Evaluation-Only Mode

python dsqlora.py \
    /path/to/model \
    c4 \
    --model-type qwen \
    --wbits 8 \
    --onlyeval \
    --evalmodeltype qatmodel \
    --modelweightpath ./outputs/adapter_model.safetensors \
    --evaltype full_and_quantize

4. Custom Training Parameters

python dsqlora.py \
    /path/to/model \
    c4 \
    --model-type qwen \
    --wbits 4 \
    --lorar 128 \
    --learningrate 5e-5 \
    --epochs 5 \
    --adjust_theta 0.05 \
    --lorainit loftq
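
The --lorainit loftq option selects a LoftQ-style initialization, which alternates between quantizing the base weight and fitting the low-rank factors to the remaining quantization error. The sketch below illustrates the idea only; it is not the framework's actual code path.

# LoftQ-style alternating initialization, sketched for illustration only.
import torch

def loftq_style_init(w, r=64, bits=4, iters=3):
    qmax = 2 ** (bits - 1) - 1
    a = torch.zeros(r, w.shape[1])
    b = torch.zeros(w.shape[0], r)
    for _ in range(iters):
        residual = w - b @ a                             # part not yet captured by the adapter
        scale = residual.abs().max().clamp(min=1e-8) / qmax
        q = torch.clamp(torch.round(residual / scale), -qmax - 1, qmax) * scale
        u, s, vh = torch.linalg.svd(w - q, full_matrices=False)
        b = u[:, :r] * s[:r]                             # rank-r fit of the quantization error
        a = vh[:r, :]
    return q, a, b                                       # quantized base + LoRA initialization

w = torch.randn(1024, 1024)
q, a, b = loftq_style_init(w)
print((w - (q + b @ a)).abs().mean())                    # reconstruction error after init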

📚 Key References

  • LoRA: Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models"
  • GPTQ: Frantar et al., "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers"
  • LoftQ: Li et al., "LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models"
  • Flash Attention: Dao et al., "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

🤝 Contributing

We welcome issues, suggestions, and improvement proposals!


📄 License

Please refer to the license file in the project for detailed information.


📞 Contact

For questions or suggestions, please reach out via:

  • GitHub Issues
  • Project Discussions

Project Status: 🚀 Actively Maintained
