DSQ-Lora is an efficient quantization and fine-tuning framework for large language models, integrating the following advanced techniques:
- LoRA Fine-tuning: Parameter-efficient adaptation method for large models
- Quantization Technology: Model compression and inference acceleration (2–8 bit)
- Dynamic Sequence Adjustment (DSQ): Gradient-based adaptive weight-adjustment mechanism
- Multi-GPU Parallel Training: Support for distributed training and inference
The framework is specifically optimized for Qwen and Llama series models, supporting various benchmarks including PPL evaluation, HumanEval, Huatuo medical QA, and more.
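The core pattern behind these techniques is a frozen (optionally quantized) base weight combined with a small trainable low-rank update. The sketch below illustrates that pattern only; the class and argument names are hypothetical and are not part of this repository's API.

import torch
import torch.nn as nn

class LoRALinearSketch(nn.Module):
    """Frozen base linear layer plus a trainable low-rank (LoRA) update. Illustrative only."""

    def __init__(self, base: nn.Linear, r: int = 64, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in base.parameters():
            p.requires_grad_(False)            # base weights stay frozen
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)     # low-rank update starts at zero
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + (alpha / r) * B(A(x)); only A and B receive gradients
        return self.base(x) + self.lora_B(self.lora_A(x)) * self.scaling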
dsqlora/
├── dsqlora.py # Main training script
├── modelutils.py # Model loading and utility functions
├── quant.py # Quantization-related functions and classes
├── LoraPTQquantize.py # LoRA quantization conversion module
├── gptq.py # GPTQ quantization algorithm
├── gradutils.py # Gradient estimation tools
├── datautils.py # Data loading utilities
├── alphaanlyse.py # Alpha analysis tool
├── chechpoint.py # Checkpoint management
├── peft/ # PEFT library (Parameter-Efficient Fine-Tuning)
│   ├── tuners/ # Various tuner implementations
│   │   ├── lora/ # LoRA implementation
│   │   ├── adalora/ # AdaLoRA implementation
│   │   ├── lokr/ # LoKr implementation
│   │   └── ...
│   └── utils/ # Utility functions
├── benchmark/ # Evaluation benchmarks
│   ├── runbenchmark.py # Main benchmark script
│   ├── human_eval/ # HumanEval benchmark
│   ├── grade_school_math/ # Mathematical problem benchmark
│   └── huatuo.py # Medical QA benchmark
└── datasetsutils/ # Dataset utilities
    ├── code_alpaca/ # Code Alpaca dataset
    ├── disclawsft/ # DISC-Law SFT dataset
    └── orca_math_word/ # Orca math dataset
- Support for model loading (Qwen/Llama)
- Complete quantization fine-tuning pipeline implementation
- Command-line parameter configuration
- SwanLab monitoring integration
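As a rough illustration of the SwanLab integration, the snippet below logs a few placeholder metrics. The project name, config keys, and metric names are assumptions and may not match what dsqlora.py actually records.

import swanlab

# Placeholder project/config values; dsqlora.py passes its own CLI arguments here.
run = swanlab.init(
    project="dsqlora",
    config={"wbits": 8, "lorar": 64, "learningrate": 5e-4},
)

for step in range(3):
    fake_loss = 1.0 / (step + 1)              # stand-in for the real training loss
    swanlab.log({"train/loss": fake_loss, "step": step})

swanlab.finish()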
- Quantizer: Base quantizer supporting symmetric/asymmetric quantization
- Quant3Linear/Quant8Linear: Quantized linear layers for various precisions
- Quantization functions and parameter search algorithms
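For orientation, here is a generic, hedged formulation of symmetric versus asymmetric uniform quantization; the actual Quantizer performs a more elaborate parameter search, so treat this purely as an illustration of the two modes.

import torch

def fake_quantize(w: torch.Tensor, bits: int = 8, symmetric: bool = True) -> torch.Tensor:
    """Uniformly quantize and dequantize a weight tensor (per-tensor). Illustration only."""
    if symmetric:
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max() / qmax                       # zero-point fixed at 0
        q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
        return q * scale
    else:
        qmax = 2 ** bits - 1
        scale = (w.max() - w.min()) / qmax                 # full-range scale
        zero = torch.round(-w.min() / scale)               # shifted zero-point
        q = torch.clamp(torch.round(w / scale) + zero, 0, qmax)
        return (q - zero) * scale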
- LoRA weight to quantized weight conversion
- Support for multiple quantization configurations
- Quantized model saving and loading
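Conceptually, the conversion folds the low-rank update into the frozen base matrix and re-quantizes the merged result. The helper below is a hypothetical sketch of that idea, not the actual interface of LoraPTQquantize.py.

import torch

def merge_and_quantize(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
                       scaling: float, bits: int = 8) -> torch.Tensor:
    """Fold a LoRA update (scaling * B @ A) into W and re-quantize per-tensor. Sketch only."""
    W_merged = W + scaling * (B @ A)          # merge the adapter into the base weight
    qmax = 2 ** (bits - 1) - 1
    scale = W_merged.abs().max() / qmax       # symmetric per-tensor scale
    q = torch.clamp(torch.round(W_merged / scale), -qmax - 1, qmax)
    return q * scale                          # dequantized weights ready to save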
- Multiple fine-tuning methods, including LoRA, AdaLoRA, and LoKr
- Dynamic configuration and model loading
- Integration with quantization framework
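The bundled PEFT library exposes the standard LoraConfig/get_peft_model interface. A minimal LoRA example is shown below; the model path and target module names (a Qwen/Llama-style attention layout) are placeholders.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("/path/to/model")   # placeholder path
lora_cfg = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()   # only a small fraction of weights are trainable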
- Unified interface for multiple benchmarks
- Support for distributed inference evaluation
- Result recording and comparison
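As an illustration of what the PPL benchmark measures, here is a hedged sketch of a standard non-overlapping-window perplexity loop; the batching, datasets, and distributed handling in runbenchmark.py will differ.

import torch

@torch.no_grad()
def perplexity(model, input_ids: torch.Tensor, seqlen: int = 2048) -> float:
    """Average perplexity over non-overlapping windows of a tokenized corpus."""
    nlls = []
    n_windows = input_ids.shape[1] // seqlen
    for i in range(n_windows):
        batch = input_ids[:, i * seqlen:(i + 1) * seqlen].to(model.device)
        loss = model(batch, labels=batch).loss        # mean token cross-entropy
        nlls.append(loss.float() * seqlen)
    return torch.exp(torch.stack(nlls).sum() / (n_windows * seqlen)).item()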
torch>=1.13.0
transformers==4.52.4
peft==0.10.0
flash-attn==2.5.8
flash-mla==1.0.0.dev0
bitsandbytes==0.46.0
xformers==0.0.30
pytorch-lightning==2.5.2
deepspeed==0.17.1
accelerate==1.6.0
swanlab
rouge-score==0.1.2
sacrebleu==2.5.1
torchmetrics==1.7.3
deepeval==3.1.6
numpy==1.24.1
pandas==2.3.0
tqdm
matplotlib
See requirements.txt file for complete dependencies.
# Clone or copy the project
git clone <project-url>
cd dsqlora
# Create a Python virtual environment (recommended)
conda create -n dsqlora python=3.10
conda activate dsqlora
# Install dependencies
pip install -r requirements.txt
# Install CUDA extension (optional, but recommended for best performance)
# Requires CUDA toolkit and cuDNN
python setup.py build_ext --inplace

# Basic training command
python dsqlora.py \
/path/to/model \
c4 \
--model-type qwen \
--act-order \
--new-eval \
--batchsize 8 \
--benchmark ppl \
--evaltype full \
--learningrate 0.0005 \
--lorar 64 \
--wbits 8 \
--evalbatchsize 128 \
--epochs 3 \
--lorainit loraqat \
--nsamples 2048

Supported datasets:
- huatuo - Medical QA dataset
- code_alpaca - Code fine-tuning dataset
- orca_math_word - Mathematical problem dataset
- grade-school-math - Elementary school math dataset
# Use all available GPUs
python dsqlora.py \
/path/to/model \
c4 \
--model-type qwen \
--wbits 8 \
--batchsize 4 \
--usedpa # Enable BalanceDataParallel

# Resume training from a checkpoint
python dsqlora.py \
/path/to/model \
c4 \
--model-type qwen \
--resume_from ./outputs/checkpoint-500

# Evaluation-only run on a saved quantized model
python dsqlora.py \
/path/to/model \
c4 \
--model-type qwen \
--wbits 8 \
--onlyeval \
--evalmodeltype qatmodel \
--modelweightpath ./outputs/adapter_model.safetensors \
--evaltype full_and_quantize

# 4-bit fine-tuning with LoftQ initialization
python dsqlora.py \
/path/to/model \
c4 \
--model-type qwen \
--wbits 4 \
--lorar 128 \
--learningrate 5e-5 \
--epochs 5 \
--adjust_theta 0.05 \
--lorainit loftq

- LoRA: Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models"
- GPTQ: Frantar et al., "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers"
- LoftQ: Li et al., "LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models"
- Flash Attention: Dao et al., "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"
We welcome issues, suggestions, and improvement proposals!
Please refer to the license file in the project for detailed information.
For questions or suggestions, please contact via:
- GitHub Issues
- Project Discussions
- PEFT Official Documentation
- Transformers Official Documentation
- PyTorch Official Documentation
- SwanLab Monitoring Platform
Project Status: Actively Maintained