
Reference

Most of the code is borrowed from the PEFT docs, and the bitsandbytes docs cover the basic setup. I only changed hyperparameters such as the batch size.

Tutorial

  1. Log in to the Hugging Face CLI.
pip install -U "huggingface_hub[cli]"
huggingface-cli login # you need to generate a token (just follow the prompts)
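If you prefer a non-interactive login (for scripts or remote machines), you can pass the token directly; $HF_TOKEN below is a placeholder for your own token.

huggingface-cli login --token $HF_TOKEN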
  2. Install the required libraries.
pip install -U bitsandbytes accelerate transformers peft trl 

These are the library versions I used:

accelerate-1.6.0 
datasets-3.5.0 
peft-0.15.2 
pyarrow-19.0.1 
requests-2.32.3 
tokenizers-0.21.1 
transformers-4.51.3 
trl-0.17.0

If you use conda, run conda env create -f environment.yml -n peft instead.
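Alternatively, to reproduce the exact versions listed above with pip (the bitsandbytes version is not listed, so it is left unpinned):

pip install bitsandbytes accelerate==1.6.0 datasets==3.5.0 peft==0.15.2 pyarrow==19.0.1 requests==2.32.3 tokenizers==0.21.1 transformers==4.51.3 trl==0.17.0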

  3. Run the scripts. Change the CUDA device IDs and the model parameter size as needed (a minimal sketch of what these scripts set up follows the list).
for param in 7 13; do bash script/single.sh 0, $param; done
for param in 7 13; do bash script/ddp_qlora.sh 0,1 $param; done
for param in 7 13 30 65; do bash script/fsdp_qlora.sh 0,1 $param; done
  4. Summarize the training latency, as in the examples below.
  • 7b: 10 sec
  • 13b: 20 sec
  • 33b: 30 sec
  • 65b: 60 sec
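For reference, here is a minimal sketch of the QLoRA setup the scripts wrap: a 4-bit NF4 backbone loaded with bitsandbytes and a LoRA adapter trained on top of it. The checkpoint name and the LoRA hyperparameters are illustrative assumptions, not the exact values in train.py.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization via bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Illustrative checkpoint; the scripts pick the size from their second argument
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapter on top of the frozen quantized backbone; values are examples
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable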

FSDP+DDP

  1. Uncomment line 171 of train.py.
  2. Run ./ddp_fsdp_qlora.sh 0,1,2,3 7

Note:

  • It runs two main processes: one on GPU group 1 (0,1) and one on GPU group 2 (2,3).
  • It trains a LoRA adapter on top of the frozen quantized model. See the script/ folder; it is easy to switch the backbone from quantized to FP16.
  • The current DDP+FSDP implementation is not perfect: logging and checkpoint saving are performed multiple times (a possible guard is sketched below).
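One common way to avoid the duplicated logging and checkpointing is to guard those calls so that only rank 0 performs them. This is a generic sketch assuming torch.distributed has been initialized by the launcher, not the repo's actual fix.

import torch.distributed as dist

def is_main_process() -> bool:
    # True on rank 0, or when not running under torch.distributed at all
    return not dist.is_initialized() or dist.get_rank() == 0

if is_main_process():
    # log metrics / save the checkpoint exactly once
    ...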