
Reference

Most of the code is borrowed from the PEFT docs, and the bitsandbytes docs cover the basic setup. I only changed hyperparameters such as the batch size.

Tutorial

  1. Log in to the Hugging Face CLI.
pip install -U "huggingface_hub[cli]"
huggingface-cli login # you need to generate a token (just follow the prompts)
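If you prefer a non-interactive login (for scripts or remote machines), you can pass the token directly; $HF_TOKEN below is a placeholder for your own token.

huggingface-cli login --token $HF_TOKEN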
  2. Install the required libraries.
pip install -U bitsandbytes accelerate transformers peft trl 

These are the library versions I used:

accelerate-1.6.0 
datasets-3.5.0 
peft-0.15.2 
pyarrow-19.0.1 
requests-2.32.3 
tokenizers-0.21.1 
transformers-4.51.3 
trl-0.17.0

If you use conda, run conda env create -f environment.yml -n peft instead.
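Alternatively, to reproduce the exact versions listed above with pip (the bitsandbytes version is not listed, so it is left unpinned):

pip install bitsandbytes accelerate==1.6.0 datasets==3.5.0 peft==0.15.2 pyarrow==19.0.1 requests==2.32.3 tokenizers==0.21.1 transformers==4.51.3 trl==0.17.0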

  3. Run the scripts. Change the CUDA device IDs and the model parameter size as needed (a minimal sketch of what these scripts set up follows the list).
for param in 7 13; do bash script/single.sh 0, $param; done
for param in 7 13; do bash script/ddp_qlora.sh 0,1 $param; done
for param in 7 13 30 65; do bash script/fsdp_qlora.sh 0,1 $param; done
  4. Summarize the training latency, as in the examples below.
  • 7b: 10 sec
  • 13b: 20 sec
  • 33b: 30 sec
  • 65b: 60 sec
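For reference, here is a minimal sketch of the QLoRA setup the scripts wrap: a 4-bit NF4 backbone loaded with bitsandbytes and a LoRA adapter trained on top of it. The checkpoint name and the LoRA hyperparameters are illustrative assumptions, not the exact values in train.py.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization via bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Illustrative checkpoint; the scripts pick the size from their second argument
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapter on top of the frozen quantized backbone; values are examples
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable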

FSDP+DDP

  1. Uncomment line 171 of train.py.
  2. Run ./ddp_fsdp_qlora.sh 0,1,2,3 7

Note:

  • It runs two main processes: one on GPU group 1 (0,1) and one on GPU group 2 (2,3).
  • It trains a LoRA adapter on top of the frozen quantized model. See the script/ folder; it is easy to switch the backbone from quantized to FP16.
  • The current DDP+FSDP implementation is not perfect: logging and checkpoint saving are performed multiple times (a possible guard is sketched below).
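One common way to avoid the duplicated logging and checkpointing is to guard those calls so that only rank 0 performs them. This is a generic sketch assuming torch.distributed has been initialized by the launcher, not the repo's actual fix.

import torch.distributed as dist

def is_main_process() -> bool:
    # True on rank 0, or when not running under torch.distributed at all
    return not dist.is_initialized() or dist.get_rank() == 0

if is_main_process():
    # log metrics / save the checkpoint exactly once
    ...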