Hi, thanks for your work!
Some parameters are already specified in the paper: the AdamW optimizer, a linear warm-up ratio of 0.03, a batch size of 128, LoRA rank 128, LoRA alpha 32, and BF16 precision. For the LLaMA3.2-1B model, the learning rate is 8e-4 and training runs for 10 epochs.
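For concreteness, here is a minimal sketch of how I would map those settings onto a Hugging Face Transformers + PEFT configuration. The `target_modules` choice and the per-device/accumulation split of the batch size of 128 are my own assumptions, not from the paper:

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings from the paper; target_modules is my guess.
lora_config = LoraConfig(
    r=128,                       # LoRA rank
    lora_alpha=32,               # LoRA alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

# Optimizer/schedule settings from the paper (values for LLaMA3.2-1B).
training_args = TrainingArguments(
    output_dir="sft-baseline",       # placeholder
    optim="adamw_torch",             # AdamW
    learning_rate=8e-4,
    num_train_epochs=10,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    bf16=True,                       # BF16 precision
    per_device_train_batch_size=16,  # assumed split:
    gradient_accumulation_steps=8,   # 16 x 8 = effective batch size 128
)
```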
However, I'm unclear about how the SFT baseline is implemented. Does it still use the code from this repository, retaining only the `ref_ce_loss` term? If so, does gamma also need to be set to 10 (as stated in Appendix A, Implementation Details)? Or is the SFT baseline implemented with a dedicated framework such as LLaMA Factory or Swift? Could you please share the complete set of hyperparameters for the SFT-based baselines? My current reading of the loss is sketched below; please correct me if it is wrong.
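To make the first question concrete, this is the hypothetical loss structure I have in mind; the term names and composition are my guesses, not confirmed by the paper or the code:

```python
# Hypothetical reading -- please correct me if the actual code differs:
# full method:   loss = task_loss + gamma * ref_ce_loss
# SFT baseline:  loss = ref_ce_loss only?
# And if so, does gamma = 10 still scale it,
# i.e. loss = gamma * ref_ce_loss?
```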