
Request for the hyperparameters of SFT base #14

@scarydemon2

Description

Hi, thanks for your work!

Some parameters are already specified in the paper: AdamW optimizer, linear warm-up ratio of 0.03, batch size of 128, LoRA rank of 128, LoRA alpha of 32, and BF16 precision. For the LLaMA3.2-1B model, the learning rate is set to 8e-4 with a training duration of 10 epochs.
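For reference, the parameters listed above can be collected into a single config. This is only a minimal sketch: the key names follow common Hugging Face `TrainingArguments`/`LoraConfig` conventions and are illustrative assumptions, not the authors' actual configuration, and mapping "batch size of 128" to a per-device batch size assumes a single device with no gradient accumulation.

```python
# Hyperparameters stated in the paper for the LLaMA3.2-1B SFT baseline,
# gathered as a plain dict. Key names mirror HF TrainingArguments/LoraConfig
# conventions but are assumptions; the authors' real config may differ.
sft_config = {
    "optim": "adamw",                     # AdamW optimizer
    "lr_scheduler_type": "linear",        # linear schedule
    "warmup_ratio": 0.03,                 # warm-up ratio 0.03
    "per_device_train_batch_size": 128,   # assumption: single device, no grad accumulation
    "lora_r": 128,                        # LoRA rank
    "lora_alpha": 32,                     # LoRA alpha
    "bf16": True,                         # BF16 precision
    "learning_rate": 8e-4,                # LLaMA3.2-1B setting
    "num_train_epochs": 10,               # LLaMA3.2-1B setting
}

print(sft_config["learning_rate"])  # → 0.0008
```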

However, I have a limited understanding of the implementation details. Does the SFT baseline still rely on the code from this repository, retaining only the ref_ce_loss? If so, does gamma also need to be set to 10 (as stated in Appendix A, Implementation Details)? Or is the SFT implemented through a dedicated framework such as LLaMA Factory or Swift? Could you please provide the complete set of hyperparameters for the SFT-based baselines?
