Question about training loss #27
Open
Description
Hi,
I’m reproducing continued pre-training with LLaMA Factory, initializing from qwen3-base. The loss starts around 30 in the first steps, which is much higher than with standard AR training. When I initialize from your released checkpoints, the loss starts around 8, which also seems relatively high.
From your experience, is this expected behavior, or does it suggest a configuration issue on my side?
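As a rough sanity check on these magnitudes (the vocabulary size below is my assumption, not something from this thread — check your tokenizer): a uniform random predictor over V tokens has a per-token cross-entropy of ln(V), which for an assumed Qwen3 vocabulary of 151,936 is about 11.9 nats. So a starting loss of ~30 is above even the random-guess baseline, while ~8 is below it.

```python
import math

# Assumption: Qwen3-base vocabulary size (verify with your tokenizer config).
vocab_size = 151_936

# A uniform predictor over V tokens has cross-entropy ln(V) nats per token.
uniform_baseline = math.log(vocab_size)
print(f"uniform-random baseline: {uniform_baseline:.2f} nats")  # ~11.93

# Observed starting losses from the question above.
for name, loss in [("init from qwen3-base", 30.0), ("init from released ckpt", 8.0)]:
    rel = "above" if loss > uniform_baseline else "below"
    print(f"{name}: {loss:.1f} ({rel} the uniform baseline)")
```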
Thanks!