Question about training loss #27

@zjr2000

Description

Hi,

I’m reproducing continued pre-training with LLaMA Factory, initializing from qwen3-base. The loss starts around 30 in the first few steps, which is much higher than in standard AR training. When I initialize from your released checkpoints instead, the loss starts around 8, which also seems relatively high.
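For context, here is a back-of-envelope check I used: a model that predicts uniformly at random over the vocabulary has cross-entropy ln(V). I'm assuming a vocabulary of roughly 151,936 tokens here (my reading of Qwen's tokenizer; please correct me if that's wrong for qwen3-base), which puts chance level near 11.9, so a starting loss of 30 is well above even a random predictor:

```python
import math

# Sanity check: the cross-entropy of a uniform predictor over a
# vocabulary of size V is ln(V). The vocab size is an assumption
# on my side (~151,936 for Qwen's tokenizer).
vocab_size = 151_936
uniform_loss = math.log(vocab_size)
print(f"uniform-predictor loss: {uniform_loss:.2f}")  # ~11.93
```

So a loss near 30 would mean the model is assigning far lower probability to the targets than uniform guessing, which is why I suspect a configuration issue rather than normal warm-up behavior.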

From your experience, is this expected behavior, or does it suggest a configuration issue on my side?

Thanks!
