Seeking training recipe/advice for SDPO on Math tasks #26

Could you please share a training recipe for applying SDPO to math tasks?

Currently, I am training Qwen2.5-3B-Instruct on a Math training split using your SDPO implementation, but the val-core metric keeps degrading as training progresses. I have already tried swapping in other models and different datasets, but training still does not work as expected.

I would love to know if you have any empirical experience, recommended hyperparameters, or general advice for applying your method successfully to math-heavy tasks. Thanks!
