Impressive and thorough work! Thank you especially for sharing the code and datasets.
However, have you considered running ablations on models like Llama 3 and Qwen2.5/Qwen3 (rather than the distilled versions)?
You may also have seen a blog post arguing that many RL works overclaimed their effectiveness because of characteristics specific to the Qwen models (https://safe-lip-9a8.notion.site/Incorrect-Baseline-Evaluations-Call-into-Question-Recent-LLM-RL-Claims-2012f1fbf0ee8094ab8ded1953c15a37).
Results on the distilled versions may be harder to interpret, as those models were trained on a private dataset from DeepSeek.