Setup
For the first experiment, I am focusing specifically on math.
I am using DeepScaleR-1.5B as the distilled model and OpenR1-Math-220k as the dataset.
DeepScaleR-1.5B is DeepSeek-R1-Distill-Qwen-1.5B (whose base model is Qwen2.5-Math-1.5B) with additional RL training to make it better at math. OpenR1-Math-220k is a set of reasoning traces generated by the full DeepSeek R1 on problems from the NuminaMath 1.5 dataset.
I want to test whether we will see math-reasoning-specific features in the distill-and-RL model.
Here is the dataset, pre-processed and with a chat template applied: https://huggingface.co/datasets/mitroitskii/OpenR1-Math-220k-formatted
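As a rough illustration of that pre-processing step, here is a minimal sketch of applying a ChatML-style chat template (the format Qwen-family models use). The template string and the `apply_chatml_template` helper are assumptions for illustration; in practice this would be done with `tokenizer.apply_chat_template` from Hugging Face transformers, using the model's own template.

```python
# Toy sketch of applying a ChatML-style chat template to a math example.
# The exact template is an assumption; real pre-processing should use
# tokenizer.apply_chat_template so the model's actual template is used.

def apply_chatml_template(messages):
    """Format a list of {role, content} dicts into a ChatML-style string."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    return "".join(parts)

example = [
    {"role": "user", "content": "What is 7 * 8?"},
    {"role": "assistant", "content": "<think>7 * 8 = 56</think> The answer is 56."},
]
formatted = apply_chatml_template(example)
print(formatted)
```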
I am using @jkminder's implementation of the crosscoder: https://github.com/jkminder/dictionary_learning
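For context on what a crosscoder consumes, here is a toy NumPy sketch of the paired-activation input format: activations from the same layer and the same token positions in the base and distilled models, stacked per token. The layer choice, shapes, and random stand-in activations are illustrative assumptions, not the actual collection code from the repo above.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_tokens = 16, 8

# Stand-ins for residual-stream activations at one layer, at the same
# token positions in the two models (illustrative random data).
base_acts = rng.normal(size=(n_tokens, d_model))
distill_acts = rng.normal(size=(n_tokens, d_model))

# A crosscoder is trained on paired activations per token:
# shape (n_tokens, 2, d_model), one slot per model, so shared and
# model-specific features can be compared in a single dictionary.
paired = np.stack([base_acts, distill_acts], axis=1)
print(paired.shape)
```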