
Experiment: Reproduce "Think dot by dot" with R1 models; Elicit the inner monologue of reasoning model #20

@yc015

Description


Pfau et al. (https://openreview.net/pdf?id=NikbrdtYvG) designed a simple experiment demonstrating that the textual content of the CoT may not matter for the model's final output. However, the additional inference passes over the CoT tokens are necessary for the model to derive the correct answer.

Pfau et al. tested this by training a toy LLaMA model on synthetic 3SUM problems.
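For concreteness, here is a minimal sketch of what such a synthetic task could look like. The exact instance format, modulus, sequence length, filler token, and answer cue in Pfau et al.'s dataset differ; everything below is a simplified placeholder:

```python
import random

def has_3sum(nums, mod=10):
    """True iff some triple of distinct positions sums to 0 modulo `mod`."""
    n = len(nums)
    return any(
        (nums[i] + nums[j] + nums[k]) % mod == 0
        for i in range(n) for j in range(i + 1, n) for k in range(j + 1, n)
    )

def make_instance(length=10, mod=10, rng=random):
    """One synthetic instance: a digit sequence plus its binary 3SUM label."""
    nums = [rng.randrange(mod) for _ in range(length)]
    return nums, int(has_3sum(nums, mod))

def to_prompt(nums, n_filler=20, filler="."):
    """Render the instance followed by filler tokens standing in for the CoT."""
    return " ".join(map(str, nums)) + " " + " ".join([filler] * n_filler) + " ANS:"
```

The key property of the task is that the filler tokens carry no task-relevant text, so any accuracy gain from them must come from the extra forward passes rather than the CoT's content.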

  • Can we reproduce the observations made by Pfau et al. using an R1 model?
    • We can use the same 3SUM dataset generated by Pfau et al. Alternatively, we can test this using the MMLU dataset.
  • Assuming we can: can we find a way to elicit the inner monologue/hidden reasoning from the model's intermediate hidden states?
    • For example, given a chain-of-thought with filler tokens, can we use the next token predicted from the middle-layer hidden states (as in LogitLens) as the next word, and let the model continue the generation always using the intermediate-layer predictions? To some extent, you can imagine you are ablating the last few transformer layers of the model.
  • Assuming we can't: this potentially means that the textual reasoning of the R1 model has a stronger connection to the model's behavior (which is ideal!).
    • Can we fool the reasoning model by injecting irrelevant or misleading information into its reasoning (e.g., how does the model's final choice on a multiple-choice question change if we append two sentences of CoT it generated for a different question to the end of its reasoning)? In other words, what is the total effect of the "textual reasoning" (independent variable) on the model's output (dependent variable)?
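The intermediate-layer decoding idea above can be sketched model-agnostically. The two callables are hypothetical adapters you would have to write for your model: with Hugging Face transformers, `run_layers` would collect hidden states via `output_hidden_states=True`, and for a LLaMA-style model `unembed` would apply the final norm followed by the `lm_head` (both are assumptions about the architecture, not a verified recipe):

```python
def decode_from_layer(prompt_ids, run_layers, unembed, exit_layer,
                      max_new_tokens=32, eos_id=None):
    """Greedy generation that always decodes the next token from an
    intermediate layer's hidden state (LogitLens-style), effectively
    ablating the layers above `exit_layer`.

    run_layers(ids) -> list of per-layer hidden states at the last position
    unembed(h)      -> vocabulary logits for hidden state h
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        h_mid = run_layers(ids)[exit_layer]   # stop at the chosen layer
        logits = unembed(h_mid)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)                   # feed the early-exit token back in
        if next_id == eos_id:
            break
    return ids
```

Comparing generations across different `exit_layer` values would show at which depth the "inner monologue" (if any) becomes legible.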
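The CoT-injection intervention in the last bullet could be harnessed like this. The `<think>…</think>` delimiters match R1's output format, but the answer cue and the splicing granularity (two sentences appended at the end) are placeholder assumptions; in a real run `own_cot` would be a prefix of the model's actually generated reasoning:

```python
def splice_foreign_cot(question, own_cot, foreign_tail, answer_cue="Answer:"):
    """Build a prompt whose <think> block ends with sentences lifted from a
    CoT the model generated for a *different* question, then ask for the
    final answer."""
    return f"{question}\n<think>\n{own_cot} {foreign_tail}\n</think>\n{answer_cue}"

def flip_rate(clean_answers, spliced_answers):
    """Fraction of items whose final choice changed after the splice: an
    estimate of the total effect of the textual reasoning on the output."""
    assert len(clean_answers) == len(spliced_answers)
    flips = sum(a != b for a, b in zip(clean_answers, spliced_answers))
    return flips / len(clean_answers)
```

A high flip rate would indicate that the textual reasoning causally drives the answer; a near-zero flip rate would echo Pfau et al.'s filler-token result.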

I assigned this experiment to @akommula, but let me know if you need any help from me!
