(https://openreview.net/pdf?id=NikbrdtYvG) Pfau et al. designed a simple yet effective experiment demonstrating that the textual content of the CoT may not matter for the model's final output; however, the additional inference passes over the CoT tokens are necessary for the model to derive the correct answer.
Pfau et al. tested this by training a toy LLaMA model on synthetic 3SUM problems:
- Can we reproduce the observations made by Pfau et al. using an R1 model?
- We can use the same 3SUM dataset generation procedure as Pfau et al. Alternatively, we can test this on the MMLU dataset.
- Assuming we could: can we find a way to elicit the inner monologue/hidden reasoning from the model's intermediate hidden states?
- For example, given a chain-of-thought with filler tokens, can we decode the next token from a middle-layer hidden state (as in LogitLens) and let the model continue generation using only these intermediate-layer predictions? To some extent, this amounts to ablating the last few transformer layers of the model.
- Assuming we couldn't: this would suggest that the R1 model's textual reasoning has a stronger connection to the model's behavior (which is ideal!).
- Can we fool the reasoning model by injecting irrelevant or misleading information into its reasoning (e.g., does the model's final choice on a multiple-choice question change if we append two sentences of CoT it generated for a different question to the end of its reasoning)? In other words, what is the total effect of the "textual reasoning" (independent variable) on the model's output (dependent variable)?
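For the 3SUM bullet above, here is a minimal sketch of how such a dataset could be generated. Note this is a simplified variant for illustration: Pfau et al. use a specific tuple/token encoding, and the instance size, modulus, and dataset size below are placeholder assumptions, not their exact setup.

```python
# Sketch: generating simplified 3SUM instances (label = does any triple of
# distinct elements sum to 0 mod base?). Parameters are illustrative; the
# encoding in Pfau et al. differs.
import random

def make_3sum_instance(n=10, base=10, rng=random):
    """Return (sequence, label): label=1 iff some triple sums to 0 mod base."""
    seq = [rng.randrange(base) for _ in range(n)]
    label = int(any(
        (seq[i] + seq[j] + seq[k]) % base == 0
        for i in range(n)
        for j in range(i + 1, n)
        for k in range(j + 1, n)
    ))
    return seq, label

# Build a small labeled dataset (size is a placeholder).
dataset = [make_3sum_instance() for _ in range(1000)]
```

The brute-force triple check is fine at this scale; the point of the task is that the model must implicitly perform this quadratic-or-worse computation, which is where filler tokens can buy extra serial compute.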
I assigned this experiment to @akommula, but let me know if you need any help from me!