
Experiment: Attributing the final output of a reasoning model to its intermediate thinking #19

@yc015

Description

When an R1 model produces an extensive chain-of-thought, how important is each segment of its reasoning to the final answer? Does only the thinking after the "Aha moment" matter, or are the earlier, possibly flawed, steps also crucial? Can we trace the impact of each segment on the model's final output?

Starting from the easiest way of answering this question to the hardest:

  1. If we ablate a word/sentence/reasoning step from the model's generated CoT, how much does the logit of the final choice change?
  2. Can we use an existing gradient-based saliency method to visualize each reasoning token's contribution to the final output?
  3. Alternatively, can we optimize a sparse attention mask applied to the computation of the final choice token?
    • We want to preserve the model's original answer while increasing the sparsity of the attention (so only a few reasoning tokens are attended to when computing the hidden states of the answer tokens).
    • Certainly, information from the masked-out tokens may still flow into the hidden states of the unmasked tokens in the intermediate layers. Even so, it's interesting to see which words are necessary for the model to arrive at its answer.
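The mask idea in step 3 can be prototyped on a toy attention head before touching a real model. The sketch below is an assumption-laden stand-in: instead of gradient-optimizing a relaxed mask with a sparsity penalty, it greedily drops reasoning tokens from the answer query's attention as long as the answer hidden state stays within a tolerance of the original; the shapes, random projections, and `tol` threshold are all made up for illustration.

```python
import numpy as np

# Toy single-head attention: n "reasoning" tokens attendable by the final
# answer query q. K, V, q are random stand-ins for real key/value/query
# projections (hypothetical sizes).
rng = np.random.default_rng(0)
d, n = 8, 12
K, V = rng.normal(size=(n, d)), rng.normal(size=(n, d))
q = rng.normal(size=d)

def answer_state(keep):
    """Attention output for the answer query, restricted to kept tokens."""
    scores = np.where(keep, K @ q / np.sqrt(d), -np.inf)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

target = answer_state(np.ones(n, dtype=bool))  # unmasked answer state

# Greedily mask out the token whose removal perturbs the answer state the
# least, stopping once any further removal would exceed the tolerance.
keep, tol = np.ones(n, dtype=bool), 0.25
while keep.sum() > 1:
    best = None
    for i in np.flatnonzero(keep):
        trial = keep.copy()
        trial[i] = False
        err = np.linalg.norm(answer_state(trial) - target)
        if best is None or err < best[0]:
            best = (err, i)
    if best[0] > tol:
        break
    keep[best[1]] = False

print(f"kept {int(keep.sum())} of {n} reasoning tokens")
```

The greedy loop preserves the invariant that the masked answer state never drifts more than `tol` from the original, which is the same constraint a gradient-based relaxation (e.g., a sigmoid-parameterized mask with an L1 penalty) would enforce softly.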

Of course, multiple-choice questions are the easiest to work with, since you only need to track the contribution of different thinking segments to a single token, namely the final choice made by the model.
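In the multiple-choice setting, option 1 reduces to a simple loop. The sketch below is model-agnostic: `score_fn` is a hypothetical stand-in for a forward pass that returns the logit of the model's chosen option given the prompt plus a (possibly ablated) CoT; here it is mocked with a toy scorer so the scaffolding runs standalone.

```python
def ablation_deltas(segments, score_fn):
    """Drop each reasoning segment in turn and record how much the
    final-choice logit changes relative to the full chain-of-thought."""
    base = score_fn(" ".join(segments))
    deltas = []
    for i in range(len(segments)):
        ablated = segments[:i] + segments[i + 1:]
        deltas.append(base - score_fn(" ".join(ablated)))
    return base, deltas

# Toy stand-in scorer: pretends the "Aha" segment carries most of the
# logit. A real score_fn would tokenize the prompt + ablated CoT, run a
# forward pass, and read out the chosen option's logit at the answer
# position.
def toy_score(text):
    return 5.0 * text.count("Aha") + 0.1 * len(text.split())

segments = [
    "Let me try x=2.",
    "That fails.",
    "Aha, x=3 works.",
    "So the answer is B.",
]
base, deltas = ablation_deltas(segments, toy_score)
```

With the toy scorer, the largest delta lands on the "Aha" segment, which is exactly the kind of ranking the real experiment would produce over sentences or reasoning steps.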

I assigned this experiment to @aghyad-deeb, but let me know if you need any help from me!
