REQUEST: Attempt to replicate Turpin et al. for a reasoning model, and interpret what's going on #17
Replies: 2 comments 7 replies
-
|
Hii, I'm Marmik. I'm an undergrad at Penn State University doing a CS + math major. I did a small experiment on prefiling the reasoning tokens for r1-distill-qwen-7b for mathematical reasoning tasks where i prefilled the reasoning traces with confounding tokens (tokens that had no correlation with the input question/ task) and figured that the model was still able to arrive at the final answer. So this is very interesting to me as the natural next step to try and continue the work. For the past few months I've been working on analyzing the experts in a MoE for domain specialization, and we’ve found redundancy in topk experts using logit lens [the work was recently submitted to an ICLR’25 systems4ml workshop]. In the past I’ve also worked on model architecture for small and accurate models for downstream tasks like image-to-latex with <165 params for with a bleu score of 0.80. Next Steps :
Future directions :
|
Beta Was this translation helpful? Give feedback.
-
|
Hi, I would love to collaborate on this!
I think that this approach could isolate whether DeepSeek's relative robustness stems from pre-training data patterns or architectural choices. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Research Questions:
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting (Turpin et al., 2023) introduced a simple, clever experimental paradigm that elicits "unfaithful" chains of thought. The idea is to introduce a "bias" (such as always choosing the first answer in a multiple-choice question) into the system with a prompt; that bias influences the systems answers, even though chain-of-thought reasoning doesn't mention it at all!
Question 1: Does Turpin et al. "prompt biasing" replicate for DeepSeek?
If one applies the experimental set-up described in the paper to DeepSeek, do we get the same results?
Question 2: Why or why not?
Once we know the answer, there are many natural next steps! If it turns out DeepSeek is less susceptible to "secret biases", or mentions these bias directly, that would be itself an interesting result. On the other hand, if it is susceptible to biases, that may give us a simple case where we can see a disconnect between thinking tokens and actual reasoning. We can then apply a variety of standard techniques to try to understand the origin of this disconnect.
Owner:
Martin Wattenberg is happy to help advise on this, and made the original request for a project! However, he's happy to let someone else own this, and doesn't have time to do these experiments himself.
Contributors:
You?
Project status:
Not Started Yet
Beta Was this translation helpful? Give feedback.
All reactions