Hi,
Following up on #4, I am having trouble reproducing the Llama-1B results reported in Table 2 of the paper. After training the model for 25 epochs, I reached accuracy comparable to the reported SFT-CoT results, so that part is fine.
Could you please confirm the values used for the following arguments for both the Coconut baseline and Coconut with SIM-CoT?
- `max_latent_stage`
- `num_epochs`
Thanks!
Neeraj