First, thank you for all your incredible work!
I have a question about reinforcement-learning tuning with Dream. I tried directly replacing LLaDA with Dream in the RL training script, but the outputs come out as unreadable characters. For example:
Question:
Bob was creating a math test for an online platform. He created 13 questions in the first hour. Bob then doubled his rate for the second hour, and doubled his second hour rate for the third hour. How many questions did Bob create in the three hours?
Let's think step by step, and then state the final answer clearly.
Answer:
--------------------
Ground Truth:
91
--------------------
0-th Response:
ونججدجججججججججججججججج دججججج اججججججججججججججججججججججججججالجج
--------------------
Extracted:
ونججدججججججججججججججججججججججججججججالجج
❌ [0.0, 0.0]
Would it be possible for you to publicly share the RL tuning scripts, along with any guidance specific to Dream? That would be very helpful for reproducing the results and avoiding issues like the garbled output above.