First, thank you for all your incredible work!
I have a question about reinforcement-learning tuning with Dream. I tried directly replacing LLaDA with Dream in the RL training script, but the outputs come out as unreadable characters. For example:
Question:
Bob was creating a math test for an online platform. He created 13 questions in the first hour. Bob then doubled his rate for the second hour, and doubled his second hour rate for the third hour. How many questions did Bob create in the three hours?
Let's think step by step, and then state the final answer clearly.
Answer:
--------------------
Ground Truth:
91
--------------------
0-th Response:
ونججدجججججججججججججججج دججججج اججججججججججججججججججججججججججالجج
--------------------
Extracted:
ونججدججججججججججججججججججججججججججججالجج
❌ [0.0, 0.0]
Would it be possible for you to publicly share the RL tuning scripts, along with any guidance specific to Dream? That would be very helpful for reproducing the results and avoiding issues like the garbled output above.