
Inquiry about releasing Dream RL tuning scripts and guidance #4

@YanJiangJerry


First, thank you for all your incredible work.

I have a question about reinforcement learning (RL) tuning with Dream. I tried directly swapping LLaDA for Dream in the RL training script, but the model's outputs come out as unreadable characters. For example:

Question:
Bob was creating a math test for an online platform. He created 13 questions in the first hour. Bob then doubled his rate for the second hour, and doubled his second hour rate for the third hour. How many questions did Bob create in the three hours?
Let's think step by step, and then state the final answer clearly.
Answer:

--------------------
Ground Truth:
91
--------------------
0-th Response:
ونججدجججججججججججججججج دججججج اججججججججججججججججججججججججججالجج
--------------------
Extracted:
ونججدججججججججججججججججججججججججججججالجج

❌ [0.0, 0.0]
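For reference, the ground truth of 91 follows from 13 + 26 + 52. The short check below recomputes it. It also includes a hypothetical exact-match reward. This is an assumption for illustration, not necessarily the reward function the training script uses. It shows why any garbled extraction like the one above can only score 0.0.

```python
# Recompute the ground truth for the example question:
# 13 questions in hour 1, the rate doubles in hour 2, then doubles again in hour 3.
first_hour = 13
second_hour = 2 * first_hour   # 26
third_hour = 2 * second_hour   # 52
total = first_hour + second_hour + third_hour


def exact_match_reward(extracted: str, ground_truth: str) -> float:
    # Hypothetical correctness reward (an assumption, not the script's actual one):
    # 1.0 on an exact match after stripping whitespace, otherwise 0.0.
    return 1.0 if extracted.strip() == ground_truth.strip() else 0.0


print(total)                                       # 91
print(exact_match_reward(str(total), "91"))        # 1.0
print(exact_match_reward("unreadable text", "91"))  # 0.0
```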

Would it be possible for you to publicly share the RL tuning scripts and any guidance for Dream? It would be really helpful for reproducing results and avoiding issues like garbled outputs.
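One difference I suspect may matter is stated here as an assumption to be checked against the Dream code, not as a confirmed diagnosis. A LLaDA-style model scores the token at position i with logits row i. Dream, being initialized from an autoregressive model, is assumed to keep the next-token shift, so position i is scored by row i - 1. If the RL script reads logits the LLaDA way, every prediction could be off by one position. The mask token id should also come from Dream's own tokenizer rather than from a value hard-coded for LLaDA. The toy sketch below is hypothetical and uses plain Python lists instead of tensors. It only shows how the two readings diverge.

```python
from typing import List, Sequence

# Hypothetical helpers, not part of the Dream or LLaDA code.
# ASSUMPTION: an AR-initialized model (assumed here for Dream) scores position i
# with logits row i - 1; position 0 reuses row 0, mirroring a concat of
# logits[:, :1] and logits[:, :-1]. An unshifted (LLaDA-style) model uses row i.


def align_logits(logits: Sequence[Sequence[float]], shifted: bool) -> List[List[float]]:
    rows = [list(r) for r in logits]
    if not shifted or not rows:
        return rows
    return [rows[0]] + rows[:-1]


def argmax(row: Sequence[float]) -> int:
    # Index of the highest score in one row of logits.
    return max(range(len(row)), key=lambda j: row[j])


def predict_masked(token_ids: Sequence[int], logits: Sequence[Sequence[float]],
                   mask_id: int, shifted: bool) -> List[int]:
    # Fill every masked position with the argmax of its (aligned) logits row.
    aligned = align_logits(logits, shifted)
    out = list(token_ids)
    for i, tok in enumerate(token_ids):
        if tok == mask_id:
            out[i] = argmax(aligned[i])
    return out


# Toy vocabulary of 4 tokens; mask_id=3 is a hypothetical value for this example.
toy_logits = [[0.1, 0.9, 0.0, 0.0], [0.0, 0.1, 0.9, 0.0], [0.9, 0.0, 0.0, 0.1]]
print(predict_masked([0, 3, 3], toy_logits, mask_id=3, shifted=True))   # [0, 1, 2]
print(predict_masked([0, 3, 3], toy_logits, mask_id=3, shifted=False))  # [0, 2, 0]
```

With the same logits, the shifted and unshifted readings fill the masked positions with different tokens. This is the kind of mismatch that could, under the assumption above, turn coherent predictions into noise.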
