Hi, thanks for your great work.
While reading the technical report, I noticed there's no mention of how the MTP layers in the base model are updated during post-training. Do you keep them frozen, or have you somehow incorporated the MTP loss into the GRPO objective?