### 🚀 The feature, motivation and pitch

The current OLMo models are trained with FSDP, but we want to reproduce the results with Megatron.

### Alternatives

_No response_

### Additional context

_No response_