Enable RLHF training for pytorch transformer #64

ealt · 2025-09-03T01:54:06Z

Enable Reinforcement Learning from Verifiable Rewards (RLVR) training for PyTorch transformer models using the TRL library.

This allows training models with reward signals on arithmetic tasks. Training is configured via YAML files, runs the same way as the existing training scripts, and uses the boxed_answer_reward and correct_answer_reward functions.
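For reviewers unfamiliar with this style of reward, here is a minimal sketch of what the two reward functions might look like. It is not the code in this PR. It assumes the calling convention TRL documents for custom reward functions (a list of completion strings plus keyword arguments, returning one float per completion), a hypothetical `answer` dataset column holding the reference result, and a hypothetical `extract_boxed` helper.

```python
import re

# Matches \boxed{...} without nested braces; enough for plain integer answers.
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_boxed(text):
    """Hypothetical helper: return the content of the last \\boxed{...}, or None."""
    matches = BOXED_RE.findall(text)
    return matches[-1].strip() if matches else None


def boxed_answer_reward(completions, **kwargs):
    """Format reward: 1.0 if the completion contains a boxed answer, else 0.0."""
    return [1.0 if extract_boxed(c) is not None else 0.0 for c in completions]


def correct_answer_reward(completions, answer, **kwargs):
    """Correctness reward: 1.0 if the boxed answer equals the reference, else 0.0.

    `answer` is an assumed dataset column with one reference value per completion.
    """
    rewards = []
    for completion, reference in zip(completions, answer):
        predicted = extract_boxed(completion)
        is_correct = predicted is not None and predicted == str(reference).strip()
        rewards.append(1.0 if is_correct else 0.0)
    return rewards


if __name__ == "__main__":
    samples = ["2 + 3 = \\boxed{5}", "The answer is 5", "2 + 3 = \\boxed{6}"]
    print(boxed_answer_reward(samples))
    print(correct_answer_reward(samples, answer=[5, 5, 5]))
```

Splitting the format check from the correctness check lets a model earn partial credit for producing a parseable answer before it learns to produce the right one. Whether the PR weights the two rewards equally is not stated here.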

Co-authored-by: ericallenalt <ericallenalt@gmail.com>

cursor · 2025-09-03T01:54:08Z

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.

Add RLVR training infrastructure for arithmetic tasks

4e4b89f

Co-authored-by: ericallenalt <ericallenalt@gmail.com>

ealt marked this pull request as ready for review September 3, 2025 20:06

3 participants
