@ealt commented Sep 3, 2025

Enable Reinforcement Learning with Verifiable Rewards (RLVR) training for PyTorch transformer models using the TRL library.

This allows models to be trained with reward signals on arithmetic tasks. Training is configured via YAML files and run in the same way as the existing training scripts, and it uses the boxed_answer_reward and correct_answer_reward functions.
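For readers unfamiliar with verifier-style rewards, here is a minimal sketch of what two such reward functions might look like. Only the names boxed_answer_reward and correct_answer_reward come from this PR; the bodies, the `extract_boxed` helper, the `\boxed{...}` answer format, the 0/1 reward values, and the `answer` keyword are assumptions made for illustration, not the PR's actual implementation. The signature (a list of completion strings plus keyword arguments, returning one float per completion) follows the general shape TRL accepts for custom reward functions in its GRPO trainer.

```python
import re
from typing import List, Optional

# Matches \boxed{...} with a non-nested payload (an assumption for this sketch).
_BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_boxed(text: str) -> Optional[str]:
    """Hypothetical helper: return the payload of the last boxed answer, or None."""
    matches = _BOXED.findall(text)
    return matches[-1].strip() if matches else None


def boxed_answer_reward(completions: List[str], **kwargs) -> List[float]:
    """Format reward: 1.0 if the completion contains a boxed answer, else 0.0."""
    return [1.0 if extract_boxed(c) is not None else 0.0 for c in completions]


def correct_answer_reward(
    completions: List[str], answer: List[str], **kwargs
) -> List[float]:
    """Correctness reward: 1.0 if the boxed answer equals the reference, else 0.0.

    `answer` is assumed to be a dataset column holding the reference answers.
    """
    return [
        1.0 if extract_boxed(c) == str(target).strip() else 0.0
        for c, target in zip(completions, answer)
    ]


completions = ["2 + 3 = \\boxed{5}", "the answer is 5", "\\boxed{6}"]
format_rewards = boxed_answer_reward(completions)
correct_rewards = correct_answer_reward(completions, answer=["5", "5", "5"])
```

Splitting the format check from the correctness check lets a model earn partial credit for producing a parseable answer before it learns to produce the right one.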



Co-authored-by: ericallenalt <ericallenalt@gmail.com>

cursor bot commented Sep 3, 2025

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.

@ealt marked this pull request as ready for review on September 3, 2025, 20:06

3 participants