Fix eval responses lost on timeout in grpo_fast #1439
Conversation
Summary of Changes (gemini-code-assist): This pull request introduces a robust mechanism to handle asynchronous evaluation results in grpo_fast.
Code Review
This pull request effectively addresses the issue of lost evaluation responses on timeout by introducing an EvalAccumulator to persist partial results across training steps. The changes are well-structured, with new helper functions for pulling and finalizing evaluation results, and the main training loop is updated to manage the accumulator's lifecycle. Unit tests for the new EvalAccumulator class are also included, ensuring its correctness. My main concerns are a bug in the backward compatibility path of maybe_evaluate, which could lead to silent evaluation failures, and a potential crash when processing an empty set of evaluation results. I've left specific comments with suggested fixes for these issues.
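The review above describes an accumulator that persists partial eval results across training steps and helpers that pull and finalize them. A minimal sketch of what such a class could look like follows; the class name comes from the review, but the fields and method names here are illustrative assumptions rather than the actual implementation in grpo_fast.py.

```python
from dataclasses import dataclass, field


@dataclass
class EvalAccumulator:
    """Collects eval responses across training steps until a full set arrives."""

    expected: int                      # number of eval prompts we are waiting for
    results: list = field(default_factory=list)

    def add(self, batch: list) -> None:
        """Append whatever partial results arrived before the timeout."""
        self.results.extend(batch)

    def is_complete(self) -> bool:
        return len(self.results) >= self.expected

    def finalize(self) -> list:
        """Return the accumulated results and reset for the next eval round."""
        out, self.results = self.results, []
        return out
```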
/gemini review
Code Review
This pull request correctly fixes an issue where partial evaluation responses could be lost due to timeouts. The new approach of accumulating results in a list across evaluation steps is a solid improvement.
My review includes one high-severity comment on open_instruct/grpo_fast.py that identifies a bug in the new logic for the final evaluation step, where an incomplete evaluation would be skipped. I've also pointed out a related performance issue with the timeout handling and suggested a more robust implementation to address both points.
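The performance point above is about not stalling training while waiting for a full eval batch. One way to do that, sketched here under assumed names, is to drain whatever responses are ready within a short timeout each step and hand them to the accumulator; `queue.Queue` stands in for whatever results queue grpo_fast actually uses.

```python
import queue


def pull_ready_eval_results(results_queue: queue.Queue, timeout_s: float = 0.1) -> list:
    """Collect all eval responses that are available within roughly `timeout_s`."""
    pulled = []
    try:
        # The first get waits up to timeout_s; subsequent gets are non-blocking,
        # so a slow producer cannot stall the training loop for long.
        pulled.append(results_queue.get(timeout=timeout_s))
        while True:
            pulled.append(results_queue.get_nowait())
    except queue.Empty:
        pass
    return pulled
```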
Fixes #1385. Now we accumulate eval results across training steps. If we hit a new eval step but haven't yet cleared out the partial results from a previous one, we log a warning and flush them without logging any metrics (see the sketch below).
The only thing I'm unsure about: the training step attached to the eval metrics is the step at which we actually log them, not the step that generated the samples. This is technically wrong, since the eval samples can come from earlier steps, but I think it's better than pausing all training to wait for them. The main use case is tracking performance, so it should still work (we want to observe that more training steps lead to increasing eval scores).
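A hedged sketch of that lifecycle, under the same assumptions as the earlier snippets: `accumulator` is assumed to behave like the `EvalAccumulator` sketch (add / is_complete / finalize), `new_results` is whatever partial batch arrived during this training step, and the function and parameter names are illustrative, not the actual maybe_evaluate code.

```python
import logging

logger = logging.getLogger(__name__)


def accumulate_and_maybe_log(accumulator, new_results, training_step, is_new_eval_step, log_fn):
    """Fold newly arrived eval responses in and log once a full set is present."""
    if is_new_eval_step and accumulator.results:
        # A new eval round is starting but the previous one never completed:
        # warn and drop the stale partial results instead of mixing rounds.
        logger.warning(
            "Discarding %d partial eval results from an earlier eval step.",
            len(accumulator.results),
        )
        accumulator.finalize()

    accumulator.add(new_results)
    if accumulator.is_complete():
        # Metrics are attributed to the current training step even though the
        # responses may have been generated a few steps earlier (see above).
        log_fn(training_step, accumulator.finalize())
```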