smolzero #1330
base: main
Conversation
Summary of Changes
Hello @mnoukhov, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request primarily focuses on expanding and refining the training infrastructure for reinforcement learning from human feedback (RLHF) models, specifically for mathematical reasoning tasks. It updates core Ray placement strategies, streamlines debug configurations, and significantly overhauls the OLMo 7B training script. Crucially, it introduces a comprehensive set of new training and debugging scripts for various Qwen models, enabling broader experimentation and development within the RLZero Math domain.
Code Review
This pull request updates several training and debug scripts. The changes involve modifying hyperparameters, updating model paths, and adding new scripts for different model sizes and configurations. My review focuses on improving the correctness and maintainability of these scripts. I've identified a potentially incorrect command-line flag that could cause script failures, a recurring typo in comments that might lead to confusion, and an instance of code duplication that could impact future maintenance.
export VLLM_DISABLE_COMPILE_CACHE=1
export VLLM_USE_V1=1
- uv run python open_instruct/grpo_fast.py \
+ uv run --active python open_instruct/grpo_fast.py \
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The --active flag for uv run appears to be incorrect. According to the uv documentation, uv run does not have an --active flag. This will likely cause the script to fail. Please remove it or replace it with the correct flag if you intended something else.
Suggested change:
- uv run --active python open_instruct/grpo_fast.py \
+ uv run python open_instruct/grpo_fast.py \
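As a quick, non-authoritative sanity check (not part of the original review), whether the installed uv accepts this flag can be verified locally before deciding; the one-liner below is only an illustration:

# Verify whether `uv run` lists --active in the installed uv version;
# support for this flag can vary across uv releases.
uv run --help | grep -- '--active' || echo "no --active flag listed by this uv version"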
LOCAL_EVALS="allenai/Dolci-RLZero-Math-7B 32"
LOCAL_EVAL_SPLITS="train"

uv run --active open_instruct/grpo_fast.py \
The --active flag for uv run appears to be incorrect. According to the uv documentation, uv run does not have an --active flag. This will likely cause the script to fail. Please remove it or replace it with the correct flag if you intended something else.
Suggested change:
- uv run --active open_instruct/grpo_fast.py \
+ uv run open_instruct/grpo_fast.py \
shift
BEAKER_IMAGE="nathanl/open_instruct_auto"

# Check if the first argument starts with the value of $BEAKER_NAME
The comment refers to $BEAKER_NAME, but this variable is not defined in the script. The condition below it uses $BEAKER_USER. The comment should be updated to refer to $BEAKER_USER for consistency and to avoid confusion.
Suggested change:
- # Check if the first argument starts with the value of $BEAKER_NAME
+ # Check if the first argument starts with the value of $BEAKER_USER
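For reference, a minimal sketch of the corrected block, taken from the script itself with only the comment wording changed:

# Check if the first argument starts with the value of $BEAKER_USER
if [[ "$1" == "$BEAKER_USER"* ]]; then
    BEAKER_IMAGE="$1"
    shift
fi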
BEAKER_USER=$(beaker account whoami --format json | jq -r '.[0].name')
BEAKER_IMAGE="nathanl/open_instruct_auto"

# Check if the first argument starts with the value of $BEAKER_NAME
The comment refers to $BEAKER_NAME, but this variable is not defined in the script. The condition below it uses $BEAKER_USER. The comment should be updated to refer to $BEAKER_USER for consistency and to avoid confusion. This same issue is present in other new scripts in this PR.
Suggested change:
- # Check if the first argument starts with the value of $BEAKER_NAME
+ # Check if the first argument starts with the value of $BEAKER_USER
#!/bin/bash

EXP_NAME="qwen1.5distill_rlzero_math"
MODEL_NAME_OR_PATH="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
DATASETS="allenai/Dolci-RLZero-Math-7B 1.0"

LOCAL_EVALS="allenai/Dolci-RLZero-Math-7B 32"
LOCAL_EVAL_SPLITS="train train"

EVALS="aime:zs_cot_r1::pass_at_32_2024_rlzero,aime:zs_cot_r1::pass_at_32_2025_rlzero"

BEAKER_USER=$(beaker account whoami --format json | jq -r '.[0].name')
BEAKER_IMAGE="nathanl/open_instruct_auto"

# Check if the first argument starts with the value of $BEAKER_NAME
if [[ "$1" == "$BEAKER_USER"* ]]; then
    BEAKER_IMAGE="$1"
    shift
fi

cluster=ai2/augusta
uv run mason.py \
    --task_name ${EXP_NAME} \
    --cluster ${cluster} \
    --workspace ai2/olmo-instruct \
    --priority normal \
    --pure_docker_mode \
    --image ${BEAKER_IMAGE} \
    --preemptible \
    --num_nodes 1 \
    --env VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 \
    --env VLLM_ATTENTION_BACKEND="FLASH_ATTN" \
    --gpus 4 \
    --budget ai2/oe-adapt \
    -- source configs/beaker_configs/ray_node_setup.sh \
    \&\& uv run open_instruct/grpo_fast.py \
    --exp_name ${EXP_NAME} \
    --beta 0.0 \
    --async_steps 8 \
    --inflight_updates \
    --no_resampling_pass_rate 0.875 \
    --truncated_importance_sampling_ratio_cap 2.0 \
    --advantage_normalization_type centered \
    --active_sampling \
    --num_samples_per_prompt_rollout 8 \
    --num_unique_prompts_rollout 32 \
    --num_mini_batches 1 \
    --learning_rate 1e-6 \
    --per_device_train_batch_size 1 \
    --kl_estimator 2 \
    --dataset_mixer_list $DATASETS \
    --dataset_mixer_list_splits train \
    --dataset_mixer_eval_list $LOCAL_EVALS \
    --dataset_mixer_eval_list_splits $LOCAL_EVAL_SPLITS \
    --max_prompt_token_length 2048 \
    --response_length 8192 \
    --pack_length 18432 \
    --model_name_or_path ${MODEL_NAME_OR_PATH} \
    --chat_template_name olmo_thinker_rlzero \
    --non_stop_penalty False \
    --temperature 1.0 \
    --total_episodes 256000 \
    --deepspeed_stage 2 \
    --num_learners_per_node 1 \
    --vllm_num_engines 3 \
    --vllm_tensor_parallel_size 1 \
    --lr_scheduler_type constant \
    --apply_verifiable_reward true \
    --seed 1 \
    --local_eval_every 25 \
    --save_freq 100 \
    --checkpoint_state_freq 100 \
    --gradient_checkpointing \
    --with_tracking \
    --vllm_enable_prefix_caching \
    --clip_higher 0.272 \
    --mask_truncated_completions False \
    --oe_eval_max_length 32768 \
    --try_launch_beaker_eval_jobs_on_weka True \
    --eval_priority normal \
    --eval_on_step_0 True \
    --oe_eval_tasks $EVALS \
    --load_ref_policy False \
    --oe_eval_gpu_multiplier 2 $@
This script appears to be an exact copy of scripts/train/qwen/1.5b_rlzero_math.sh. Having identical files creates a maintenance burden, as changes will need to be applied in both places. Consider consolidating them into a single script that can be parameterized for debug vs. full runs, or using a shared configuration file.
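A minimal sketch of one possible consolidation, assuming a DEBUG environment variable (hypothetical; the variable name and the debug-run values are illustrative, not part of this PR):

#!/bin/bash
# Hypothetical consolidated script: DEBUG=1 selects a small configuration,
# otherwise the full-run values from the script above are used.
DEBUG="${DEBUG:-0}"
if [[ "$DEBUG" == "1" ]]; then
    TOTAL_EPISODES=2560          # short run for debugging (illustrative value)
    NUM_UNIQUE_PROMPTS=8
else
    TOTAL_EPISODES=256000        # values from the full training script
    NUM_UNIQUE_PROMPTS=32
fi
# ...shared mason.py / grpo_fast.py invocation, passing
# --total_episodes ${TOTAL_EPISODES} --num_unique_prompts_rollout ${NUM_UNIQUE_PROMPTS}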
BEAKER_USER=$(beaker account whoami --format json | jq -r '.[0].name')
BEAKER_IMAGE="nathanl/open_instruct_auto"

# Check if the first argument starts with the value of $BEAKER_NAME
The comment refers to $BEAKER_NAME, but this variable is not defined in the script. The condition below it uses $BEAKER_USER. The comment should be updated to refer to $BEAKER_USER for consistency and to avoid confusion.
Suggested change:
- # Check if the first argument starts with the value of $BEAKER_NAME
+ # Check if the first argument starts with the value of $BEAKER_USER
BEAKER_USER=$(beaker account whoami --format json | jq -r '.[0].name')
BEAKER_IMAGE="nathanl/open_instruct_auto"

# Check if the first argument starts with the value of $BEAKER_NAME
The comment refers to $BEAKER_NAME, but this variable is not defined in the script. The condition below it uses $BEAKER_USER. The comment should be updated to refer to $BEAKER_USER for consistency and to avoid confusion.
Suggested change:
- # Check if the first argument starts with the value of $BEAKER_NAME
+ # Check if the first argument starts with the value of $BEAKER_USER
BEAKER_USER=$(beaker account whoami --format json | jq -r '.[0].name')
BEAKER_IMAGE="nathanl/open_instruct_auto"

# Check if the first argument starts with the value of $BEAKER_NAME
The comment refers to $BEAKER_NAME, but this variable is not defined in the script. The condition below it uses $BEAKER_USER. The comment should be updated to refer to $BEAKER_USER for consistency and to avoid confusion.
Suggested change:
- # Check if the first argument starts with the value of $BEAKER_NAME
+ # Check if the first argument starts with the value of $BEAKER_USER
No description provided.