Refactor DPO config: move fields and remove duplicates #1392
base: main
Conversation
- Add dpo.py: new DPO training module using OLMo-core's TrainModule with HSDP support
- Add build_reference_logprobs_cache_olmo: generic reference logprobs caching for OLMo-core
- Add compute_loss_olmo: wrapper for DPO loss computation with ExperimentConfig
- Add concatenated_forward_olmo and separate_forward_olmo: OLMo-core forward functions
- Update mason.py: add dpo.py to OPEN_INSTRUCT_COMMANDS
- Update debug scripts to use torchrun with OLMo-core models

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
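The central piece of this commit is the concatenated-forward pattern used by DPO: chosen and rejected completions are stacked into one batch so the policy (and the frozen reference model, when building the logprob cache) only needs a single forward pass per step. Below is a minimal sketch of that pattern, assuming an HF-style model that returns `.logits` and labels masked with `-100`; the names are illustrative and not the exact signatures of `concatenated_forward_olmo`.

```python
import torch
import torch.nn.functional as F


def sequence_logprobs(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Sum of log-probabilities of the label tokens, ignoring -100 positions."""
    logits = logits[:, :-1, :]   # predict token t from position t-1
    labels = labels[:, 1:].clone()
    mask = labels != -100
    labels[~mask] = 0            # any valid index; masked out below
    per_token = torch.gather(
        F.log_softmax(logits, dim=-1), dim=2, index=labels.unsqueeze(2)
    ).squeeze(2)
    return (per_token * mask).sum(-1)


def concatenated_forward(model, batch):
    """Run chosen and rejected sequences through the model in one forward pass."""
    input_ids = torch.cat([batch["chosen_input_ids"], batch["rejected_input_ids"]], dim=0)
    labels = torch.cat([batch["chosen_labels"], batch["rejected_labels"]], dim=0)
    logits = model(input_ids=input_ids).logits
    logprobs = sequence_logprobs(logits, labels)
    num_chosen = batch["chosen_input_ids"].shape[0]
    return logprobs[:num_chosen], logprobs[num_chosen:]
```

A reference-logprob cache in the spirit of `build_reference_logprobs_cache_olmo` can then be built by running this forward once with the frozen reference model and storing the two scalars per example, presumably so the reference model does not need to stay resident during training.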
- Move concatenated_forward and packing from DPOConfig to ModelConfig
- Remove duplicate fields from ExperimentConfig that were already defined in parent classes (max_seq_length, resume_from_checkpoint, checkpointing_steps, and EvalConfig fields)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
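The deduplication follows the usual dataclass-inheritance pattern: a field is declared once on the parent config and the child stops re-declaring it. A rough sketch of the idea, with class names taken from the PR description but simplified; the real hierarchy and defaults in open-instruct may differ.

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    # Declared once here; per the review below, this default now applies
    # whenever a config omits max_seq_length entirely.
    max_seq_length: int = 2048


@dataclass
class CheckpointConfig:
    resume_from_checkpoint: str | None = None
    checkpointing_steps: int | None = None


@dataclass
class ExperimentConfig(TrainingConfig, CheckpointConfig):
    # max_seq_length, resume_from_checkpoint and checkpointing_steps used to be
    # re-declared here; after the refactor they are inherited, so a default can
    # no longer silently diverge between parent and child.
    learning_rate: float = 1e-6
```

Whether `ExperimentConfig` actually inherits from these classes or composes them is an open-instruct implementation detail; the point of the refactor is that each field has exactly one declaration.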
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 873e1a5e68
```python
preprocessing_num_workers: int | None = field(
    default=None, metadata={"help": "The number of processes to use for the preprocessing."}
)
max_seq_length: int | None = field(
    default=None,
    metadata={
        "help": (
            "The maximum total input sequence length after tokenization. "
            "Sequences longer than this will be truncated,"
        )
    },
)
overwrite_cache: bool = field(
    default=False, metadata={"help": "Overwrite the cached training and evaluation sets"}
)
```
Preserve model max length when max_seq_length unset
Removing the ExperimentConfig override for max_seq_length means the default now falls back to TrainingConfig’s max_seq_length=2048 (see open_instruct/dpo_utils.py around line 134). In DPO tokenization paths we always pass max_seq_length into tokenizer.apply_chat_template(..., truncation=True, max_length=max_seq_length) (open_instruct/dataset_transformation.py:1004+), so configs that omit max_seq_length will silently truncate to 2048 instead of using the tokenizer/model’s max length. This changes training/eval behavior for 4k/8k‑context models and is easy to miss; consider restoring a None default or deriving from tokenizer.model_max_length when unset.
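If the second option the reviewer suggests is taken, the fallback could look roughly like this. This is a hedged sketch: `resolve_max_seq_length` and the sanity cap are made up for illustration, not code from this PR.

```python
def resolve_max_seq_length(configured: int | None, tokenizer) -> int:
    """Use the configured value if given, otherwise fall back to the tokenizer's limit."""
    if configured is not None:
        return configured
    model_max = getattr(tokenizer, "model_max_length", None)
    # Some tokenizers report a huge sentinel (~1e30) when no limit is set,
    # so refuse to use obviously bogus values for truncation.
    if model_max is None or model_max > 1_000_000:
        raise ValueError(
            "Tokenizer does not report a usable model_max_length; "
            "set max_seq_length explicitly."
        )
    return int(model_max)
```

With that in place, the `tokenizer.apply_chat_template(..., truncation=True, max_length=...)` call mentioned above would preserve 4k/8k contexts instead of silently clamping them to 2048.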
Force-pushed from a43a450 to 69120dd
Summary
- Moved `concatenated_forward` and `packing` fields from DPOConfig to ModelConfig (these are model-related, not DPO-specific)
- Removed duplicate fields from ExperimentConfig:
  - `max_seq_length` (duplicate of TrainingConfig)
  - `resume_from_checkpoint` (duplicate of CheckpointConfig)
  - `checkpointing_steps` (duplicate of CheckpointConfig)
  - EvalConfig fields (`try_auto_save_to_beaker`, `gs_bucket_path`, `oe_eval_tasks`, etc.)

Test plan
- `make style && make quality` - linting passes
- `uv run pytest open_instruct/test_dpo_utils.py` - tests pass
- `./scripts/train/build_image_and_launch.sh scripts/train/debug/dpo.sh` - experiment runs successfully

Note: This PR depends on #1391 (finbarr/olmo-core-dpo-base) being merged first.
🤖 Generated with Claude Code