Refactor DPO config: move fields and remove duplicates #1392
base: main
Conversation
- Add dpo.py: new DPO training module using OLMo-core's TrainModule with HSDP support
- Add build_reference_logprobs_cache_olmo: generic reference logprobs caching for OLMo-core
- Add compute_loss_olmo: wrapper for DPO loss computation with ExperimentConfig
- Add concatenated_forward_olmo and separate_forward_olmo: OLMo-core forward functions
- Update mason.py: add dpo.py to OPEN_INSTRUCT_COMMANDS
- Update debug scripts to use torchrun with OLMo-core models

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
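The central piece of this commit is the concatenated-forward pattern used by DPO: chosen and rejected completions are stacked into one batch so the policy (and the frozen reference model, when building the logprob cache) only needs a single forward pass per step. Below is a minimal sketch of that pattern, assuming an HF-style model that returns `.logits` and labels masked with `-100`; the names are illustrative and not the exact signatures of `concatenated_forward_olmo`.

```python
import torch
import torch.nn.functional as F


def sequence_logprobs(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Sum of log-probabilities of the label tokens, ignoring -100 positions."""
    logits = logits[:, :-1, :]   # predict token t from position t-1
    labels = labels[:, 1:].clone()
    mask = labels != -100
    labels[~mask] = 0            # any valid index; masked out below
    per_token = torch.gather(
        F.log_softmax(logits, dim=-1), dim=2, index=labels.unsqueeze(2)
    ).squeeze(2)
    return (per_token * mask).sum(-1)


def concatenated_forward(model, batch):
    """Run chosen and rejected sequences through the model in one forward pass."""
    input_ids = torch.cat([batch["chosen_input_ids"], batch["rejected_input_ids"]], dim=0)
    labels = torch.cat([batch["chosen_labels"], batch["rejected_labels"]], dim=0)
    logits = model(input_ids=input_ids).logits
    logprobs = sequence_logprobs(logits, labels)
    num_chosen = batch["chosen_input_ids"].shape[0]
    return logprobs[:num_chosen], logprobs[num_chosen:]
```

A reference-logprob cache in the spirit of `build_reference_logprobs_cache_olmo` can then be built by running this forward once with the frozen reference model and storing the two scalars per example, presumably so the reference model does not need to stay resident during training.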
- Move concatenated_forward and packing from DPOConfig to ModelConfig
- Remove duplicate fields from ExperimentConfig that were already defined in parent classes (max_seq_length, resume_from_checkpoint, checkpointing_steps, and EvalConfig fields)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
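The deduplication follows the usual dataclass-inheritance pattern: a field is declared once on the parent config and the child stops re-declaring it. A rough sketch of the idea, with class names taken from the PR description but simplified; the real hierarchy and defaults in open-instruct may differ.

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    # Declared once here; per the review below, this default now applies
    # whenever a config omits max_seq_length entirely.
    max_seq_length: int = 2048


@dataclass
class CheckpointConfig:
    resume_from_checkpoint: str | None = None
    checkpointing_steps: int | None = None


@dataclass
class ExperimentConfig(TrainingConfig, CheckpointConfig):
    # max_seq_length, resume_from_checkpoint and checkpointing_steps used to be
    # re-declared here; after the refactor they are inherited, so a default can
    # no longer silently diverge between parent and child.
    learning_rate: float = 1e-6
```

Whether `ExperimentConfig` actually inherits from these classes or composes them is an open-instruct implementation detail; the point of the refactor is that each field has exactly one declaration.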
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 873e1a5e68
```python
preprocessing_num_workers: int | None = field(
    default=None, metadata={"help": "The number of processes to use for the preprocessing."}
)
max_seq_length: int | None = field(
    default=None,
    metadata={
        "help": (
            "The maximum total input sequence length after tokenization. "
            "Sequences longer than this will be truncated,"
        )
    },
)
overwrite_cache: bool = field(
    default=False, metadata={"help": "Overwrite the cached training and evaluation sets"}
)
```
Preserve model max length when max_seq_length unset
Removing the ExperimentConfig override for max_seq_length means the default now falls back to TrainingConfig’s max_seq_length=2048 (see open_instruct/dpo_utils.py around line 134). In DPO tokenization paths we always pass max_seq_length into tokenizer.apply_chat_template(..., truncation=True, max_length=max_seq_length) (open_instruct/dataset_transformation.py:1004+), so configs that omit max_seq_length will silently truncate to 2048 instead of using the tokenizer/model’s max length. This changes training/eval behavior for 4k/8k‑context models and is easy to miss; consider restoring a None default or deriving from tokenizer.model_max_length when unset.
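If the second option the reviewer suggests is taken, the fallback could look roughly like this. This is a hedged sketch: `resolve_max_seq_length` and the sanity cap are made up for illustration, not code from this PR.

```python
def resolve_max_seq_length(configured: int | None, tokenizer) -> int:
    """Use the configured value if given, otherwise fall back to the tokenizer's limit."""
    if configured is not None:
        return configured
    model_max = getattr(tokenizer, "model_max_length", None)
    # Some tokenizers report a huge sentinel (~1e30) when no limit is set,
    # so refuse to use obviously bogus values for truncation.
    if model_max is None or model_max > 1_000_000:
        raise ValueError(
            "Tokenizer does not report a usable model_max_length; "
            "set max_seq_length explicitly."
        )
    return int(model_max)
```

With that in place, the `tokenizer.apply_chat_template(..., truncation=True, max_length=...)` call mentioned above would preserve 4k/8k contexts instead of silently clamping them to 2048.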
Force-pushed from a43a450 to 69120dd
Summary
- Moved `concatenated_forward` and `packing` fields from DPOConfig to ModelConfig (these are model-related, not DPO-specific)
- Removed duplicate fields from ExperimentConfig:
  - `max_seq_length` (duplicate of TrainingConfig)
  - `resume_from_checkpoint` (duplicate of CheckpointConfig)
  - `checkpointing_steps` (duplicate of CheckpointConfig)
  - EvalConfig fields (`try_auto_save_to_beaker`, `gs_bucket_path`, `oe_eval_tasks`, etc.)

Test plan
- `make style && make quality` - linting passes
- `uv run pytest open_instruct/test_dpo_utils.py` - tests pass
- `./scripts/train/build_image_and_launch.sh scripts/train/debug/dpo.sh` - experiment runs successfully

Note: This PR depends on #1391 (finbarr/olmo-core-dpo-base) being merged first.
🤖 Generated with Claude Code