Conversation

@finbarrtimbers (Collaborator)

Summary

  • Move concatenated_forward and packing fields from DPOConfig to ModelConfig (these are model-related, not DPO-specific); a rough sketch of the resulting layout follows this list
  • Remove duplicate config fields from ExperimentConfig that were already defined in parent classes:
    • max_seq_length (duplicate of TrainingConfig)
    • resume_from_checkpoint (duplicate of CheckpointConfig)
    • checkpointing_steps (duplicate of CheckpointConfig)
    • All EvalConfig fields (try_auto_save_to_beaker, gs_bucket_path, oe_eval_tasks, etc.)
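
For reference, a rough sketch of the config layout this refactor aims for. The field names, the class names, and the 2048 default come from this PR and the review below; everything else (defaults, field types, exact inheritance) is illustrative and may not match the real open_instruct definitions.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    # Moved here from DPOConfig: these control how the model is run, not the DPO loss itself.
    concatenated_forward: bool = True  # single forward pass over chosen+rejected, vs. two passes
    packing: bool = False              # pack multiple examples into one sequence


@dataclass
class TrainingConfig:
    max_seq_length: int = 2048  # defined once; ExperimentConfig no longer re-declares it


@dataclass
class CheckpointConfig:
    resume_from_checkpoint: str | None = None
    checkpointing_steps: int | None = None


@dataclass
class EvalConfig:
    oe_eval_tasks: list[str] | None = None  # one of the fields no longer duplicated


@dataclass
class ExperimentConfig(TrainingConfig, CheckpointConfig, EvalConfig):
    # Inherits max_seq_length, resume_from_checkpoint, checkpointing_steps, and the
    # EvalConfig fields from its parents instead of duplicating them.
    pass
```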

Test plan

  • Run `make style && make quality` - linting passes
  • Run `uv run pytest open_instruct/test_dpo_utils.py` - tests pass
  • Run `./scripts/train/build_image_and_launch.sh scripts/train/debug/dpo.sh` - experiment runs successfully

Note: This PR depends on #1391 (finbarr/olmo-core-dpo-base) being merged first.

🤖 Generated with Claude Code

finbarrtimbers and others added 2 commits on January 20, 2026 at 08:51
- Add dpo.py: New DPO training module using OLMo-core's TrainModule with HSDP support
- Add build_reference_logprobs_cache_olmo: Generic reference logprobs caching for OLMo-core
- Add compute_loss_olmo: Wrapper for DPO loss computation with ExperimentConfig
- Add concatenated_forward_olmo and separate_forward_olmo: OLMo-core forward functions (the general idea is sketched after this commit message)
- Update mason.py: Add dpo.py to OPEN_INSTRUCT_COMMANDS
- Update debug scripts to use torchrun with OLMo-core models

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
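
The concatenated-forward technique that concatenated_forward_olmo refers to is, in outline, the usual DPO trick of stacking chosen and rejected completions into one batch so the policy runs a single forward pass. Below is a minimal sketch of that idea, assuming an HF-style model that returns .logits and inputs padded to a common length; it is not the actual OLMo-core implementation.

```python
import torch


def concatenated_forward(model, chosen_ids, rejected_ids, chosen_labels, rejected_labels):
    # Stack chosen and rejected along the batch dimension (assumes both are padded to the same length).
    input_ids = torch.cat([chosen_ids, rejected_ids], dim=0)
    labels = torch.cat([chosen_labels, rejected_labels], dim=0)

    logits = model(input_ids).logits  # [2B, T, V]; assumes an HF-style output object

    # Shift so position t predicts token t+1, then sum log-probs over non-masked labels.
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = labels[:, 1:].clamp(min=0)  # clamp away the -100 ignore index before gather
    per_token = torch.gather(logprobs, 2, targets.unsqueeze(-1)).squeeze(-1)
    mask = (labels[:, 1:] != -100).float()
    seq_logprobs = (per_token * mask).sum(-1)

    batch_size = chosen_ids.shape[0]
    return seq_logprobs[:batch_size], seq_logprobs[batch_size:]  # chosen, rejected
```

A separate_forward variant would presumably do the same computation in two forward passes, trading throughput for lower peak memory.
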
- Move concatenated_forward and packing from DPOConfig to ModelConfig
- Remove duplicate fields from ExperimentConfig that were already
  defined in parent classes (max_seq_length, resume_from_checkpoint,
  checkpointing_steps, and EvalConfig fields)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 873e1a5e68

Comment on lines 334 to 338
```python
preprocessing_num_workers: int | None = field(
    default=None, metadata={"help": "The number of processes to use for the preprocessing."}
)
max_seq_length: int | None = field(
    default=None,
    metadata={
        "help": (
            "The maximum total input sequence length after tokenization. "
            "Sequences longer than this will be truncated,"
        )
    },
)
overwrite_cache: bool = field(
    default=False, metadata={"help": "Overwrite the cached training and evaluation sets"}
```

P2: Preserve model max length when max_seq_length unset

Removing the ExperimentConfig override for max_seq_length means the default now falls back to TrainingConfig’s max_seq_length=2048 (see open_instruct/dpo_utils.py around line 134). In DPO tokenization paths we always pass max_seq_length into tokenizer.apply_chat_template(..., truncation=True, max_length=max_seq_length) (open_instruct/dataset_transformation.py:1004+), so configs that omit max_seq_length will silently truncate to 2048 instead of using the tokenizer/model’s max length. This changes training/eval behavior for 4k/8k‑context models and is easy to miss; consider restoring a None default or deriving from tokenizer.model_max_length when unset.
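
If the fallback is implemented the way the comment suggests, a hypothetical helper could look like the sketch below. The name resolve_max_seq_length is made up for illustration; tokenizer.model_max_length is the standard Hugging Face tokenizer attribute, and the sentinel check guards against tokenizers that report a huge placeholder value when no limit is configured.

```python
def resolve_max_seq_length(config_max_seq_length: int | None, tokenizer) -> int:
    # An explicit config value always wins.
    if config_max_seq_length is not None:
        return config_max_seq_length
    # Some tokenizers report an enormous sentinel (~1e30) when no limit is set,
    # so only trust model_max_length when it looks like a real context length.
    if tokenizer.model_max_length and tokenizer.model_max_length < 1_000_000:
        return tokenizer.model_max_length
    return 2048  # last-resort default, matching TrainingConfig
```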

@finbarrtimbers force-pushed the finbarr/olmo-core-dpo-base branch from a43a450 to 69120dd on January 20, 2026 at 16:22
Base automatically changed from finbarr/olmo-core-dpo-base to main on January 26, 2026 at 18:13