
Conversation

@finbarrtimbers

Ran the single GPU finetuning script: Beaker.

finbarrtimbers and others added 24 commits January 5, 2026 12:02
…mmand

- Add --dataset_mixer_list, --cache_dataset_only, --output_dir to train subparser
- Handle --cache_dataset_only flag in main function
- Update debug script to use --dataset_mixer_list pattern
- Make --dataset_path optional (required only for training)

The olmo_core.launch.beaker module requires a different beaker-py version
than what's installed. Make the import optional so --cache_dataset_only
mode works without beaker dependencies.

When --dataset_path is not provided but --output_dir is, use output_dir
as the dataset path. This matches how mason.py caches datasets to
output_dir and then runs training.
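In effect, the fallback is a one-liner (a sketch rather than the PR's exact code; the argument names follow the flags listed in the earlier commit):

# Hypothetical sketch: prefer an explicit --dataset_path, otherwise reuse --output_dir.
dataset_path = args.dataset_path if args.dataset_path is not None else args.output_dir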

olmo_core.config.Config doesn't support integer keys properly.
Changed SFTConfig from inheriting Config to a plain @dataclass.
Removed the .merge(overrides) call and debug print statements.
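For illustration, a plain dataclass keeps integer-keyed dict fields intact; the field names below are hypothetical, not the PR's actual SFTConfig:

from dataclasses import dataclass, field

@dataclass
class SFTConfig:
    learning_rate: float = 2e-5
    sequence_length: int = 4096
    # Integer-keyed mappings (e.g. per-layer overrides) survive as-is here,
    # whereas olmo_core.config.Config does not handle int keys properly.
    layer_overrides: dict[int, float] = field(default_factory=dict)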

Uses a two-step approach (both steps are sketched below):
1. Cache HF dataset to numpy format on weka
2. Run training on the cached data
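A rough sketch of the two steps as CLI invocations, using the train subcommand and flags added earlier in this PR (the dataset name, mix weight, and weka path are placeholders, and the exact invocation may differ):

# Step 1: tokenize the HF dataset mix and cache it as numpy files on weka.
python open_instruct/finetune.py train \
    --dataset_mixer_list allenai/tulu-3-sft-mixture 1.0 \
    --cache_dataset_only \
    --output_dir /weka/path/to/sft-cache

# Step 2: launch training against the cached data, pointing --dataset_path
# (or --output_dir, which it falls back to) at the same location.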

Needed for WandB and config saver callbacks since SFTConfig is now
a plain dataclass instead of inheriting from olmo_core.config.Config.

Fixes S3 403 Forbidden errors by using the weka-mounted path directly.

The previous checkpoint path (OLMo-2-1124-7B/step556000-unsharded) was either
empty or in an incompatible format. Switch to an OLMo3 checkpoint, which has
proper .metadata files for olmo-core's distributed checkpoint loading.

- Add MODEL_CONFIGS mapping for supported architectures (olmo2_7B, olmo3_7B, etc.)
- Add --model_config argument to train and launch subcommands
- Remove separate dry_run subcommand, use --dry_run flag instead
- Update debug script to use OLMo3-7B checkpoint and model config

Use --rdzv-backend=c10d --rdzv-endpoint=localhost:0 to let torchrun
automatically find a free port instead of hardcoding a port number.
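For example, a single-node launch along these lines (the script arguments are placeholders):

# c10d rendezvous on localhost:0 lets torchrun bind any free port.
torchrun --nproc-per-node=1 \
    --rdzv-backend=c10d \
    --rdzv-endpoint=localhost:0 \
    open_instruct/finetune.py ...  # remaining training args elided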

Add 'from_checkpoint' option for --model_config that reads the
TransformerConfig directly from the checkpoint's config.json file.
This ensures model architecture always matches the checkpoint.
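A minimal sketch of that flow, assuming a local checkpoint directory and a config.json with a model section; the PR's actual construction of the TransformerConfig may differ:

import json
from pathlib import Path

checkpoint_dir = Path("/weka/path/to/checkpoint")  # placeholder
config = json.loads((checkpoint_dir / "config.json").read_text())
model_cfg = config["model"]
# Values such as d_model, n_layers, n_heads, and the qk_norm / layer_norm
# settings are then used to build a matching olmo-core TransformerConfig.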

resource_path takes folder and filename as separate arguments.
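That is, roughly the following, assuming resource_path comes from olmo-core's IO utilities and checkpoint_dir is a placeholder for the checkpoint folder:

config_path = resource_path(checkpoint_dir, "config.json")  # folder and filename passed separately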

Avoid OmegaConf compatibility issues by manually constructing the
TransformerConfig from the checkpoint's config.json values.

The checkpoint uses per-head QK norm which has different weight shapes.

Use bias, eps, and name from checkpoint config for qk_norm,
layer_norm, and lm_head layer_norm.

Required by OLMo3 checkpoints that were saved with torchao optimizations.

The checkpoint was saved with an older torchao version that had a
different float8 module structure.

The OLMo3 float8 checkpoint is incompatible with current library
versions. Need to use a checkpoint without float8 dependencies.

The previous OLMo3 checkpoint was saved with float8 optimizations
that require an older version of torchao incompatible with our
transformers dependency.

Copy BeakerCallbackV2 from refactor-dpo branch. This callback
updates Beaker experiment descriptions with training progress
and W&B URLs, using the beaker-py v2 API.

@gemini-code-assist

Summary of Changes

Hello @finbarrtimbers, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant architectural shift in the finetuning process by migrating the core finetuning script to the OLMo-core training infrastructure. This change aims to standardize and optimize the training of OLMo models. Consequently, all previous HuggingFace Accelerate-based Beaker configurations and finetuning scripts have been deprecated. The update also includes a new Beaker callback for compatibility and a dedicated subcommand for efficient dataset caching to a numpy format, aligning with OLMo-core's data handling mechanisms.

Highlights

  • Migration to OLMo-core: The primary finetuning script, open_instruct/finetune.py, has been completely rewritten to utilize the OLMo-core training infrastructure, replacing the previous HuggingFace Accelerate-based approach.
  • Deprecation of Old Beaker Configurations: All existing Beaker configuration files for finetuning (e.g., default_finetune.yaml, default_finetune_lora_multinode.yaml) have been marked as deprecated. They now provide a message directing users to the new OLMo-core based finetune.py launch command.
  • Temporary Removal of LoRA/QLoRA Support: LoRA and QLoRA training functionalities are no longer supported in the new finetune.py script. Future re-implementation is planned, as indicated by TODO comments.
  • New BeakerCallbackV2 Module: A new module, open_instruct/beaker_callback.py, has been introduced to provide Beaker v2 compatibility for OLMo-core, which currently uses Beaker v1. This module is intended to be temporary.
  • Enhanced Dataset Caching: The finetune.py script now includes a cache_dataset_only subcommand, allowing users to convert HuggingFace datasets into a numpy format optimized for OLMo-core training, complete with detailed statistics generation.
  • Updated Finetuning Scripts: All relevant finetuning shell scripts have been updated to reflect the new OLMo-core based finetune.py usage, with many older scripts now marked as deprecated and providing guidance to the new approach.



@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a significant and well-executed refactoring of the supervised finetuning (SFT) process, replacing the existing HuggingFace Accelerate-based implementation with a new one built on olmo-core. The changes include deprecating old configurations and scripts, and introducing a new, more structured finetune.py with a clear command-line interface and better configuration management using dataclasses. The addition of a custom BeakerCallbackV2 for Beaker v2 compatibility is also a great improvement.

Overall, the new implementation is much cleaner and more modular. I've identified a few areas for improvement:

  • A potential out-of-memory issue in the dataset caching logic for very large datasets.
  • A minor inconsistency in time measurement within the new Beaker callback.
  • A question regarding the removal of a file from the type-checking configuration.

Details are in the specific comments. Great work on this large-scale refactoring!

train_dataset = train_dataset.select(range(args.num_examples))

print("Collecting tokens from dataset...")
token_ids = []


Severity: high

The current implementation of cache_dataset_only collects all token_ids and labels_mask into lists in memory before writing them to disk. For very large datasets, this can lead to out-of-memory errors, which undermines the benefit of chunking the output files.

To fix this, I suggest refactoring this part to process the dataset in chunks and write to the memory-mapped files incrementally. This would involve iterating through the dataset, accumulating a chunk of a predefined size (e.g., 1GB of tokens), writing it to a file part, and then proceeding to the next chunk. This approach would keep memory usage low regardless of the total dataset size.
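To make the suggestion concrete, here is an illustrative sketch (not the PR's code) of streaming fixed-size chunks to numbered .npy parts so memory stays bounded; the helper name, chunk size, and uint32 dtype are assumptions:

import numpy as np

CHUNK_TOKENS = 256 * 1024 * 1024  # ~1 GB of uint32 token ids per file part

def write_token_parts(tokenized_dataset, output_prefix: str) -> None:
    buffer: list[int] = []
    part = 0
    for example in tokenized_dataset:
        buffer.extend(example["input_ids"])
        if len(buffer) >= CHUNK_TOKENS:
            np.save(f"{output_prefix}-part-{part:05d}.npy", np.asarray(buffer, dtype=np.uint32))
            buffer.clear()
            part += 1
    if buffer:  # flush the final partial chunk
        np.save(f"{output_prefix}-part-{part:05d}.npy", np.asarray(buffer, dtype=np.uint32))
    # labels_mask would be chunked and written the same way, in lockstep with token_ids.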

self.enabled
and get_rank() == 0
and self.step % self.trainer.metrics_collect_interval == 0
and (self._last_update is None or (time.monotonic() - self._last_update) > 10)


Severity: medium

For consistency, it's better to use the same clock for measuring time intervals. _start_time is initialized with time.perf_counter(), so _last_update should also use time.perf_counter() to avoid potential issues from mixing different clocks (time.monotonic() and time.perf_counter()).

Suggested change:
- and (self._last_update is None or (time.monotonic() - self._last_update) > 10)
+ and (self._last_update is None or (time.perf_counter() - self._last_update) > 10)

allow_multiple=False,
distributed=False,
)
self._last_update = time.monotonic()


Severity: medium

For consistency, please use time.perf_counter() here, to match the clock used for _start_time and the check in post_step.

Suggested change:
- self._last_update = time.monotonic()
+ self._last_update = time.perf_counter()

"open_instruct/data_loader.py",
"open_instruct/data_types.py",
"open_instruct/logger_utils.py",
"open_instruct/mix_data.py",


Severity: medium

The file open_instruct/mix_data.py was removed from the ty (type checking) include list. Was this intentional? It seems like it should still be type-checked to maintain code quality.


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bd45c4f3d1


dataset_config = build_sft_dataset(
root_dir=root_dir, tokenizer_config=tokenizer_config, sequence_length=seq_len, dataset_path=dataset_path
)
gpu_type = CLUSTER_TO_GPU_TYPE[cluster]


P2: Handle clusters accepted by get_root_dir

The new config builder dereferences CLUSTER_TO_GPU_TYPE[cluster] directly, but get_root_dir still treats clusters like ai2/prior, ai2/saturn, ai2/aristo-elara-cirrascale, ai2/allennlp, ai2/mosaic, and local as valid. If a user passes any of those (which the script now allows via get_root_dir), this line raises a KeyError and the run never starts. Please either add GPU type entries for those clusters or handle a default/fallback so these previously supported clusters don’t crash during config construction.
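One minimal way to express the fallback, building on the snippet quoted above (the default GPU type is an assumption, not the PR's choice):

# Look the cluster up with a default instead of indexing directly, so clusters
# that get_root_dir accepts but the mapping lacks don't raise KeyError.
gpu_type = CLUSTER_TO_GPU_TYPE.get(cluster, "h100")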


@BrownianNotion

@finbarrtimbers, the new script looks super clean! Was wondering if it would also be possible to update the scripts/configs for training OLMo-2 1B? Should docs/olmo2.md also be updated or removed?
