
Conversation

@HosseinKaviani-H (Contributor)

  • Move compile from training.compile to compile.enable in llama3_8b.yaml
  • Move compile from training.compile to compile.enable in qwen3_8b.yaml
  • Update main.py to use job_config.compile.enable instead of job_config.training.compile

This aligns the YAML config structure with ForgeJobConfig's expected schema, where compile is a separate top-level dataclass, not nested under training.

See: https://github.com/pytorch/torchtitan/blob/29aafb91b7fbffe2ee259919a3249a0eb1d70779/torchtitan/experiments/forge/job_config.py#L42 where compile is a class not under training.
The meta-cla bot added the "CLA Signed" label (managed by the Meta Open Source bot), Dec 23, 2025.
@felipemello1 (Contributor) commented Dec 23, 2025

We have to compile the reference model, the training model, the loss, and the generator. If we do `compile.enable`, how do we manage those? I think the solution should be the same for SFT/GRPO. Please take a look and cmd+F "compile" here: https://github.com/meta-pytorch/torchforge/blob/main/apps/grpo/llama3_8b.yaml

I am inclined to think it's better for each actor to have its own `compile: bool`, rather than a single top-level `compile`, so the user can manage it at the actor level, as in the GRPO config. SFT doesn't have this problem because it's a single actor.

I do see the value in your PR though, because it brings our YAML closer to the JobConfig. Not sure how to reconcile the two.

@HosseinKaviani-H (Contributor, Author) commented Dec 23, 2025

> We have to compile the reference model, the training model, the loss, and the generator. If we do `compile.enable`, how do we manage those? I think the solution should be the same for SFT/GRPO. Please take a look and cmd+F "compile" here: https://github.com/meta-pytorch/torchforge/blob/main/apps/grpo/llama3_8b.yaml
>
> I am inclined to think it's better for each actor to have its own `compile: bool`, and the user can manage it at the actor level, as in the GRPO config. SFT doesn't have this problem because it's a single actor.
>
> I do see the value in your PR though, because it brings our YAML closer to the JobConfig. Not sure how to reconcile the two.

@felipemello1 This is a different issue, though. SFT and GRPO have fundamental differences, as you mentioned: GRPO has multiple actors (trainer, ref_model, generator) that each need independent compile control, while SFT is a single actor.

I see two goals in this PR:

1. Align with TorchTitan's YAML config schema. In ForgeJobConfig, `compile` is a top-level dataclass (`Compile`), not a field under `Training`. Keeping `training.compile` causes `Training.__init__() got an unexpected keyword argument 'compile'` when the config is parsed.

2. Enable future refactoring. For the TitanTrainer refactor, `compile` not being part of the `Training` dataclass means we'd need a monkey-patch or special handling to pop it from the config before passing it to the trainer. Moving it to the top level avoids this.
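The schema mismatch can be sketched with simplified stand-in dataclasses (illustrative only; the real definitions live in `torchtitan/experiments/forge/job_config.py` and have many more fields):

```python
from dataclasses import dataclass, field

# Simplified stand-ins for the torchtitan dataclasses (illustration only).
@dataclass
class Training:
    steps: int = 1000  # note: no `compile` field here

@dataclass
class Compile:
    enable: bool = False  # compile lives in its own top-level dataclass

@dataclass
class ForgeJobConfig:
    training: Training = field(default_factory=Training)
    compile: Compile = field(default_factory=Compile)

# A YAML `training.compile: true` key becomes Training(compile=True),
# which the dataclass rejects with a TypeError:
try:
    Training(compile=True)  # type: ignore[call-arg]
    raised = False
except TypeError:
    raised = True
print(raised)  # True

# The top-level form maps cleanly onto the schema:
cfg = ForgeJobConfig(compile=Compile(enable=True))
print(cfg.compile.enable)  # True
```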

For GRPO's multi-actor case, the current pattern already works well:

```yaml
compile: true  # top-level toggle
trainer:
  compile:
    enable: ${compile}
ref_model:
  compile:
    enable: ${compile}
```

This PR just makes SFT consistent with that same structure. Each recipe can still manage compile at the actor level as needed.
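The same fan-out can be sketched in plain Python (names are illustrative, not the actual torchforge actor configs): a single toggle propagates to every actor's `compile.enable`, and any actor can still be overridden independently, mimicking the `enable: ${compile}` interpolation above.

```python
from dataclasses import dataclass, field

@dataclass
class Compile:
    enable: bool = False

@dataclass
class ActorConfig:
    compile: Compile = field(default_factory=Compile)

def build_actor_configs(compile_all, overrides=None):
    """Fan a single top-level toggle out to every actor, with
    optional per-actor overrides (hypothetical helper)."""
    overrides = overrides or {}
    return {
        name: ActorConfig(Compile(enable=overrides.get(name, compile_all)))
        for name in ("trainer", "ref_model", "generator")
    }

# Global toggle on, but the generator opts out at the actor level.
cfgs = build_actor_configs(True, overrides={"generator": False})
print(cfgs["trainer"].compile.enable)    # True
print(cfgs["generator"].compile.enable)  # False
```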

@JenniferWang (Contributor) left a comment

ForgeJobConfig is a Titan-specific integration detail. I'd recommend starting by consolidating the forge trainer interface to make backend-specific configurable parameters clearer.
