
refactor: migrate TransformerConfig validations to __post_init__ (Part of #3568)#3675

Open
CodersAcademy006 wants to merge 4 commits into NVIDIA:main from
CodersAcademy006:refactor/validation-to-dataclass-postinit

Conversation

@CodersAcademy006
Contributor

Summary

Partial resolution of #3568. Migrates two pure-validation assertions from validate_args() into TransformerConfig.__post_init__() so they are co-located with the fields they check, enforced automatically at config-construction time, and exercisable from unit tests.

What Changed

  • Added hidden_size % num_attention_heads == 0 check inside the kv_channels derivation block in TransformerConfig.__post_init__
  • Added num_moe_experts % expert_model_parallel_size == 0 check inside a consolidated if expert_model_parallel_size > 1 block in TransformerConfig.__post_init__ (this check was entirely absent from the config class before — only existed in validate_args)
  • Both asserts retained in validate_args() for legacy model paths with an explanatory comment
  • Added 11 unit tests in tests/unit_tests/test_transformer_config_validation.py
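The two checks above can be sketched with a simplified stand-in for TransformerConfig. This is an illustrative model only, not the actual Megatron-LM class: the real TransformerConfig has many more fields, and the exact guard structure here is an assumption based on the bullet points above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MiniTransformerConfig:
    """Simplified stand-in for TransformerConfig (illustration only)."""
    hidden_size: int
    num_attention_heads: int
    kv_channels: Optional[int] = None
    num_moe_experts: Optional[int] = None
    expert_model_parallel_size: int = 1

    def __post_init__(self):
        # Divisibility check inside the kv_channels derivation block:
        # deriving head size only makes sense when heads divide hidden_size.
        if self.kv_channels is None:
            assert self.hidden_size % self.num_attention_heads == 0, (
                f"hidden_size ({self.hidden_size}) must be divisible by "
                f"num_attention_heads ({self.num_attention_heads})"
            )
            self.kv_channels = self.hidden_size // self.num_attention_heads

        # MoE check, consolidated under a single expert-parallel guard.
        if self.expert_model_parallel_size > 1:
            assert self.num_moe_experts is not None, (
                "num_moe_experts must be set when expert_model_parallel_size > 1"
            )
            assert self.num_moe_experts % self.expert_model_parallel_size == 0, (
                "num_moe_experts must be divisible by expert_model_parallel_size"
            )
```

Because the assertions live in __post_init__, an invalid combination fails at construction time (e.g. `MiniTransformerConfig(hidden_size=100, num_attention_heads=7)` raises AssertionError) rather than deep inside a training run.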

What Is NOT in This PR

  • Other config groups (follow-up PRs)
  • ConfigContainer cross-config validations (pending pattern discussion)
  • Mutation-style defaults in validate_args() — intentionally left in place

Testing

pytest tests/unit_tests/test_transformer_config_validation.py -v --noconftest
# 11 passed in 4.63s

Notes

Some entries in validate_args() conditionally mutate args rather than purely validating them; those are left untouched, since migrating them requires a finalize() pattern discussion first.
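To illustrate the distinction, here is a hypothetical sketch (not the actual arguments.py code) contrasting a pure validation, which is safe to migrate, with a mutation-style default, which is not:

```python
from types import SimpleNamespace

def validate_args(args):
    """Hypothetical sketch of the two kinds of entries in validate_args()."""
    # Pure validation: reads state and raises on bad input, mutates nothing.
    # Safe to co-locate in __post_init__.
    assert args.hidden_size % args.num_attention_heads == 0

    # Mutation-style default: derives and *writes* a field. Moving this into
    # __post_init__ would change when the value gets populated, so entries
    # like it stay put pending a finalize() pattern discussion.
    if args.kv_channels is None:
        args.kv_channels = args.hidden_size // args.num_attention_heads

args = SimpleNamespace(hidden_size=1024, num_attention_heads=16, kv_channels=None)
validate_args(args)  # passes the assertion and fills in kv_channels
```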

Partially addresses #3568; intentionally does not close it, as follow-up PRs remain.

Part of NVIDIA#3568. Moves two pure-validation assertions from validate_args()
in arguments.py into TransformerConfig.__post_init__() for co-location,
automatic enforcement at config-construction time, and unit testability.

Migrated validations:
- hidden_size % num_attention_heads == 0 (guarded inside kv_channels derivation)
- num_moe_experts % expert_model_parallel_size == 0 (was entirely missing
  from TransformerConfig; only existed in validate_args)

Both asserts are retained in validate_args for legacy model paths that
bypass TransformerConfig construction entirely.

Adds 11 unit tests: tests/unit_tests/test_transformer_config_validation.py
@CodersAcademy006 CodersAcademy006 requested review from a team as code owners March 3, 2026 13:33
@copy-pr-bot

copy-pr-bot bot commented Mar 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@svcnvidia-nemo-ci svcnvidia-nemo-ci requested a review from a team March 3, 2026 13:33
@CodersAcademy006
Contributor Author

@maanug-nv Hey, could you please take a look at this? Thank you. Any suggestions for improvement are welcome.

@yaox12 yaox12 added the "Final Review" (PR is in the "final review" stage) label Mar 4, 2026
@yaox12
Member

yaox12 commented Mar 5, 2026

/claude review


Part of #3568: moves pure-validation assertions from validate_args() in
arguments.py into TransformerConfig so they fire at config-construction time
and can be exercised without a full training process.

The comment says "moves pure-validation assertions from validate_args()", but the original assert checks in arguments.py are still present:

  • arguments.py:1049: assert args.hidden_size % args.num_attention_heads == 0
  • arguments.py:1263–1266: assert args.num_experts % args.expert_model_parallel_size == 0

If this PR is intentionally only the "add to __post_init__" half of the migration (with a follow-up to remove from arguments.py), the docstring here should say "adds" or "copies" rather than "moves" to avoid confusion.
