
refactor: migrate TransformerConfig validations to __post_init__ (Part of #3568)#3675

Open
CodersAcademy006 wants to merge 4 commits into NVIDIA:main from
CodersAcademy006:refactor/validation-to-dataclass-postinit

Conversation

@CodersAcademy006
Contributor

Summary

Partial resolution of #3568. Migrates two pure-validation assertions from validate_args() into TransformerConfig.__post_init__() so they are co-located with the fields they check, enforced automatically at config-construction time, and exercisable from unit tests.

What Changed

  • Added hidden_size % num_attention_heads == 0 check inside the kv_channels derivation block in TransformerConfig.__post_init__
  • Added num_moe_experts % expert_model_parallel_size == 0 check inside a consolidated if expert_model_parallel_size > 1 block in TransformerConfig.__post_init__ (this check was entirely absent from the config class before — only existed in validate_args)
  • Both asserts retained in validate_args() for legacy model paths with an explanatory comment
  • Added 11 unit tests in tests/unit_tests/test_transformer_config_validation.py
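The two checks above can be sketched with a simplified stand-in for TransformerConfig. This is an illustrative model only, not the actual Megatron-LM class: the real TransformerConfig has many more fields, and the exact guard structure here is an assumption based on the bullet points above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MiniTransformerConfig:
    """Simplified stand-in for TransformerConfig (illustration only)."""
    hidden_size: int
    num_attention_heads: int
    kv_channels: Optional[int] = None
    num_moe_experts: Optional[int] = None
    expert_model_parallel_size: int = 1

    def __post_init__(self):
        # Divisibility check inside the kv_channels derivation block:
        # deriving head size only makes sense when heads divide hidden_size.
        if self.kv_channels is None:
            assert self.hidden_size % self.num_attention_heads == 0, (
                f"hidden_size ({self.hidden_size}) must be divisible by "
                f"num_attention_heads ({self.num_attention_heads})"
            )
            self.kv_channels = self.hidden_size // self.num_attention_heads

        # MoE check, consolidated under a single expert-parallel guard.
        if self.expert_model_parallel_size > 1:
            assert self.num_moe_experts is not None, (
                "num_moe_experts must be set when expert_model_parallel_size > 1"
            )
            assert self.num_moe_experts % self.expert_model_parallel_size == 0, (
                "num_moe_experts must be divisible by expert_model_parallel_size"
            )
```

Because the assertions live in __post_init__, an invalid combination fails at construction time (e.g. `MiniTransformerConfig(hidden_size=100, num_attention_heads=7)` raises AssertionError) rather than deep inside a training run.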

What Is NOT in This PR

  • Other config groups (follow-up PRs)
  • ConfigContainer cross-config validations (pending pattern discussion)
  • Mutation-style defaults in validate_args() — intentionally left in place

Testing

pytest tests/unit_tests/test_transformer_config_validation.py -v --noconftest
# 11 passed in 4.63s

Notes

Some entries in validate_args() conditionally mutate args rather than purely validating them; those are left untouched, since migrating them requires a finalize() pattern discussion first.
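To illustrate the distinction, here is a hypothetical sketch (not the actual arguments.py code) contrasting a pure validation, which is safe to migrate, with a mutation-style default, which is not:

```python
from types import SimpleNamespace

def validate_args(args):
    """Hypothetical sketch of the two kinds of entries in validate_args()."""
    # Pure validation: reads state and raises on bad input, mutates nothing.
    # Safe to co-locate in __post_init__.
    assert args.hidden_size % args.num_attention_heads == 0

    # Mutation-style default: derives and *writes* a field. Moving this into
    # __post_init__ would change when the value gets populated, so entries
    # like it stay put pending a finalize() pattern discussion.
    if args.kv_channels is None:
        args.kv_channels = args.hidden_size // args.num_attention_heads

args = SimpleNamespace(hidden_size=1024, num_attention_heads=16, kv_channels=None)
validate_args(args)  # passes the assertion and fills in kv_channels
```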

Partially addresses #3568; intentionally does not close it, as follow-up PRs remain.

Part of NVIDIA#3568. Moves two pure-validation assertions from validate_args()
in arguments.py into TransformerConfig.__post_init__() for co-location,
automatic enforcement at config-construction time, and unit testability.

Migrated validations:
- hidden_size % num_attention_heads == 0 (guarded inside kv_channels derivation)
- num_moe_experts % expert_model_parallel_size == 0 (was entirely missing
  from TransformerConfig; only existed in validate_args)

Both asserts are retained in validate_args for legacy model paths that
bypass TransformerConfig construction entirely.

Adds 11 unit tests: tests/unit_tests/test_transformer_config_validation.py
@CodersAcademy006 CodersAcademy006 requested review from a team as code owners March 3, 2026 13:33
@copy-pr-bot

copy-pr-bot bot commented Mar 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@svcnvidia-nemo-ci svcnvidia-nemo-ci requested a review from a team March 3, 2026 13:33
@CodersAcademy006
Contributor Author

@maanug-nv Hey, could you please take a look at this? Thank you. Any suggestions for improvement are welcome.

@yaox12 yaox12 added the "Final Review" (PR is in the "final review" stage) label Mar 4, 2026
@yaox12
Member

yaox12 commented Mar 5, 2026

/claude review


Part of #3568: moves pure-validation assertions from validate_args() in
arguments.py into TransformerConfig so they fire at config-construction time
and can be exercised without a full training process.

The comment says "moves pure-validation assertions from validate_args()", but the original assert checks in arguments.py are still present:

  • arguments.py:1049: assert args.hidden_size % args.num_attention_heads == 0
  • arguments.py:1263–1266: assert args.num_experts % args.expert_model_parallel_size == 0

If this PR is intentionally only the "add to __post_init__" half of the migration (with a follow-up to remove from arguments.py), the docstring here should say "adds" or "copies" rather than "moves" to avoid confusion.
