Changing hybrid_override_pattern to hybrid_layer_pattern to mirror mcore change #2628
adityavavreNVDA wants to merge 15 commits into main
Conversation
Signed-off-by: Aditya Vavre <avavre@nvidia.com>
📝 Walkthrough
This PR systematically renames the configuration field hybrid_override_pattern to hybrid_layer_pattern and propagates the change across providers, bridges, training utilities, and tests.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~28 minutes
🚥 Pre-merge checks: ✅ 3 passed, ❌ 1 failed (1 warning)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/megatron/bridge/training/utils/flop_utils.py (1)
Lines 39-56: ⚠️ Potential issue | 🟠 Major: FLOPs path should fall back to the deprecated hybrid key while the deprecation is active.
The updated logic only reads hybrid_layer_pattern. For configs that still provide only hybrid_override_pattern, layer/MTP counts can be incorrect.
Proposed fix
```diff
 def calculate_layer_counts():
     """Calculate the number of attention, Mamba, MLP, and MoE layers."""
-    if hasattr(cfg.model, "hybrid_layer_pattern") and cfg.model.hybrid_layer_pattern:
+    hybrid_pattern = getattr(cfg.model, "hybrid_layer_pattern", None) or getattr(
+        cfg.model, "hybrid_override_pattern", None
+    )
+    if hybrid_pattern:
         counts = {"M": 0, "*": 0, "-": 0, "E": 0}
         try:
             parse_hybrid_pattern = importlib.import_module(
                 "megatron.core.ssm.mamba_hybrid_layer_allocation"
             ).parse_hybrid_pattern
-            parsed = parse_hybrid_pattern(cfg.model.hybrid_layer_pattern)
+            parsed = parse_hybrid_pattern(hybrid_pattern)
             if parsed.main_pattern:
                 for layer_type in parsed.main_pattern:
                     if layer_type in counts:
                         counts[layer_type] += 1
             if parsed.mtp_pattern and parsed.mtp_num_depths > 0:
                 for layer_type in parsed.mtp_pattern:
                     if layer_type in counts:
                         counts[layer_type] += parsed.mtp_num_depths
         except (ImportError, ModuleNotFoundError):
-            for layer_type in cfg.model.hybrid_layer_pattern:
+            for layer_type in hybrid_pattern:
                 if layer_type in counts:
                     counts[layer_type] += 1
@@
-    hybrid_pattern = getattr(cfg.model, "hybrid_layer_pattern", None)
+    hybrid_pattern = getattr(cfg.model, "hybrid_layer_pattern", None) or getattr(
+        cfg.model, "hybrid_override_pattern", None
+    )
```
Also applies to: 434-445
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/megatron/bridge/training/utils/flop_utils.py` around lines 39 - 56, The FLOPs counting block only reads cfg.model.hybrid_layer_pattern causing configs that still use the deprecated cfg.model.hybrid_override_pattern to be ignored; update the logic in the function containing the parse_hybrid_pattern import (and the similar block around the second occurrence) to fall back to cfg.model.hybrid_override_pattern when cfg.model.hybrid_layer_pattern is not set, i.e., use hybrid_layer_pattern if present else hybrid_override_pattern, then pass that value to parse_hybrid_pattern and the subsequent loops that update counts (refer to parse_hybrid_pattern, parsed.main_pattern, parsed.mtp_pattern, parsed.mtp_num_depths, and counts) so layer/MTP counts are computed correctly for both new and deprecated keys.
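The fallback the comment above proposes is easy to verify in isolation. This is a minimal sketch, assuming a plain namespace stands in for `cfg.model`; the `count_hybrid_layers` helper is illustrative, not the actual flop_utils code:

```python
from types import SimpleNamespace


def count_hybrid_layers(model_cfg):
    """Count layer codes, preferring hybrid_layer_pattern but falling back
    to the deprecated hybrid_override_pattern while both keys exist."""
    pattern = getattr(model_cfg, "hybrid_layer_pattern", None) or getattr(
        model_cfg, "hybrid_override_pattern", None
    )
    counts = {"M": 0, "*": 0, "-": 0, "E": 0}
    if pattern:
        for layer_type in pattern:
            if layer_type in counts:
                counts[layer_type] += 1
    return counts


# A config that only sets the deprecated key still produces correct counts.
legacy_cfg = SimpleNamespace(hybrid_override_pattern="M-M*-E")
print(count_hybrid_layers(legacy_cfg))  # {'M': 2, '*': 1, '-': 2, 'E': 1}
```

Note that with `or`, an empty-string new key also falls through to the deprecated one, matching the behavior of the diff above.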
🧹 Nitpick comments (4)
tests/functional_tests/recipes/test_nemotronh_recipes_finetune.py (1)
Line 323: Make the key mapping resilient during deprecation cleanup. Line 323 hard-codes the deprecated source key and can become a brittle KeyError if the HF toy override dict is renamed later.
♻️ Optional hardening diff
```diff
 MEGATRON_NEMOTRON_3_NANO_OVERRIDES = {
     "num_layers": HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES["num_hidden_layers"],
-    "hybrid_layer_pattern": HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES["hybrid_override_pattern"],
+    "hybrid_layer_pattern": HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES.get(
+        "hybrid_layer_pattern",
+        HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES["hybrid_override_pattern"],
+    ),
     "hidden_size": HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES["hidden_size"],
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/functional_tests/recipes/test_nemotronh_recipes_finetune.py` at line 323, The test currently hard-codes HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES["hybrid_override_pattern"] when building the dict entry for "hybrid_layer_pattern", which will raise KeyError if that source key is renamed; change this to a resilient lookup such as using HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES.get("hybrid_override_pattern") with a sensible fallback, or use a safe dynamic lookup like next((v for k,v in HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES.items() if "hybrid" in k), default_value) so the "hybrid_layer_pattern" assignment never throws and remains robust during deprecation cleanup.
src/megatron/bridge/models/mamba/mamba_builder.py (1)
Lines 227-230: Add an explicit conflict rule between new and deprecated hybrid settings. Lines 227-230 forward both APIs at once. If users set hybrid_layer_pattern and deprecated hybrid_* knobs together, precedence is implicit and easy to misconfigure.
🧭 Suggested guard (example)
```diff
+    if self._model_config.hybrid_layer_pattern is not None and (
+        self._model_config.hybrid_attention_ratio != 0.0
+        or self._model_config.hybrid_mlp_ratio != 0.0
+        or self._model_config.hybrid_override_pattern is not None
+    ):
+        raise ValueError(
+            "Use either hybrid_layer_pattern or deprecated hybrid_* settings, not both."
+        )
+
     return MCoreMambaModel(
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/megatron/bridge/models/mamba/mamba_builder.py` around lines 227 - 230, Detect and block simultaneous use of the new and deprecated hybrid APIs before forwarding values: in the builder method that references _model_config (the block passing hybrid_layer_pattern, hybrid_attention_ratio, hybrid_mlp_ratio, hybrid_override_pattern), check if _model_config.hybrid_layer_pattern is set AND any of _model_config.hybrid_attention_ratio, _model_config.hybrid_mlp_ratio, or _model_config.hybrid_override_pattern are also set; if so, raise a clear ValueError (or log an error and abort) instructing the user to choose either the new hybrid_layer_pattern API or the deprecated hybrid_* knobs, so precedence is explicit and not implicit.
tests/unit_tests/models/nemotronh/test_nemotron_h_provider.py (1)
Lines 193-196: Use exact-match assertions for canonical hybrid pattern defaults. For fixed default patterns, `in` is too permissive and can hide regressions. Prefer `==` in these two tests.
Suggested test tightening
```diff
-    assert (
-        "M-M-M-M-M-M-M-M-M*-M-M-M-M-M-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-M-M---MM---M-M*-M-M-M-M-M-"
-        in provider.hybrid_layer_pattern
-    )
+    assert (
+        provider.hybrid_layer_pattern
+        == "M-M-M-M-M-M-M-M-M*-M-M-M-M-M-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-M-M---MM---M-M*-M-M-M-M-M-"
+    )
-    assert (
-        "M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-"
-        in provider.hybrid_layer_pattern
-    )
+    assert (
+        provider.hybrid_layer_pattern
+        == "M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-"
+    )
```
Also applies to: 227-230
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit_tests/models/nemotronh/test_nemotron_h_provider.py` around lines 193 - 196, Replace the permissive substring assertion with an exact-match assertion: change the test that currently uses "in" against provider.hybrid_layer_pattern to assert equality (==) with the full canonical pattern string for provider.hybrid_layer_pattern; apply the same change to the other similar assertion in the same test file that checks the default hybrid pattern (the one around the later occurrence) so both tests require an exact match rather than a substring.
tests/unit_tests/models/nemotronh/test_nemotron_h_bridge.py (1)
Line 161: Add a test case for the new config key path (hybrid_layer_pattern). This assertion covers legacy-key compatibility (hybrid_override_pattern → provider hybrid_layer_pattern), but there's no explicit test for configs that already emit hybrid_layer_pattern.
Suggested additional unit test
```diff
+    def test_provider_bridge_mamba_config_accepts_new_hybrid_key(self, nemotronh_8b_config_dict):
+        """Test Mamba config mapping when config uses hybrid_layer_pattern directly."""
+        cfg_dict = {k: v for k, v in nemotronh_8b_config_dict.items() if k != "hybrid_override_pattern"}
+        cfg_dict["hybrid_layer_pattern"] = "M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-"
+
+        cfg = Mock(spec=[])
+        for k, v in cfg_dict.items():
+            setattr(cfg, k, v)
+
+        mock_pretrained = Mock(spec=PreTrainedCausalLM)
+        mock_pretrained.config = cfg
+
+        bridge = NemotronHBridge()
+        result = bridge.provider_bridge(mock_pretrained)
+        assert result.hybrid_layer_pattern == cfg.hybrid_layer_pattern
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit_tests/models/nemotronh/test_nemotron_h_bridge.py` at line 161, Add an explicit unit test that verifies the new config key path `hybrid_layer_pattern` is honored (not only the legacy `hybrid_override_pattern`): create a test in test_nemotron_h_bridge.py that instantiates the same setup used earlier (using mock_nemotronh_config but setting `hybrid_layer_pattern` instead of `hybrid_override_pattern`), run the codepath that produces `result` (the same call that yields `result.hybrid_layer_pattern`), and assert that `result.hybrid_layer_pattern == mock_nemotronh_config.hybrid_layer_pattern` to ensure direct-key behavior is covered alongside the legacy-key test.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
src/megatron/bridge/models/mamba/mamba_builder.py
src/megatron/bridge/models/mamba/mamba_provider.py
src/megatron/bridge/models/nemotron_vl/nemotron_vl_bridge.py
src/megatron/bridge/models/nemotron_vl/nemotron_vl_provider.py
src/megatron/bridge/models/nemotronh/nemotron_h_bridge.py
src/megatron/bridge/models/nemotronh/nemotron_h_provider.py
src/megatron/bridge/training/mlm_compat/model.py
src/megatron/bridge/training/utils/flop_utils.py
src/megatron/bridge/training/utils/train_utils.py
tests/functional_tests/models/nemotron_vl/test_nemotron_vl_conversion.py
tests/functional_tests/models/nemotronh/test_nemotron_h_conversion.py
tests/functional_tests/recipes/test_nemotronh_recipes_finetune.py
tests/functional_tests/recipes/test_nemotronh_recipes_pretrain.py
tests/unit_tests/models/mamba/test_mamba_builder.py
tests/unit_tests/models/mamba/test_mamba_provider.py
tests/unit_tests/models/nemotronh/test_nemotron_h_bridge.py
tests/unit_tests/models/nemotronh/test_nemotron_h_provider.py
tests/unit_tests/training/mlm_compat/test_model.py
tests/unit_tests/training/utils/test_flop_utils.py
| """Configuration for a 4B parameter Nemotron-H model.""" | ||
|
|
||
| hybrid_override_pattern: str = "M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-" | ||
| hybrid_layer_pattern: str = "M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-" |
Deprecation is currently a hard break for provider constructor kwargs.
These renames remove acceptance of legacy kwargs at dataclass init time. Any caller still passing hybrid_override_pattern (or the other deprecated hybrid knobs mentioned in the PR) will now fail with TypeError instead of getting a deprecation warning.
Proposed compatibility bridge for deprecated config fields
```diff
 @dataclass
 class NemotronHModelProvider(MambaModelProvider):
     """Configuration for Nemotron-H models."""
+    hybrid_layer_pattern: str | None = None
+    hybrid_override_pattern: str | None = None
+    hybrid_attention_ratio: float | None = None
+    hybrid_mlp_ratio: float | None = None
+
+    def __post_init__(self) -> None:
+        if self.hybrid_override_pattern is not None:
+            warnings.warn(
+                "hybrid_override_pattern is deprecated; use hybrid_layer_pattern.",
+                DeprecationWarning,
+                stacklevel=2,
+            )
+            if self.hybrid_layer_pattern is not None and self.hybrid_layer_pattern != self.hybrid_override_pattern:
+                raise ValueError(
+                    "Both hybrid_layer_pattern and hybrid_override_pattern were provided with different values."
+                )
+            self.hybrid_layer_pattern = self.hybrid_override_pattern
+
+        if self.hybrid_attention_ratio is not None or self.hybrid_mlp_ratio is not None:
+            warnings.warn(
+                "hybrid_attention_ratio and hybrid_mlp_ratio are deprecated; use hybrid_layer_pattern.",
+                DeprecationWarning,
+                stacklevel=2,
+            )
+
+        super().__post_init__()
```
Also applies to: 78-78, 91-91, 106-106, 124-124, 140-140, 158-158
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/megatron/bridge/models/nemotronh/nemotron_h_provider.py` at line 63, The
dataclass now uses renamed fields like hybrid_layer_pattern which breaks callers
passing legacy kwargs (e.g., hybrid_override_pattern); add
backward-compatibility by accepting the old deprecated kwargs as optional fields
(e.g., hybrid_override_pattern: Optional[str] = None) and in the class
__post_init__ map any deprecated field values onto the new fields (set
hybrid_layer_pattern from hybrid_override_pattern when present), emit a
DeprecationWarning via warnings.warn, and repeat this pattern for the other
renamed hybrid knobs (the fields referenced at the other commented locations) so
legacy callers do not get a TypeError.
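A standalone sketch of that compatibility mapping follows; the class name is hypothetical and the real provider has many more fields and a parent `__post_init__`:

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class HybridProviderSketch:
    """Toy provider accepting both the new and the deprecated kwarg."""

    hybrid_layer_pattern: Optional[str] = None
    hybrid_override_pattern: Optional[str] = None

    def __post_init__(self) -> None:
        if self.hybrid_override_pattern is not None:
            warnings.warn(
                "hybrid_override_pattern is deprecated; use hybrid_layer_pattern.",
                DeprecationWarning,
                stacklevel=2,
            )
            if (
                self.hybrid_layer_pattern is not None
                and self.hybrid_layer_pattern != self.hybrid_override_pattern
            ):
                raise ValueError(
                    "hybrid_layer_pattern and hybrid_override_pattern were both "
                    "provided with different values."
                )
            # Map the deprecated value onto the new field.
            self.hybrid_layer_pattern = self.hybrid_override_pattern


# A legacy caller now gets a DeprecationWarning instead of a TypeError.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    provider = HybridProviderSketch(hybrid_override_pattern="M-M*-")
print(provider.hybrid_layer_pattern)  # M-M*-
```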
```diff
     hybrid_attention_ratio=args.hybrid_attention_ratio,
     hybrid_mlp_ratio=args.hybrid_mlp_ratio,
-    hybrid_override_pattern=args.hybrid_override_pattern,
+    hybrid_layer_pattern=args.hybrid_layer_pattern,
```
Guard deprecated/new hybrid args with safe defaults to prevent AttributeError.
Line 169 assumes args.hybrid_override_pattern always exists. With the rename rollout, this can crash when older/deprecated args are absent.
🐛 Proposed fix
```diff
 model = MambaModel(
     config=config,
     mamba_stack_spec=mamba_stack_spec,
     vocab_size=args.padded_vocab_size,
     max_sequence_length=args.max_position_embeddings,
     pre_process=pre_process,
-    hybrid_attention_ratio=args.hybrid_attention_ratio,
-    hybrid_mlp_ratio=args.hybrid_mlp_ratio,
-    hybrid_override_pattern=args.hybrid_override_pattern,
-    hybrid_layer_pattern=args.hybrid_layer_pattern,
+    hybrid_attention_ratio=getattr(args, "hybrid_attention_ratio", 0.0),
+    hybrid_mlp_ratio=getattr(args, "hybrid_mlp_ratio", 0.0),
+    hybrid_override_pattern=getattr(args, "hybrid_override_pattern", None),
+    hybrid_layer_pattern=getattr(args, "hybrid_layer_pattern", None),
     post_process=post_process,
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/megatron/bridge/training/mlm_compat/model.py` around lines 167 - 170, The
constructor call is assuming args.hybrid_override_pattern (and similar
newly-introduced hybrid_* fields) always exist and can raise AttributeError for
older arg versions; update the call site in model.py (where
hybrid_attention_ratio, hybrid_mlp_ratio, hybrid_override_pattern,
hybrid_layer_pattern are passed) to guard each optional argument by using safe
attribute access (e.g., getattr(args, 'hybrid_override_pattern', <sane default>)
and getattr(args, 'hybrid_layer_pattern', <sane default>)) so missing/deprecated
fields fall back to sensible defaults (None or the original default used
elsewhere) instead of crashing.
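The getattr guard the prompt describes can be sketched with a stand-in namespace; `args` here is hypothetical, not the real Megatron args object that the call site passes to `MambaModel`:

```python
from types import SimpleNamespace

# An "old" args namespace that predates the rename and lacks both pattern keys.
args = SimpleNamespace(hybrid_attention_ratio=0.0, hybrid_mlp_ratio=0.0)

hybrid_kwargs = {
    "hybrid_attention_ratio": getattr(args, "hybrid_attention_ratio", 0.0),
    "hybrid_mlp_ratio": getattr(args, "hybrid_mlp_ratio", 0.0),
    "hybrid_override_pattern": getattr(args, "hybrid_override_pattern", None),
    "hybrid_layer_pattern": getattr(args, "hybrid_layer_pattern", None),
}
# Missing attributes fall back to defaults instead of raising AttributeError.
print(hybrid_kwargs["hybrid_layer_pattern"])  # None
```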
```diff
     if config.model.is_hybrid_model:
-        layers = config.model.hybrid_override_pattern.count("E")
+        layers = config.model.hybrid_layer_pattern.count("E")
```
Guard hybrid pattern access during deprecation window.
Line 627 assumes hybrid_layer_pattern is always populated. Hybrid configs that still only set the deprecated key can fail in logging.
Proposed fix
```diff
-    if config.model.is_hybrid_model:
-        layers = config.model.hybrid_layer_pattern.count("E")
+    if config.model.is_hybrid_model:
+        hybrid_pattern = getattr(config.model, "hybrid_layer_pattern", None) or getattr(
+            config.model, "hybrid_override_pattern", None
+        )
+        layers = hybrid_pattern.count("E") if hybrid_pattern else config.model.num_layers
     else:
         layers = config.model.num_layers
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/megatron/bridge/training/utils/train_utils.py` at line 627, The current
assignment to layers assumes config.model.hybrid_layer_pattern exists; change it
to safely access the attribute (e.g., pattern = getattr(config.model,
"hybrid_layer_pattern", "") or check hasattr(config.model,
"hybrid_layer_pattern") and fall back to an empty string or the deprecated key
if present) and then compute layers = pattern.count("E") so logging won't fail
for configs still using the deprecated key.
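The same fallback, applied to the MoE-layer count used for logging, can be exercised directly; `model` below is a stand-in for `config.model`, not the real config class:

```python
from types import SimpleNamespace

# Hybrid config that still sets only the deprecated key.
model = SimpleNamespace(
    is_hybrid_model=True,
    hybrid_override_pattern="M-E-M-E-",
    num_layers=8,
)

if model.is_hybrid_model:
    pattern = getattr(model, "hybrid_layer_pattern", None) or getattr(
        model, "hybrid_override_pattern", None
    )
    # Count MoE ("E") layers from whichever pattern is available.
    layers = pattern.count("E") if pattern else model.num_layers
else:
    layers = model.num_layers

print(layers)  # 2
```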
@yaoyu-33 We also need to remove num_layers and calculate it from pattern to mirror MLM https://github.com/duncanriach/Megatron-LM/blob/aca6ca3970f2eddc1774ac706df8701e8b15ddcd/megatron/training/arguments.py#L601-L616 Where do you think this goes?
Signed-off-by: Aditya Vavre <avavre@nvidia.com>
Signed-off-by: Aditya Vavre <avavre@nvidia.com>
/ok to test fede9aa
Signed-off-by: Aditya Vavre <avavre@nvidia.com>
/ok to test 0ce7b9f
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…6-03-04-main Made-with: Cursor # Conflicts: # uv.lock
/ok to test 4dcf187
Signed-off-by: Aditya Vavre <avavre@nvidia.com>
/ok to test 8377697
/ok to test eebd1f8
Signed-off-by: Aditya Vavre <avavre@nvidia.com>
/ok to test 1055d0d
/ok to test c19f025
What does this PR do ?
This PR mirrors MLM PR#3377 that deprecates hybrid_override_pattern, hybrid_attention_ratio, and hybrid_mlp_ratio in favour of hybrid_layer_pattern.
Summary by CodeRabbit
New Features
New hybrid_layer_pattern configuration option for improved hybrid layer processing specification across all model providers.
Deprecation
Deprecated the hybrid_attention_ratio, hybrid_mlp_ratio, and hybrid_override_pattern configuration parameters. These will be removed in a future release. Users should migrate to the new hybrid_layer_pattern option.
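For downstream users, migrating a config dict can be as simple as renaming one entry. The helper below is an illustrative sketch, not part of the bridge API; dropping the ratio knobs assumes the explicit pattern fully describes the layer layout:

```python
def migrate_hybrid_config(cfg: dict) -> dict:
    """Return a copy of cfg with the deprecated hybrid keys mapped to the new one."""
    out = dict(cfg)
    if "hybrid_override_pattern" in out and "hybrid_layer_pattern" not in out:
        out["hybrid_layer_pattern"] = out.pop("hybrid_override_pattern")
    # The ratio knobs have no direct replacement once the pattern is explicit.
    out.pop("hybrid_attention_ratio", None)
    out.pop("hybrid_mlp_ratio", None)
    return out


old = {"hybrid_override_pattern": "M-M*-", "hybrid_attention_ratio": 0.0}
print(migrate_hybrid_config(old))  # {'hybrid_layer_pattern': 'M-M*-'}
```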