[misc] feat: Add bidirectional MLM <-> MBridge config translation script#2630
Conversation
…LI string override

Enable `model.activation_func=silu` (or `gelu`, `relu`, etc.) as a Hydra CLI override by serializing known activation functions as strings during OmegaConf conversion and resolving them back to callables in `TransformerConfig.finalize()`.

Changes:
- Add `ACTIVATION_FUNC_MAP` and `str_to_callable`/`callable_to_str` helpers in `omegaconf_utils.py`
- Serialize `activation_func` as a string instead of excluding it from OmegaConf, so Hydra overrides work
- Resolve string `activation_func` in `TransformerConfig.finalize()`, `MLATransformerConfig.finalize()`, and `HeterogeneousTransformerConfig.finalize()` before MCore post-init
- Add `vanilla_gpt_pretrain_config` recipe for MLM<->Bridge correlation testing with minimal defaults

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
/ok to test e152402
📝 Walkthrough

These changes introduce infrastructure for handling string-based activation functions in configuration systems. A new vanilla GPT pretraining recipe is added, along with utilities to convert between string representations and callable activation functions for serialization and deserialization across config variants.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as User/YAML Config
    participant OmegaConf
    participant StrToCallable as str_to_callable()
    participant TransformerConfig
    participant Resolve as _resolve_string_fields()
    participant Runtime
    User->>OmegaConf: Provide activation_func: "gelu"
    OmegaConf->>StrToCallable: Convert string to callable
    StrToCallable->>StrToCallable: Look up in ACTIVATION_FUNC_MAP
    StrToCallable-->>OmegaConf: Return F.gelu function
    OmegaConf->>TransformerConfig: Pass config with callable
    TransformerConfig->>Resolve: finalize() calls _resolve_string_fields()
    Resolve->>Resolve: Resolve any remaining string activation_funcs
    Resolve-->>TransformerConfig: String converted to callable
    TransformerConfig-->>Runtime: Config ready with resolved activation_func
```
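The flow in the diagram can be sketched in a few lines of plain Python. This is a minimal sketch, not the PR's implementation: the registry holds a stand-in lambda instead of `torch.nn.functional` entries, and the dotted-path fallback includes a callable guard.

```python
import importlib
from typing import Callable, Optional

# Stand-in registry; the real ACTIVATION_FUNC_MAP maps short names to
# torch.nn.functional callables (silu, gelu, relu, ...).
ACTIVATION_FUNC_MAP: dict[str, Callable] = {
    "relu": lambda x: x if x > 0 else 0.0,
}


def str_to_callable(name: str) -> Callable:
    """Resolve a short name via the registry, else try a dotted import path."""
    if name in ACTIVATION_FUNC_MAP:
        return ACTIVATION_FUNC_MAP[name]
    parts = name.rsplit(".", 1)
    if len(parts) == 2:
        try:
            module = importlib.import_module(parts[0])
            resolved = getattr(module, parts[1])
            if callable(resolved):  # reject non-callable attributes
                return resolved
        except (ImportError, AttributeError):
            pass
    raise ValueError(f"Unknown activation function: '{name}'")


def callable_to_str(func: Callable) -> Optional[str]:
    """Reverse lookup so serialization can round-trip the config to a string."""
    for key, value in ACTIVATION_FUNC_MAP.items():
        if value is func:
            return key
    return None
```

With this shape, `activation_func: "relu"` in YAML resolves to a callable during `finalize()`, and the same registry serializes the callable back to `"relu"` when the config is dumped.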
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/megatron/bridge/recipes/gpt/vanilla_gpt.py (1)
15-29: Recipe name is too generic for repository naming conventions.

Please consider renaming `vanilla_gpt.py` / `vanilla_gpt_pretrain_config` to include explicit size/config metadata for easier discovery (e.g., `gpt_<size>_vanilla_pretrain`). As per coding guidelines, "Use descriptive recipe names that include the model size and configuration (e.g., llama3_8b.py, qwen2_7b_instruct.py)."
Also applies to: 51-52
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/megatron/bridge/recipes/gpt/vanilla_gpt.py` around lines 15 - 29, The recipe filename and identifier are too generic: rename the file and the recipe identifier (currently vanilla_gpt.py and vanilla_gpt_pretrain_config) to a descriptive name that includes model size/config (e.g., gpt_small_vanilla_pretrain or gpt_256x2_vanilla_pretrain), then update all internal references—docstring example, the recipe export/variable name (vanilla_gpt_pretrain_config), any entries used by run_recipe invocation, and any import/registration points that reference vanilla_gpt or vanilla_gpt_pretrain_config—so the repository consistently uses the new descriptive name.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/megatron/bridge/training/utils/omegaconf_utils.py`:
- Around line 86-95: str_to_callable currently returns any attribute for dotted
paths even if it's not callable; update the dotted-path branch in
str_to_callable to verify the resolved attribute is callable before returning it
(use importlib.import_module and getattr as already done for module/attr
resolution), and if the attribute is found but not callable raise the same
ValueError used for unknown activation names (including the sorted
ACTIVATION_FUNC_MAP list for the message); ensure the function's return remains
a Callable and that non-callable attributes are rejected consistently with the
existing error path.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/megatron/bridge/models/transformer_config.py
- src/megatron/bridge/recipes/gpt/__init__.py
- src/megatron/bridge/recipes/gpt/vanilla_gpt.py
- src/megatron/bridge/training/utils/omegaconf_utils.py
```python
    parts = name.rsplit(".", 1)
    if len(parts) == 2:
        try:
            module = importlib.import_module(parts[0])
            return getattr(module, parts[1])
        except (ImportError, AttributeError):
            pass
    raise ValueError(
        f"Unknown activation function: '{name}'. Known names: {sorted(n for n in ACTIVATION_FUNC_MAP if '.' not in n)}"
    )
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, let's check if the file exists and get the overall structure
git ls-files | grep omegaconf_utils.py
```

Repository: NVIDIA-NeMo/Megatron-Bridge

🏁 Script executed:

```shell
# Get the full str_to_callable function to understand its current implementation
rg -n -B5 -A15 'def str_to_callable' src/megatron/bridge/training/utils/omegaconf_utils.py
```

🏁 Script executed:

```shell
# Check the ACTIVATION_FUNC_MAP context to understand the data structure
rg -n -B2 -A5 'ACTIVATION_FUNC_MAP' src/megatron/bridge/training/utils/omegaconf_utils.py | head -50
```

🏁 Script executed:

```shell
# Get a broader view of the file structure around lines 86-95
sed -n '70,105p' src/megatron/bridge/training/utils/omegaconf_utils.py
```

🏁 Script executed:

```shell
# Check if there are tests for str_to_callable to understand expected behavior
rg -n -B3 -A10 'str_to_callable' tests/unit_tests/training/utils/test_omegaconf_utils.py
```

🏁 Script executed:

```shell
# Look at the full context of lines 86-95 and any callable validation in the function
sed -n '74,96p' src/megatron/bridge/training/utils/omegaconf_utils.py
```

🏁 Script executed:

```shell
# Check imports at the top of the file to ensure proper typing
sed -n '1,30p' src/megatron/bridge/training/utils/omegaconf_utils.py
```

🏁 Script executed:

```shell
# Verify the function is used elsewhere in the codebase to understand impact
rg -n 'str_to_callable' src/megatron/bridge --type py -B2 -A2
```

🏁 Script executed:

```shell
# Verify the proposed fix would work by checking callable behavior
python3 << 'EOF'
import importlib
# Simulate importing torch.nn.functional to validate the approach
module = importlib.import_module("torch.nn.functional")
# Test with a callable attribute (should pass)
silu_func = getattr(module, "silu")
print(f"silu is callable: {callable(silu_func)}")
# Test with a non-callable attribute to show the problem
# (torch.nn.functional has attributes like __version__)
version = getattr(module, "__version__")
print(f"__version__ is callable: {callable(version)}")
print(f"Type of __version__: {type(version)}")
EOF
```

🏁 Script executed:

```shell
# Verify how the function result is used to understand the impact
rg -n -B5 -A5 'activation_func = str_to_callable' src/megatron/bridge
```

🏁 Script executed:

```shell
# Check if there are any other usages or tests that might reveal expected behavior
rg -r 'str_to_callable\|activation_func.*callable' src/ tests/ --type py
```
Guard dotted-path resolution so only callables are accepted.
str_to_callable() currently returns any attribute from dotted paths without validation. That allows non-callables through and violates the function's return type contract (Callable) and docstring promise.
🔧 Proposed fix

```diff
 def str_to_callable(name: str) -> Callable:
     """Resolve an activation function name to its callable.

     Accepts short names (``"silu"``), fully qualified names
     (``"torch.nn.functional.silu"``), or dotted import paths.

     Raises:
         ValueError: If the name cannot be resolved.
     """
     if name in ACTIVATION_FUNC_MAP:
         return ACTIVATION_FUNC_MAP[name]
     # Fallback: try to import the dotted path
     parts = name.rsplit(".", 1)
     if len(parts) == 2:
         try:
             module = importlib.import_module(parts[0])
-            return getattr(module, parts[1])
+            resolved = getattr(module, parts[1])
+            if callable(resolved):
+                return resolved
         except (ImportError, AttributeError):
             pass
-    raise ValueError(
-        f"Unknown activation function: '{name}'. Known names: {sorted(n for n in ACTIVATION_FUNC_MAP if '.' not in n)}"
-    )
+    known_names = sorted(n for n in ACTIVATION_FUNC_MAP if "." not in n)
+    raise ValueError(f"Unknown activation function: '{name}'. Known names: {known_names}")
```

🧰 Tools
🪛 Ruff (0.15.2)
[warning] 93-95: Avoid specifying long messages outside the exception class
(TRY003)
Extract duplicated activation-function name<->callable mappings from `omegaconf_utils.py` and `ModelBridge.ACTIVATION_MAPPING` into a single shared module (`megatron.bridge.utils.activation_map`). Both consumers now reference the same registry, keeping CLI overrides, config finalization, and HF<->Megatron conversion in sync.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
/ok to test 4c4b987
/ok to test bec6e21
- Add translate_mlm_to_bridge.py for MLM→Bridge and Bridge→MLM translation
- Support --reverse flag for Bridge→MLM direction
- Build reverse mapping from ARG_MAP + extra tables
- Add MLM argparse introspection (optional, try/except guarded)
- Handle swiglu, squared_relu, mixed_precision special cases
- Output as CLI overrides, standalone recipe, or full torchrun command

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
bec6e21 to 38a82f9 (Compare)
/ok to test 38a82f9
Summary
Adds `scripts/translate_mlm_to_bridge.py`, a bidirectional translation script that converts between Megatron-LM (`pretrain_gpt.py`) CLI arguments and Megatron Bridge (`run_recipe.py`) Hydra-style overrides. It supports forward (MLM → Bridge) and reverse (`--reverse`) directions, and can emit a full `torchrun` command for either framework.

Motivation
Running correlation tests between Megatron-LM and Megatron Bridge requires matching configurations precisely across two very different config systems (argparse flags vs Hydra overrides on dataclass recipes). This script automates the translation, eliminating manual errors and making it easy to verify that both frameworks receive identical model/training parameters.
MLM → Bridge Translation
Translates Megatron-LM `pretrain_gpt.py` arguments (from YAML or CLI) into Bridge recipe overrides.

Bridge → MLM Translation
Translates Bridge Hydra overrides into Megatron-LM CLI arguments.
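The key mappings listed next are mostly one-to-one renames, but a few flags fan out into multiple fields. A hypothetical sketch of that special-case handling (function and key names are illustrative, not the script's actual internals; only the flag/field pairs come from the PR description):

```python
# Hypothetical sketch of MLM -> Bridge special-case translation.
def translate_special_cases(mlm_args: dict) -> dict:
    overrides = {}
    if mlm_args.get("swiglu"):
        # MLM's single --swiglu flag expands to two Bridge fields.
        overrides["model.activation_func"] = "silu"
        overrides["model.gated_linear_unit"] = True
    elif mlm_args.get("squared_relu"):
        overrides["model.activation_func"] = "squared_relu"
    if mlm_args.get("bf16"):
        overrides["mixed_precision"] = "bf16_mixed"
    elif mlm_args.get("fp16"):
        overrides["mixed_precision"] = "fp16_mixed"
    return overrides


def to_hydra_overrides(overrides: dict) -> list[str]:
    # Render as Hydra-style CLI override strings, e.g. model.activation_func=silu
    return [
        f"{key}={str(value).lower() if isinstance(value, bool) else value}"
        for key, value in overrides.items()
    ]
```

Running the reverse direction would invert these tables, collapsing `activation_func=silu` plus `gated_linear_unit=true` back into a single `--swiglu` flag.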
Key mappings handled
- Architecture: `num_layers`, `hidden_size`, `num_attention_heads`, `ffn_hidden_size`, `normalization`, `position_embedding_type`, `rotary_base`
- Activations: `swiglu` ↔ `activation_func=silu` + `gated_linear_unit=true`; `squared-relu` ↔ `activation_func=squared_relu`
- Training: `micro_batch_size`, `global_batch_size`, `train_iters`, `lr`, `weight_decay`, `clip_grad`, `seed`
- Parallelism: `tensor_model_parallel_size`, `pipeline_model_parallel_size`, `context_parallel_size`, `sequence_parallel`, `use_distributed_optimizer`
- Mixed precision: `bf16` ↔ `bf16_mixed`, `fp16` ↔ `fp16_mixed`
- Tokenizer: `tokenizer_type`, `tokenizer_model`, `vocab_size`

Test plan
/ok to test