[misc] feat: Add bidirectional MLM <-> MBridge config translation script#2630

Open
yaoyu-33 wants to merge 4 commits into main from yuya/feat-activation-func-cli-override
Conversation

@yaoyu-33 (Contributor) commented Mar 3, 2026

Summary

  • Add scripts/translate_mlm_to_bridge.py — a bidirectional translation script that converts between Megatron-LM (pretrain_gpt.py) CLI arguments and Megatron Bridge (run_recipe.py) Hydra-style overrides.
  • Supports MLM→Bridge (default) and Bridge→MLM (--reverse) directions.
  • Output formats: CLI overrides, a standalone recipe file, or a full torchrun command, in both directions.

Motivation

Running correlation tests between Megatron-LM and Megatron Bridge requires matching configurations precisely across two very different config systems (argparse flags vs Hydra overrides on dataclass recipes). This script automates the translation, eliminating manual errors and making it easy to verify that both frameworks receive identical model/training parameters.

MLM → Bridge Translation

Translates Megatron-LM pretrain_gpt.py arguments (from YAML or CLI) into Bridge recipe overrides:

# From a YAML config file
python scripts/translate_mlm_to_bridge.py config.yaml

# From inline args
python scripts/translate_mlm_to_bridge.py -- --num-layers 32 --hidden-size 4096 --bf16

# Emit a full torchrun command
python scripts/translate_mlm_to_bridge.py config.yaml --format command --nproc 8

Bridge → MLM Translation

Translates Bridge Hydra overrides into Megatron-LM CLI arguments:

# From CLI override string
python scripts/translate_mlm_to_bridge.py --reverse \
  "model.num_layers=32 model.hidden_size=4096 mixed_precision=bf16_mixed"

# From a Bridge YAML/override file
python scripts/translate_mlm_to_bridge.py --reverse bridge_config.yaml

# Emit a full torchrun command for pretrain_gpt.py
python scripts/translate_mlm_to_bridge.py --reverse \
  --format command --nproc 8 \
  "model.num_layers=32 model.hidden_size=4096"
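To illustrate the reverse direction, here is a minimal table-driven sketch of how Hydra overrides could be mapped back to MLM flags. ARG_MAP, REVERSE_MAP, and translate_overrides are illustrative names, not the script's actual identifiers:

```python
# Hypothetical sketch of the Bridge→MLM direction: invert a forward
# flag→override table and walk the override string. Illustrative only.
ARG_MAP = {
    "num-layers": "model.num_layers",
    "hidden-size": "model.hidden_size",
    "lr": "optimizer.lr",
}

# Invert once; assumes the forward map is one-to-one.
REVERSE_MAP = {override: flag for flag, override in ARG_MAP.items()}

def translate_overrides(overrides: str) -> list[str]:
    """Convert 'key=value' Hydra overrides back to MLM-style CLI flags."""
    args = []
    for token in overrides.split():
        key, _, value = token.partition("=")
        if key in REVERSE_MAP:
            args.extend([f"--{REVERSE_MAP[key]}", value])
    return args

print(translate_overrides("model.num_layers=32 optimizer.lr=3e-4"))
# → ['--num-layers', '32', '--lr', '3e-4']
```

The real script also handles special cases (swiglu, mixed_precision) that do not invert one-to-one; those need dedicated reverse tables rather than a plain dict inversion.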

Key mappings handled

  • Model architecture: num_layers, hidden_size, num_attention_heads, ffn_hidden_size, normalization, position_embedding_type, rotary_base
  • Activation: swiglu → activation_func=silu + gated_linear_unit=true; squared-relu → activation_func=squared_relu
  • Training: micro_batch_size, global_batch_size, train_iters, lr, weight_decay, clip_grad, seed
  • Parallelism: tensor_model_parallel_size, pipeline_model_parallel_size, context_parallel_size, sequence_parallel, use_distributed_optimizer
  • Precision: bf16 → bf16_mixed, fp16 → fp16_mixed
  • Tokenizer: tokenizer_type, tokenizer_model, vocab_size
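As a rough sketch of how such a mapping table could drive the forward (MLM→Bridge) translation — ARG_MAP, SPECIAL_FLAGS, and translate_args are illustrative names, not the script's actual API:

```python
# Hypothetical sketch of the MLM→Bridge direction: a table of flag→override
# mappings plus a few multi-override special cases. Illustrative only.
ARG_MAP = {
    "num-layers": "model.num_layers",
    "hidden-size": "model.hidden_size",
    "micro-batch-size": "train.micro_batch_size",
    "tensor-model-parallel-size": "model.tensor_model_parallel_size",
}

# Boolean flags that expand to one or more fixed overrides.
SPECIAL_FLAGS = {
    "swiglu": ["model.activation_func=silu", "model.gated_linear_unit=true"],
    "bf16": ["mixed_precision=bf16_mixed"],
    "fp16": ["mixed_precision=fp16_mixed"],
}

def translate_args(argv: list[str]) -> list[str]:
    """Convert Megatron-LM style CLI args into Hydra-style overrides."""
    overrides = []
    i = 0
    while i < len(argv):
        flag = argv[i].lstrip("-")
        if flag in SPECIAL_FLAGS:  # flag takes no value
            overrides.extend(SPECIAL_FLAGS[flag])
            i += 1
        elif flag in ARG_MAP:  # flag followed by its value
            overrides.append(f"{ARG_MAP[flag]}={argv[i + 1]}")
            i += 2
        else:
            i += 1  # unknown flags are skipped in this sketch
    return overrides

print(translate_args(["--num-layers", "32", "--hidden-size", "4096", "--bf16"]))
# → ['model.num_layers=32', 'model.hidden_size=4096', 'mixed_precision=bf16_mixed']
```

The one-to-many entries (swiglu, precision flags) are why the mapping is kept in explicit tables rather than derived mechanically from argparse.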

Test plan

  • Existing unit tests pass
  • CI triggered with /ok to test
  • Verified round-trip translation preserves key parameters
  • Verified loss correlation between MLM and Bridge with matched parameters using this script

…LI string override

Enable `model.activation_func=silu` (or `gelu`, `relu`, etc.) as a Hydra
CLI override by serializing known activation functions as strings during
OmegaConf conversion and resolving them back to callables in
TransformerConfig.finalize().
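A minimal sketch of the mechanism described above, with stand-in activations instead of torch so it runs standalone. The names mirror the PR (ACTIVATION_FUNC_MAP, str_to_callable, callable_to_str), but the map contents and signatures here are assumptions:

```python
# Illustrative sketch of the string<->callable resolution, not the PR's code.
import importlib
import math
from typing import Callable

def silu(x: float) -> float:
    # Stand-in for torch.nn.functional.silu: x * sigmoid(x).
    return x / (1.0 + math.exp(-x))

ACTIVATION_FUNC_MAP: dict[str, Callable] = {
    "silu": silu,
    "gelu": math.erf,  # stand-in only, not the real GELU
}

def str_to_callable(name: str) -> Callable:
    """Resolve a short name ("silu") or dotted path ("math.sqrt") to a callable."""
    if name in ACTIVATION_FUNC_MAP:
        return ACTIVATION_FUNC_MAP[name]
    module_name, _, attr = name.rpartition(".")
    if module_name:
        try:
            resolved = getattr(importlib.import_module(module_name), attr)
            if callable(resolved):  # guard: reject non-callable attributes
                return resolved
        except (ImportError, AttributeError):
            pass
    raise ValueError(f"Unknown activation function: '{name}'")

def callable_to_str(func: Callable) -> str:
    """Inverse lookup so the callable can be serialized back to a string."""
    for key, value in ACTIVATION_FUNC_MAP.items():
        if value is func:
            return key
    raise ValueError(f"Unregistered activation function: {func!r}")
```

The callable() guard in the dotted-path branch reflects the hardening suggested in the review later in this thread.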

Changes:
- Add ACTIVATION_FUNC_MAP and str_to_callable/callable_to_str helpers
  in omegaconf_utils.py
- Serialize activation_func as a string instead of excluding it from
  OmegaConf, so Hydra overrides work
- Resolve string activation_func in TransformerConfig.finalize(),
  MLATransformerConfig.finalize(), and
  HeterogeneousTransformerConfig.finalize() before MCore post-init
- Add vanilla_gpt_pretrain_config recipe for MLM<->Bridge correlation
  testing with minimal defaults

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
copy-pr-bot bot commented Mar 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yaoyu-33 (Contributor, Author) commented Mar 3, 2026

/ok to test e152402

coderabbitai bot commented Mar 3, 2026

📝 Walkthrough

These changes introduce infrastructure for handling string-based activation functions in configuration systems. A new vanilla GPT pretraining recipe is added, along with utilities to convert between string representations and callable activation functions for serialization and deserialization across config variants.

Changes

  • Activation Function Handling — src/megatron/bridge/training/utils/omegaconf_utils.py, src/megatron/bridge/models/transformer_config.py: Introduces bidirectional mapping between string activation function names and callables via ACTIVATION_FUNC_MAP, callable_to_str, and str_to_callable. Applies string-to-callable resolution in finalize() workflows to ensure runtime callables are consistently instantiated.
  • Vanilla GPT Recipe — src/megatron/bridge/recipes/gpt/vanilla_gpt.py, src/megatron/bridge/recipes/gpt/__init__.py: New vanilla GPT pretraining recipe module providing vanilla_gpt_pretrain_config(), which builds a Megatron-LM compatible configuration with defaults for model, training, validation, optimizer, dataset, logging, checkpointing, and distributed settings.

Sequence Diagram(s)

sequenceDiagram
    participant User as User/YAML Config
    participant OmegaConf
    participant StrToCallable as str_to_callable()
    participant TransformerConfig
    participant Resolve as _resolve_string_fields()
    participant Runtime

    User->>OmegaConf: Provide activation_func: "gelu"
    OmegaConf->>StrToCallable: Convert string to callable
    StrToCallable->>StrToCallable: Look up in ACTIVATION_FUNC_MAP
    StrToCallable-->>OmegaConf: Return F.gelu function
    OmegaConf->>TransformerConfig: Pass config with callable
    TransformerConfig->>Resolve: finalize() calls _resolve_string_fields()
    Resolve->>Resolve: Resolve any remaining string activation_funcs
    Resolve-->>TransformerConfig: String converted to callable
    TransformerConfig-->>Runtime: Config ready with resolved activation_func

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

  • Test Results For Major Changes — ⚠️ Warning. The PR makes major changes to activation function serialization and configuration finalization, but the test plan lists critical tests as incomplete: verification of existing recipes and unit tests for round-trip conversions. Resolution: complete the remaining tests (verify the Llama/Qwen/DeepSeek recipes, add unit tests for str_to_callable/callable_to_str) and document all results, including edge cases, before merging.
  • Title check — ⚠️ Warning. The PR title '[misc] feat: Add bidirectional MLM <-> MBridge config translation script' does not match the actual changes, which focus on enabling activation_func CLI string overrides and adding a vanilla GPT recipe. Resolution: update the PR title to reflect the primary change; something like '[training, model, recipe] feat: Allow activation_func to be set via CLI string override' would be more accurate.

✅ Passed checks (2 passed)

  • Docstring Coverage — ✅ Passed. No functions found in the changed files to evaluate docstring coverage; the check was skipped.
  • Description Check — ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.


coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/megatron/bridge/recipes/gpt/vanilla_gpt.py (1)

15-29: Recipe name is too generic for repository naming conventions.

Please consider renaming vanilla_gpt.py / vanilla_gpt_pretrain_config to include explicit size/config metadata for easier discovery (e.g., gpt_<size>_vanilla_pretrain).

As per coding guidelines, "Use descriptive recipe names that include the model size and configuration (e.g., llama3_8b.py, qwen2_7b_instruct.py)."

Also applies to: 51-52

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/recipes/gpt/vanilla_gpt.py` around lines 15 - 29, The
recipe filename and identifier are too generic: rename the file and the recipe
identifier (currently vanilla_gpt.py and vanilla_gpt_pretrain_config) to a
descriptive name that includes model size/config (e.g.,
gpt_small_vanilla_pretrain or gpt_256x2_vanilla_pretrain), then update all
internal references—docstring example, the recipe export/variable name
(vanilla_gpt_pretrain_config), any entries used by run_recipe invocation, and
any import/registration points that reference vanilla_gpt or
vanilla_gpt_pretrain_config—so the repository consistently uses the new
descriptive name.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4b3d00 and e152402.

📒 Files selected for processing (4)
  • src/megatron/bridge/models/transformer_config.py
  • src/megatron/bridge/recipes/gpt/__init__.py
  • src/megatron/bridge/recipes/gpt/vanilla_gpt.py
  • src/megatron/bridge/training/utils/omegaconf_utils.py

Comment on lines +86 to +95
parts = name.rsplit(".", 1)
if len(parts) == 2:
    try:
        module = importlib.import_module(parts[0])
        return getattr(module, parts[1])
    except (ImportError, AttributeError):
        pass
raise ValueError(
    f"Unknown activation function: '{name}'. Known names: {sorted(n for n in ACTIVATION_FUNC_MAP if '.' not in n)}"
)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's check if the file exists and get the overall structure
git ls-files | grep omegaconf_utils.py

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 180


🏁 Script executed:

# Get the full str_to_callable function to understand its current implementation
rg -n -B5 -A15 'def str_to_callable' src/megatron/bridge/training/utils/omegaconf_utils.py



🏁 Script executed:

# Check the ACTIVATION_FUNC_MAP context to understand the data structure
rg -n -B2 -A5 'ACTIVATION_FUNC_MAP' src/megatron/bridge/training/utils/omegaconf_utils.py | head -50



🏁 Script executed:

# Get a broader view of the file structure around lines 86-95
sed -n '70,105p' src/megatron/bridge/training/utils/omegaconf_utils.py



🏁 Script executed:

# Check if there are tests for str_to_callable to understand expected behavior
rg -n -B3 -A10 'str_to_callable' tests/unit_tests/training/utils/test_omegaconf_utils.py



🏁 Script executed:

# Look at the full context of lines 86-95 and any callable validation in the function
sed -n '74,96p' src/megatron/bridge/training/utils/omegaconf_utils.py



🏁 Script executed:

# Check imports at the top of the file to ensure proper typing
sed -n '1,30p' src/megatron/bridge/training/utils/omegaconf_utils.py



🏁 Script executed:

# Verify the function is used elsewhere in the codebase to understand impact
rg -n 'str_to_callable' src/megatron/bridge --type py -B2 -A2



🏁 Script executed:

# Verify the proposed fix would work by checking callable behavior
python3 << 'EOF'
import importlib

# Simulate importing torch.nn.functional to validate the approach
module = importlib.import_module("torch.nn.functional")

# Test with a callable attribute (should pass)
silu_func = getattr(module, "silu")
print(f"silu is callable: {callable(silu_func)}")

# Test with a non-callable attribute to show the problem
# (torch.nn.functional has attributes like __version__)
version = getattr(module, "__version__")
print(f"__version__ is callable: {callable(version)}")
print(f"Type of __version__: {type(version)}")
EOF



🏁 Script executed:

# Verify how the function result is used to understand the impact
rg -n -B5 -A5 'activation_func = str_to_callable' src/megatron/bridge



🏁 Script executed:

# Check if there are any other usages or tests that might reveal expected behavior
rg -r 'str_to_callable\|activation_func.*callable' src/ tests/ --type py



Guard dotted-path resolution so only callables are accepted.

str_to_callable() currently returns any attribute from dotted paths without validation. That allows non-callables through and violates the function's return type contract (Callable) and docstring promise.

🔧 Proposed fix
 def str_to_callable(name: str) -> Callable:
     """Resolve an activation function name to its callable.

     Accepts short names (``"silu"``), fully qualified names
     (``"torch.nn.functional.silu"``), or dotted import paths.

     Raises:
         ValueError: If the name cannot be resolved.
     """
     if name in ACTIVATION_FUNC_MAP:
         return ACTIVATION_FUNC_MAP[name]
     # Fallback: try to import the dotted path
     parts = name.rsplit(".", 1)
     if len(parts) == 2:
         try:
             module = importlib.import_module(parts[0])
-            return getattr(module, parts[1])
+            resolved = getattr(module, parts[1])
+            if callable(resolved):
+                return resolved
         except (ImportError, AttributeError):
             pass
-    raise ValueError(
-        f"Unknown activation function: '{name}'. Known names: {sorted(n for n in ACTIVATION_FUNC_MAP if '.' not in n)}"
-    )
+    known_names = sorted(n for n in ACTIVATION_FUNC_MAP if "." not in n)
+    raise ValueError(f"Unknown activation function: '{name}'. Known names: {known_names}")
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 93-95: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/training/utils/omegaconf_utils.py` around lines 86 - 95,
str_to_callable currently returns any attribute for dotted paths even if it's
not callable; update the dotted-path branch in str_to_callable to verify the
resolved attribute is callable before returning it (use importlib.import_module
and getattr as already done for module/attr resolution), and if the attribute is
found but not callable raise the same ValueError used for unknown activation
names (including the sorted ACTIVATION_FUNC_MAP list for the message); ensure
the function's return remains a Callable and that non-callable attributes are
rejected consistently with the existing error path.

Extract duplicated activation-function name<->callable mappings from
omegaconf_utils.py and ModelBridge.ACTIVATION_MAPPING into a single
shared module (megatron.bridge.utils.activation_map).  Both consumers
now reference the same registry, keeping CLI overrides, config
finalization, and HF<->Megatron conversion in sync.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
yaoyu-33 commented Mar 3, 2026

/ok to test 4c4b987

yaoyu-33 commented Mar 4, 2026

/ok to test bec6e21

- Add translate_mlm_to_bridge.py for MLM→Bridge and Bridge→MLM translation
- Support --reverse flag for Bridge→MLM direction
- Build reverse mapping from ARG_MAP + extra tables
- Add MLM argparse introspection (optional, try/except guarded)
- Handle swiglu, squared_relu, mixed_precision special cases
- Output as CLI overrides, standalone recipe, or full torchrun command

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
yaoyu-33 force-pushed the yuya/feat-activation-func-cli-override branch from bec6e21 to 38a82f9 on March 4, 2026 at 04:27
yaoyu-33 changed the title from "[training, model, recipe] feat: Allow activation_func to be set via CLI string override" to "[misc] feat: Add bidirectional MLM <-> MBridge config translation script" on Mar 4, 2026
yaoyu-33 commented Mar 4, 2026

/ok to test 38a82f9

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>