[misc] feat: Add bidirectional MLM <-> MBridge config translation script#2630

Open
yaoyu-33 wants to merge 4 commits into main from yuya/feat-activation-func-cli-override
Conversation

@yaoyu-33 (Contributor) commented Mar 3, 2026

Summary

  • Add scripts/translate_mlm_to_bridge.py — a bidirectional translation script that converts between Megatron-LM (pretrain_gpt.py) CLI arguments and Megatron Bridge (run_recipe.py) Hydra-style overrides.
  • Supports MLM→Bridge (default) and Bridge→MLM (--reverse) directions.
  • Output formats: CLI overrides, a standalone recipe file, or a full torchrun command, in both directions.

Motivation

Running correlation tests between Megatron-LM and Megatron Bridge requires matching configurations precisely across two very different config systems (argparse flags vs Hydra overrides on dataclass recipes). This script automates the translation, eliminating manual errors and making it easy to verify that both frameworks receive identical model/training parameters.

MLM → Bridge Translation

Translates Megatron-LM pretrain_gpt.py arguments (from YAML or CLI) into Bridge recipe overrides:

# From a YAML config file
python scripts/translate_mlm_to_bridge.py config.yaml

# From inline args
python scripts/translate_mlm_to_bridge.py -- --num-layers 32 --hidden-size 4096 --bf16

# Emit a full torchrun command
python scripts/translate_mlm_to_bridge.py config.yaml --format command --nproc 8

Bridge → MLM Translation

Translates Bridge Hydra overrides into Megatron-LM CLI arguments:

# From CLI override string
python scripts/translate_mlm_to_bridge.py --reverse \
  "model.num_layers=32 model.hidden_size=4096 mixed_precision=bf16_mixed"

# From a Bridge YAML/override file
python scripts/translate_mlm_to_bridge.py --reverse bridge_config.yaml

# Emit a full torchrun command for pretrain_gpt.py
python scripts/translate_mlm_to_bridge.py --reverse \
  --format command --nproc 8 \
  "model.num_layers=32 model.hidden_size=4096"
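To illustrate the reverse direction, here is a minimal table-driven sketch of how Hydra overrides could be mapped back to MLM flags. ARG_MAP, REVERSE_MAP, and translate_overrides are illustrative names, not the script's actual identifiers:

```python
# Hypothetical sketch of the Bridge→MLM direction: invert a forward
# flag→override table and walk the override string. Illustrative only.
ARG_MAP = {
    "num-layers": "model.num_layers",
    "hidden-size": "model.hidden_size",
    "lr": "optimizer.lr",
}

# Invert once; assumes the forward map is one-to-one.
REVERSE_MAP = {override: flag for flag, override in ARG_MAP.items()}

def translate_overrides(overrides: str) -> list[str]:
    """Convert 'key=value' Hydra overrides back to MLM-style CLI flags."""
    args = []
    for token in overrides.split():
        key, _, value = token.partition("=")
        if key in REVERSE_MAP:
            args.extend([f"--{REVERSE_MAP[key]}", value])
    return args

print(translate_overrides("model.num_layers=32 optimizer.lr=3e-4"))
# → ['--num-layers', '32', '--lr', '3e-4']
```

The real script also handles special cases (swiglu, mixed_precision) that do not invert one-to-one; those need dedicated reverse tables rather than a plain dict inversion.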

Key mappings handled

  • Model architecture: num_layers, hidden_size, num_attention_heads, ffn_hidden_size, normalization, position_embedding_type, rotary_base
  • Activation: swiglu → activation_func=silu + gated_linear_unit=true; squared-relu → activation_func=squared_relu
  • Training: micro_batch_size, global_batch_size, train_iters, lr, weight_decay, clip_grad, seed
  • Parallelism: tensor_model_parallel_size, pipeline_model_parallel_size, context_parallel_size, sequence_parallel, use_distributed_optimizer
  • Precision: bf16 → bf16_mixed, fp16 → fp16_mixed
  • Tokenizer: tokenizer_type, tokenizer_model, vocab_size
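As a rough sketch of how such a mapping table could drive the forward (MLM→Bridge) translation — ARG_MAP, SPECIAL_FLAGS, and translate_args are illustrative names, not the script's actual API:

```python
# Hypothetical sketch of the MLM→Bridge direction: a table of flag→override
# mappings plus a few multi-override special cases. Illustrative only.
ARG_MAP = {
    "num-layers": "model.num_layers",
    "hidden-size": "model.hidden_size",
    "micro-batch-size": "train.micro_batch_size",
    "tensor-model-parallel-size": "model.tensor_model_parallel_size",
}

# Boolean flags that expand to one or more fixed overrides.
SPECIAL_FLAGS = {
    "swiglu": ["model.activation_func=silu", "model.gated_linear_unit=true"],
    "bf16": ["mixed_precision=bf16_mixed"],
    "fp16": ["mixed_precision=fp16_mixed"],
}

def translate_args(argv: list[str]) -> list[str]:
    """Convert Megatron-LM style CLI args into Hydra-style overrides."""
    overrides = []
    i = 0
    while i < len(argv):
        flag = argv[i].lstrip("-")
        if flag in SPECIAL_FLAGS:  # flag takes no value
            overrides.extend(SPECIAL_FLAGS[flag])
            i += 1
        elif flag in ARG_MAP:  # flag followed by its value
            overrides.append(f"{ARG_MAP[flag]}={argv[i + 1]}")
            i += 2
        else:
            i += 1  # unknown flags are skipped in this sketch
    return overrides

print(translate_args(["--num-layers", "32", "--hidden-size", "4096", "--bf16"]))
# → ['model.num_layers=32', 'model.hidden_size=4096', 'mixed_precision=bf16_mixed']
```

The one-to-many entries (swiglu, precision flags) are why the mapping is kept in explicit tables rather than derived mechanically from argparse.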

Test plan

  • Existing unit tests pass
  • CI triggered with /ok to test
  • Verified round-trip translation preserves key parameters
  • Verified loss correlation between MLM and Bridge with matched parameters using this script

…LI string override

Enable `model.activation_func=silu` (or `gelu`, `relu`, etc.) as a Hydra
CLI override by serializing known activation functions as strings during
OmegaConf conversion and resolving them back to callables in
TransformerConfig.finalize().
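A minimal sketch of the mechanism described above, with stand-in activations instead of torch so it runs standalone. The names mirror the PR (ACTIVATION_FUNC_MAP, str_to_callable, callable_to_str), but the map contents and signatures here are assumptions:

```python
# Illustrative sketch of the string<->callable resolution, not the PR's code.
import importlib
import math
from typing import Callable

def silu(x: float) -> float:
    # Stand-in for torch.nn.functional.silu: x * sigmoid(x).
    return x / (1.0 + math.exp(-x))

ACTIVATION_FUNC_MAP: dict[str, Callable] = {
    "silu": silu,
    "gelu": math.erf,  # stand-in only, not the real GELU
}

def str_to_callable(name: str) -> Callable:
    """Resolve a short name ("silu") or dotted path ("math.sqrt") to a callable."""
    if name in ACTIVATION_FUNC_MAP:
        return ACTIVATION_FUNC_MAP[name]
    module_name, _, attr = name.rpartition(".")
    if module_name:
        try:
            resolved = getattr(importlib.import_module(module_name), attr)
            if callable(resolved):  # guard: reject non-callable attributes
                return resolved
        except (ImportError, AttributeError):
            pass
    raise ValueError(f"Unknown activation function: '{name}'")

def callable_to_str(func: Callable) -> str:
    """Inverse lookup so the callable can be serialized back to a string."""
    for key, value in ACTIVATION_FUNC_MAP.items():
        if value is func:
            return key
    raise ValueError(f"Unregistered activation function: {func!r}")
```

The callable() guard in the dotted-path branch reflects the hardening suggested in the review later in this thread.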

Changes:
- Add ACTIVATION_FUNC_MAP and str_to_callable/callable_to_str helpers
  in omegaconf_utils.py
- Serialize activation_func as a string instead of excluding it from
  OmegaConf, so Hydra overrides work
- Resolve string activation_func in TransformerConfig.finalize(),
  MLATransformerConfig.finalize(), and
  HeterogeneousTransformerConfig.finalize() before MCore post-init
- Add vanilla_gpt_pretrain_config recipe for MLM<->Bridge correlation
  testing with minimal defaults

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
copy-pr-bot bot commented Mar 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yaoyu-33 (Contributor, Author) commented Mar 3, 2026

/ok to test e152402

coderabbitai bot commented Mar 3, 2026

📝 Walkthrough

These changes introduce infrastructure for handling string-based activation functions in configuration systems. A new vanilla GPT pretraining recipe is added, along with utilities to convert between string representations and callable activation functions for serialization and deserialization across config variants.

Changes

  • Activation Function Handling — src/megatron/bridge/training/utils/omegaconf_utils.py, src/megatron/bridge/models/transformer_config.py: Introduces bidirectional mapping between string activation function names and callables via ACTIVATION_FUNC_MAP, callable_to_str, and str_to_callable. Applies string-to-callable resolution in finalize() workflows to ensure runtime callables are consistently instantiated.
  • Vanilla GPT Recipe — src/megatron/bridge/recipes/gpt/vanilla_gpt.py, src/megatron/bridge/recipes/gpt/__init__.py: New vanilla GPT pretraining recipe module providing vanilla_gpt_pretrain_config(), which builds a Megatron-LM compatible configuration with defaults for model, training, validation, optimizer, dataset, logging, checkpointing, and distributed settings.

Sequence Diagram(s)

sequenceDiagram
    participant User as User/YAML Config
    participant OmegaConf
    participant StrToCallable as str_to_callable()
    participant TransformerConfig
    participant Resolve as _resolve_string_fields()
    participant Runtime

    User->>OmegaConf: Provide activation_func: "gelu"
    OmegaConf->>StrToCallable: Convert string to callable
    StrToCallable->>StrToCallable: Look up in ACTIVATION_FUNC_MAP
    StrToCallable-->>OmegaConf: Return F.gelu function
    OmegaConf->>TransformerConfig: Pass config with callable
    TransformerConfig->>Resolve: finalize() calls _resolve_string_fields()
    Resolve->>Resolve: Resolve any remaining string activation_funcs
    Resolve-->>TransformerConfig: String converted to callable
    TransformerConfig-->>Runtime: Config ready with resolved activation_func

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

  • Test Results For Major Changes — ⚠️ Warning. The PR makes major changes to activation function serialization and configuration finalization, but the test plan lists critical tests as incomplete: verification of existing recipes and unit tests for round-trip conversions. Resolution: complete the remaining tests (verify the Llama/Qwen/DeepSeek recipes, add unit tests for str_to_callable/callable_to_str) and document all results, including edge cases, before merging.
  • Title check — ⚠️ Warning. The PR title '[misc] feat: Add bidirectional MLM <-> MBridge config translation script' does not match the actual changes, which focus on enabling activation_func CLI string overrides and adding a vanilla GPT recipe. Resolution: update the PR title to reflect the primary change; something like '[training, model, recipe] feat: Allow activation_func to be set via CLI string override' would be more accurate.

✅ Passed checks (2 passed)

  • Docstring Coverage — ✅ Passed. No functions found in the changed files to evaluate docstring coverage; the check was skipped.
  • Description Check — ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.


coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/megatron/bridge/recipes/gpt/vanilla_gpt.py (1)

15-29: Recipe name is too generic for repository naming conventions.

Please consider renaming vanilla_gpt.py / vanilla_gpt_pretrain_config to include explicit size/config metadata for easier discovery (e.g., gpt_<size>_vanilla_pretrain).

As per coding guidelines, "Use descriptive recipe names that include the model size and configuration (e.g., llama3_8b.py, qwen2_7b_instruct.py)."

Also applies to: 51-52

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/recipes/gpt/vanilla_gpt.py` around lines 15 - 29, The
recipe filename and identifier are too generic: rename the file and the recipe
identifier (currently vanilla_gpt.py and vanilla_gpt_pretrain_config) to a
descriptive name that includes model size/config (e.g.,
gpt_small_vanilla_pretrain or gpt_256x2_vanilla_pretrain), then update all
internal references—docstring example, the recipe export/variable name
(vanilla_gpt_pretrain_config), any entries used by run_recipe invocation, and
any import/registration points that reference vanilla_gpt or
vanilla_gpt_pretrain_config—so the repository consistently uses the new
descriptive name.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4b3d00 and e152402.

📒 Files selected for processing (4)
  • src/megatron/bridge/models/transformer_config.py
  • src/megatron/bridge/recipes/gpt/__init__.py
  • src/megatron/bridge/recipes/gpt/vanilla_gpt.py
  • src/megatron/bridge/training/utils/omegaconf_utils.py

Comment on lines +86 to +95
parts = name.rsplit(".", 1)
if len(parts) == 2:
    try:
        module = importlib.import_module(parts[0])
        return getattr(module, parts[1])
    except (ImportError, AttributeError):
        pass
raise ValueError(
    f"Unknown activation function: '{name}'. Known names: {sorted(n for n in ACTIVATION_FUNC_MAP if '.' not in n)}"
)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's check if the file exists and get the overall structure
git ls-files | grep omegaconf_utils.py

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 180


🏁 Script executed:

# Get the full str_to_callable function to understand its current implementation
rg -n -B5 -A15 'def str_to_callable' src/megatron/bridge/training/utils/omegaconf_utils.py



🏁 Script executed:

# Check the ACTIVATION_FUNC_MAP context to understand the data structure
rg -n -B2 -A5 'ACTIVATION_FUNC_MAP' src/megatron/bridge/training/utils/omegaconf_utils.py | head -50



🏁 Script executed:

# Get a broader view of the file structure around lines 86-95
sed -n '70,105p' src/megatron/bridge/training/utils/omegaconf_utils.py



🏁 Script executed:

# Check if there are tests for str_to_callable to understand expected behavior
rg -n -B3 -A10 'str_to_callable' tests/unit_tests/training/utils/test_omegaconf_utils.py



🏁 Script executed:

# Look at the full context of lines 86-95 and any callable validation in the function
sed -n '74,96p' src/megatron/bridge/training/utils/omegaconf_utils.py



🏁 Script executed:

# Check imports at the top of the file to ensure proper typing
sed -n '1,30p' src/megatron/bridge/training/utils/omegaconf_utils.py



🏁 Script executed:

# Verify the function is used elsewhere in the codebase to understand impact
rg -n 'str_to_callable' src/megatron/bridge --type py -B2 -A2



🏁 Script executed:

# Verify the proposed fix would work by checking callable behavior
python3 << 'EOF'
import importlib

# Simulate importing torch.nn.functional to validate the approach
module = importlib.import_module("torch.nn.functional")

# Test with a callable attribute (should pass)
silu_func = getattr(module, "silu")
print(f"silu is callable: {callable(silu_func)}")

# Test with a non-callable attribute to show the problem
# (torch.nn.functional has attributes like __version__)
version = getattr(module, "__version__")
print(f"__version__ is callable: {callable(version)}")
print(f"Type of __version__: {type(version)}")
EOF



🏁 Script executed:

# Verify how the function result is used to understand the impact
rg -n -B5 -A5 'activation_func = str_to_callable' src/megatron/bridge



🏁 Script executed:

# Check if there are any other usages or tests that might reveal expected behavior
rg -r 'str_to_callable\|activation_func.*callable' src/ tests/ --type py



Guard dotted-path resolution so only callables are accepted.

str_to_callable() currently returns any attribute from dotted paths without validation. That allows non-callables through and violates the function's return type contract (Callable) and docstring promise.

🔧 Proposed fix
 def str_to_callable(name: str) -> Callable:
     """Resolve an activation function name to its callable.

     Accepts short names (``"silu"``), fully qualified names
     (``"torch.nn.functional.silu"``), or dotted import paths.

     Raises:
         ValueError: If the name cannot be resolved.
     """
     if name in ACTIVATION_FUNC_MAP:
         return ACTIVATION_FUNC_MAP[name]
     # Fallback: try to import the dotted path
     parts = name.rsplit(".", 1)
     if len(parts) == 2:
         try:
             module = importlib.import_module(parts[0])
-            return getattr(module, parts[1])
+            resolved = getattr(module, parts[1])
+            if callable(resolved):
+                return resolved
         except (ImportError, AttributeError):
             pass
-    raise ValueError(
-        f"Unknown activation function: '{name}'. Known names: {sorted(n for n in ACTIVATION_FUNC_MAP if '.' not in n)}"
-    )
+    known_names = sorted(n for n in ACTIVATION_FUNC_MAP if "." not in n)
+    raise ValueError(f"Unknown activation function: '{name}'. Known names: {known_names}")
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 93-95: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/training/utils/omegaconf_utils.py` around lines 86 - 95,
str_to_callable currently returns any attribute for dotted paths even if it's
not callable; update the dotted-path branch in str_to_callable to verify the
resolved attribute is callable before returning it (use importlib.import_module
and getattr as already done for module/attr resolution), and if the attribute is
found but not callable raise the same ValueError used for unknown activation
names (including the sorted ACTIVATION_FUNC_MAP list for the message); ensure
the function's return remains a Callable and that non-callable attributes are
rejected consistently with the existing error path.

Extract duplicated activation-function name<->callable mappings from
omegaconf_utils.py and ModelBridge.ACTIVATION_MAPPING into a single
shared module (megatron.bridge.utils.activation_map).  Both consumers
now reference the same registry, keeping CLI overrides, config
finalization, and HF<->Megatron conversion in sync.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
yaoyu-33 commented Mar 3, 2026

/ok to test 4c4b987

yaoyu-33 commented Mar 4, 2026

/ok to test bec6e21

- Add translate_mlm_to_bridge.py for MLM→Bridge and Bridge→MLM translation
- Support --reverse flag for Bridge→MLM direction
- Build reverse mapping from ARG_MAP + extra tables
- Add MLM argparse introspection (optional, try/except guarded)
- Handle swiglu, squared_relu, mixed_precision special cases
- Output as CLI overrides, standalone recipe, or full torchrun command

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
yaoyu-33 force-pushed the yuya/feat-activation-func-cli-override branch from bec6e21 to 38a82f9 on March 4, 2026 at 04:27
yaoyu-33 changed the title from "[training, model, recipe] feat: Allow activation_func to be set via CLI string override" to "[misc] feat: Add bidirectional MLM <-> MBridge config translation script" on Mar 4, 2026
yaoyu-33 commented Mar 4, 2026

/ok to test 38a82f9

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>