@spacegoing

1. DeepSeekV3 / Moonlight Architecture Context

The DeepSeekV3 (and its smaller variant Moonlight) architecture differs from standard MoE models in how it handles the router (gate) layer:

  • Standard Linear Router: Typically nn.Linear(hidden_size, num_experts, bias=False).
  • Score Correction Bias: Instead of a standard additive bias in the linear layer, DeepSeekV3 introduces a separate parameter often termed e_score_correction_bias in Hugging Face.
    • HF Structure: model.layers[i].mlp.gate (Linear, no bias) + e_score_correction_bias.
    • Megatron-Core Mapping (via MBridge): MBridge maps e_score_correction_bias to mlp.router.expert_bias. The standard mlp.router.bias attribute still exists on the router object but is initialized to None (see the sketch after this list).
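For orientation, here is a minimal illustrative stand-in (not the actual Hugging Face module; hidden_size and num_experts are placeholder arguments) showing how the bias-free gate and the separate correction parameter sit side by side on the HF side:

import torch
from torch import nn

class ToyDeepseekV3Routing(nn.Module):
    # Illustrative sketch only: mirrors the layout described above,
    # not the real HF modeling code.
    def __init__(self, hidden_size: int, num_experts: int):
        super().__init__()
        # model.layers[i].mlp.gate -> Linear with no additive bias
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        # Separate score-correction term; MBridge maps this to
        # mlp.router.expert_bias on the Megatron-Core side.
        self.e_score_correction_bias = nn.Parameter(torch.zeros(num_experts))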

2. Execution Trace & Root Cause

The bridge (e.g., DeepseekV3Bridge) instantiates the model via get_model inside mbridge/core/bridge.py and mbridge/models/deepseek_v3.py.
The GPTModel is created, and MCore initializes the TopKRouter.

  • Crucial Detail: In MCore, TopKRouter attributes are often defined even if unused. For DeepSeekV3, self.bias is defined as an attribute but set to None.

Callback Execution (The Crash)

After model creation, the callbacks are executed.
In mbridge/utils/post_creation_callbacks.py:

def freeze_moe_router(model, ...):
    for layer in model.decoder.layers:
        if hasattr(layer.mlp, "router"):
            # ...
            if hasattr(layer.mlp.router, "bias"):
                # CRASH HERE: layer.mlp.router.bias is None!
                layer.mlp.router.bias.requires_grad = False 

Why it crashes: hasattr(obj, "bias") returns True even when obj.bias is None, so the guard passes; assigning .requires_grad on None then raises AttributeError.
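The failure mode can be reproduced in isolation with a toy class standing in for MCore's TopKRouter (illustrative only, not the real class):

class ToyRouter:
    def __init__(self):
        self.bias = None  # attribute exists, but holds no parameter

router = ToyRouter()
print(hasattr(router, "bias"))         # True -- the hasattr guard passes
try:
    router.bias.requires_grad = False  # same line as the crash above
except AttributeError as e:
    print(e)                           # 'NoneType' object has no attribute 'requires_grad'

# The safe pattern used in the fix below: getattr plus an explicit None check.
param = getattr(router, "bias", None)
if param is not None:
    param.requires_grad = False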

Secondary Issue: The original code never touched expert_bias, so even without the crash the router would not have been fully frozen for DeepSeekV3 models.

3. Solution

The fix makes the callback robust by explicitly checking for None values and expanding the attribute list to include expert_bias.

Code Changes

Modified mbridge/utils/post_creation_callbacks.py:

def freeze_moe_router(model, pre_process, post_process, config, hf_config):
    for layer in model.decoder.layers:
        if hasattr(layer.mlp, "router"):
            router = layer.mlp.router
            # 1. Added 'expert_bias' to support DeepSeekV3
            # 2. Iterate safely over potential attributes
            for attr in ["weight", "bias", "expert_bias"]:
                param = getattr(router, attr, None)
                # 3. Explicit check prevents crash on None parameters
                if param is not None:
                    param.requires_grad = False
        
        # Similar logic applied to shared_experts
        if hasattr(layer.mlp, "shared_experts"):
            shared_experts = layer.mlp.shared_experts
            for attr in ["gate_weight", "gate_bias"]:
                param = getattr(shared_experts, attr, None)
                if param is not None:
                    param.requires_grad = False
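As an illustrative sanity check (hypothetical helper, not part of the patch), the same attribute names can be used to verify that the callback left nothing trainable:

def assert_router_frozen(model):
    # Mirrors the attribute lists used in the fixed callback above.
    targets = [("router", ["weight", "bias", "expert_bias"]),
               ("shared_experts", ["gate_weight", "gate_bias"])]
    for layer in model.decoder.layers:
        for module_name, attrs in targets:
            module = getattr(layer.mlp, module_name, None)
            if module is None:
                continue
            for attr in attrs:
                param = getattr(module, attr, None)
                assert param is None or not param.requires_grad, \
                    f"{module_name}.{attr} is still trainable"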
