[Fix] Fix freeze_moe_router crash and support DeepSeekV3 expert_bias
#70
+10
−8
1. DeepSeekV3 / Moonlight Architecture Context

The DeepSeekV3 (and its smaller variant Moonlight) architecture differs from standard MoE models in how it handles the router (gate) layer:

- In Hugging Face, the router is an `nn.Linear(hidden_size, num_experts, bias=False)` with a separate `e_score_correction_bias`: `model.layers[i].mlp.gate` (a `Linear` with no bias) plus `e_score_correction_bias`.
- The bridge maps `e_score_correction_bias` to `mlp.router.expert_bias`. The standard `mlp.router.bias` exists as an object attribute but is initialized to `None`.

2. Execution Trace & Root Cause
The bridge (e.g., `DeepseekV3Bridge`) instantiates the model via `get_model`, inside `mbridge/core/bridge.py` and `mbridge/models/deepseek_v3.py`. The `GPTModel` is created, and MCore initializes the `TopKRouter`. `TopKRouter` attributes are often defined even if unused; for DeepSeekV3, `self.bias` is defined as an attribute but set to `None`.

Callback Execution (The Crash)
After model creation, the callbacks in `mbridge/utils/post_creation_callbacks.py` are executed.

Why it crashes: `hasattr(obj, "bias")` returns `True` even if `obj.bias` is `None`, so the callback then tries to access `.requires_grad` on a `NoneType`, which raises an `AttributeError`.

Secondary issue: the original code completely missed `expert_bias`, meaning that even if it didn't crash, the router wouldn't be fully frozen for DeepSeekV3 models.

3. Solution
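The crash reduces to a small Python pattern; `DummyRouter` below is an illustrative stand-in for MCore's `TopKRouter`, not the real class:

```python
class DummyRouter:
    def __init__(self):
        # The attribute is defined, but its value is None (as for DeepSeekV3).
        self.bias = None

router = DummyRouter()
print(hasattr(router, "bias"))  # True: hasattr only checks existence, not None-ness

try:
    # Equivalent to what the old callback did after the hasattr check passed:
    router.bias.requires_grad = False
except AttributeError as err:
    print(f"AttributeError: {err}")
```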
The fix makes the callback robust by explicitly checking for `None` values and expanding the attribute list to include `expert_bias`.

Code Changes
Modified `mbridge/utils/post_creation_callbacks.py`:
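A minimal sketch of what the fixed callback can look like; the exact attribute list, matching logic, and loop structure here are assumptions, with only the `None` check and the added `expert_bias` handling taken from the description above:

```python
def freeze_moe_router(model):
    """Freeze MoE router parameters, tolerating attributes that are None."""
    for module in model.modules():
        # Assumed matching by class name; the real code may identify routers differently.
        if type(module).__name__ != "TopKRouter":
            continue
        for attr in ("weight", "bias", "expert_bias"):
            param = getattr(module, attr, None)
            # getattr + None check instead of hasattr: a bias attribute that
            # exists but is None is skipped rather than raising AttributeError.
            if param is not None:
                param.requires_grad = False
```

For DeepSeekV3 this freezes `expert_bias` along with the router weight, while the `bias` attribute (initialized to `None`) is safely ignored.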