[ROCm][Bugfix] Fix MXFP4 MoE emulate fallback logic on MX-capable hardware #36422
ChuanLi1101 wants to merge 2 commits into vllm-project:main
Hi @ChuanLi1101, the pre-commit checks have failed. Please run:
    uv pip install pre-commit
    pre-commit install
    pre-commit run --all-files
Then, commit the changes and push to your branch.
cc @zejunchen-zejun @dllehr-amd @fxmarty-amd @maleksan85: I would appreciate a review, since this touches the MXFP4 MoE emulate dispatch logic you've worked on. The fix is a one-line Boolean logic change plus a unit test.
Code Review
This pull request correctly fixes a boolean logic regression that prevented fallback to emulation mode for MXFP4 MoE on MX-capable hardware. The new logic is clearer and more robust. The addition of comprehensive unit tests is excellent and ensures the correctness of the fix across various configurations. I've added one comment regarding exception handling for improved robustness.
    except Exception:
        aiter_version = "unknown"
Using a broad except Exception: can mask underlying issues. For example, if the aiter module is present but fails to import due to an internal error (other than ImportError), this exception will be silently caught, making debugging more difficult. It's better to be more specific and catch only the expected ImportError.
Suggested change:
-    except Exception:
-        aiter_version = "unknown"
+    except ImportError:
+        aiter_version = "unknown"
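The idea behind this suggestion, catching only the expected failure, can be sketched as follows. This is a minimal illustration and not the PR's code: the helper name `safe_package_version` is hypothetical, the distribution name "aiter" is an assumption, and the sketch reads the version through the standard library's `importlib.metadata` instead of importing the module, so the expected error is `PackageNotFoundError` rather than `ImportError`.

```python
from importlib.metadata import PackageNotFoundError, version


def safe_package_version(dist_name: str) -> str:
    """Return the installed version of dist_name, or "unknown" if absent.

    Hypothetical helper for illustration. Only the expected
    "not installed" error is caught, so other failures still surface.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return "unknown"


# The distribution name "aiter" is an assumption, not taken from the PR.
print(safe_package_version("aiter"))
```

Any other exception raised during the lookup propagates to the caller, which is the debugging benefit the review comment describes.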
e2e0d6f to daa055f
Also cc @ganyi1996ppo @wuhuikx for review.
fxmarty-amd left a comment
LGTM. I think the test might be slightly overkill
    and self.ocp_mx_scheme.startswith("w_mxfp4")
    and self.use_rocm_aiter_moe

Suggested change:
-    and self.ocp_mx_scheme.startswith("w_mxfp4")
-    and self.use_rocm_aiter_moe
+    and self.ocp_mx_scheme.startswith("w_mxfp4")
+    and self.ocp_mx_scheme.endswith("a_mxfp4")
+    and self.use_rocm_aiter_moe
Fixes the same bug as #35855 (comment), introduced in #29008 (https://github.com/vllm-project/vllm/pull/29008/changes#r2877732813).
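A small sketch of what the extra `endswith` check changes. The helper name `is_w4a4_mxfp4` is hypothetical, and the two scheme strings are assumed examples chosen only to show the prefix and suffix behaviour; they are not taken from this PR.

```python
def is_w4a4_mxfp4(scheme: str) -> bool:
    """True only when both weights and activations are MXFP4.

    Hypothetical helper mirroring the suggested prefix + suffix check.
    """
    return scheme.startswith("w_mxfp4") and scheme.endswith("a_mxfp4")


# Assumed example scheme names, for illustration only.
for scheme in ("w_mxfp4_a_mxfp4", "w_mxfp4_a_mxfp6_e3m2"):
    print(scheme, is_w4a4_mxfp4(scheme))
```

With the prefix check alone, both strings would qualify; adding the suffix check keeps only the one whose activations are also MXFP4.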
    )
    can_use_mxfp4_backend = self.mxfp4_backend is not None

    self.emulate = not (can_use_native_ck or can_use_mxfp4_backend)
thanks, fyi I plan to refactor emulate into a backend after #34285 is landed, so the logic can be merged and cleaned up.
…dware
Fix a Boolean logic regression in QuarkOCP_MX_MoEMethod that prevented
fallback to emulation mode on MI350X (gfx950) and other MX-capable
hardware, causing gibberish output when AITER CK kernels are incompatible
(e.g. ROCm version mismatch).
The previous logic:
    emulate = (not supports_mx() or not scheme.startswith("w_mxfp4"))
              and (backend is None or not use_aiter_moe)
On MI350X with w_mxfp4, the first clause is (False or False) = False,
making the entire AND expression always False regardless of whether
AITER is available. This silently disabled the emulation fallback and
ignored VLLM_ROCM_USE_AITER_MOE=0.
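The argument above can be checked mechanically. The sketch below transcribes the previous expression from this commit message into a standalone function; the function name `old_emulate` and its boolean parameters are illustrative stand-ins for the real attributes, not vLLM APIs.

```python
from itertools import product


def old_emulate(supports_mx: bool, is_w_mxfp4: bool,
                has_backend: bool, use_aiter_moe: bool) -> bool:
    """Pre-fix expression, transcribed from the commit message."""
    return (not supports_mx or not is_w_mxfp4) and (
        not has_backend or not use_aiter_moe
    )


# MX-capable hardware with a w_mxfp4 scheme: try every combination of
# backend availability and the AITER MoE switch.
results = [
    old_emulate(True, True, has_backend, use_aiter_moe)
    for has_backend, use_aiter_moe in product([False, True], repeat=2)
]
print(results)  # [False, False, False, False]
```

Because the first clause is already False, the second clause is never able to turn emulation on, whatever the backend or AITER setting.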
The fix restructures the logic to be explicit:
    can_use_native_ck = supports_mx and w_mxfp4 and aiter_enabled
    can_use_backend = backend is not None
    emulate = not (can_use_native_ck or can_use_backend)
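A standalone sketch of the restructured logic, following the three lines above. The function name `new_emulate` and its boolean parameters are illustrative assumptions; the real code reads these values from the method's attributes and the platform.

```python
def new_emulate(supports_mx: bool, is_w_mxfp4: bool,
                has_backend: bool, aiter_enabled: bool) -> bool:
    """Post-fix dispatch, following the commit message."""
    can_use_native_ck = supports_mx and is_w_mxfp4 and aiter_enabled
    can_use_backend = has_backend
    return not (can_use_native_ck or can_use_backend)


# MX-capable hardware, w_mxfp4 scheme, no MXFP4 backend:
print(new_emulate(True, True, False, aiter_enabled=False))  # True
print(new_emulate(True, True, False, aiter_enabled=True))   # False

# A usable MXFP4 backend disables emulation regardless of AITER:
print(new_emulate(True, True, True, aiter_enabled=False))   # False
```

In this form, disabling AITER on MX-capable hardware with no other backend selects emulation, which is the fallback the previous expression could never reach.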
Also adds:
- AITER version logging for easier debugging
- Workaround hint in the emulation warning message
- Parametrized unit test covering the full dispatch matrix (14 cases)
Fixes vllm-project#36337
Made-with: Cursor
Signed-off-by: Li <chuali@amd.com>
Made-with: Cursor
daa055f to fcf215e
    from aiter.utility.fp4_utils import e8m0_shuffle

    try:
Can we move this under https://github.com/vllm-project/vllm/blob/main/vllm/_aiter_ops.py maybe? pulling the version seems like something we probably want to have more generally available?
Summary
Fix a Boolean logic regression in QuarkOCP_MX_MoEMethod.__init__ that prevented fallback to emulation mode on MI350X (gfx950) and other MX-capable hardware, causing gibberish output when AITER CK kernels are incompatible (e.g. ROCm version mismatch between quantization-time and serving-time).

Root Cause
The emulate dispatch logic introduced in PR #29008 had a Boolean expression that always evaluated to False on MX-capable hardware with w_mxfp4 schemes:

This made it impossible to fall back to emulation, even when:
VLLM_ROCM_USE_AITER_MOE=0

Fix
Restructured the logic to be explicit and correct:

Additional changes
VLLM_ROCM_USE_AITER_MOE=0) to emulation warning

Fixes #36337

Test plan
- test_quark_moe_emulate.py passes (14/14 cases, 0.13s, no GPU needed)
- amd/Kimi-K2.5-MXFP4 on MI350X with VLLM_ROCM_USE_AITER_MOE=0 produces coherent output (emulation path)