[ROCm][Bugfix] Fix MXFP4 MoE emulate fallback logic on MX-capable hardware by ChuanLi1101 · Pull Request #36422 · vllm-project/vllm

ChuanLi1101 · 2026-03-08T22:58:03Z

Summary

Fix a Boolean logic regression in QuarkOCP_MX_MoEMethod.__init__ that prevented fallback to emulation mode on MI350X (gfx950) and other MX-capable hardware, causing gibberish output when AITER CK kernels are incompatible (e.g. ROCm version mismatch between quantization-time and serving-time).

Root Cause

The emulate dispatch logic introduced in PR #29008 had a Boolean expression that always evaluated to False on MX-capable hardware with w_mxfp4 schemes:

# OLD (buggy):
self.emulate = (
    not current_platform.supports_mx()           # False on MI350X
    or not self.ocp_mx_scheme.startswith("w_mxfp4")  # False for w_mxfp4_*
) and (self.mxfp4_backend is None or not self.use_rocm_aiter_moe)
# => (False or False) and (...) => False, for every backend and AITER setting

This made it impossible to fall back to emulation, even when:

AITER CK kernels are incompatible (ROCm version mismatch)
The user explicitly sets VLLM_ROCM_USE_AITER_MOE=0
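
To confirm this, you can evaluate the old expression with plain booleans. The sketch below is illustrative only. `old_emulate` and its parameters are hypothetical stand-ins for `current_platform.supports_mx()`, the `startswith("w_mxfp4")` check, `self.mxfp4_backend is not None`, and `self.use_rocm_aiter_moe`. This is not vLLM code.

```python
from itertools import product


def old_emulate(supports_mx: bool, is_w_mxfp4: bool,
                has_backend: bool, use_aiter_moe: bool) -> bool:
    """Mirror of the pre-fix expression, with each call replaced by a bool."""
    return (not supports_mx or not is_w_mxfp4) and (
        not has_backend or not use_aiter_moe
    )


# On MX-capable hardware with a w_mxfp4 scheme the first clause is False,
# so the AND is False no matter what the backend/AITER inputs are.
mx_w_mxfp4_results = [
    old_emulate(True, True, has_backend, use_aiter_moe)
    for has_backend, use_aiter_moe in product([False, True], repeat=2)
]
print(mx_w_mxfp4_results)  # [False, False, False, False]
```

With `supports_mx=True` and a w_mxfp4 scheme, all four combinations of the remaining inputs yield `False`. Setting `use_aiter_moe=False` therefore cannot enable emulation, which matches the two cases listed above.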

Fix

Restructured the logic to be explicit and correct:

# NEW:
can_use_native_ck = (
    current_platform.supports_mx()
    and self.ocp_mx_scheme is not None
    and self.ocp_mx_scheme.startswith("w_mxfp4")
    and self.use_rocm_aiter_moe
)
can_use_mxfp4_backend = self.mxfp4_backend is not None
self.emulate = not (can_use_native_ck or can_use_mxfp4_backend)
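
The same kind of stand-alone check works for the restructured expression. Again, `new_emulate` is a hypothetical mirror of the snippet above, with platform and backend state passed in as arguments. The value `"w_mxfp4_a_mxfp4"` is used only as an example scheme string.

```python
from typing import Optional


def new_emulate(supports_mx: bool, ocp_mx_scheme: Optional[str],
                has_backend: bool, use_aiter_moe: bool) -> bool:
    """Mirror of the fixed dispatch, with platform/backend state as inputs."""
    can_use_native_ck = (
        supports_mx
        and ocp_mx_scheme is not None
        and ocp_mx_scheme.startswith("w_mxfp4")
        and use_aiter_moe
    )
    can_use_mxfp4_backend = has_backend
    # Emulate only when neither native path is usable.
    return not (can_use_native_ck or can_use_mxfp4_backend)


# MI350X-like platform, w_mxfp4 scheme, no dedicated MXFP4 backend:
print(new_emulate(True, "w_mxfp4_a_mxfp4", False, use_aiter_moe=False))  # True
print(new_emulate(True, "w_mxfp4_a_mxfp4", False, use_aiter_moe=True))   # False
```

Unlike the old expression, the fixed version changes its result on MX-capable hardware when AITER MoE is turned off and no MXFP4 backend is configured.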

Additional changes

AITER version logging: Logs the AITER version during weight processing to aid in debugging version-mismatch issues
Improved warning message: Added a workaround hint (VLLM_ROCM_USE_AITER_MOE=0) to the emulation warning
Unit test: Added a parametrized test covering 14 cases across the full dispatch matrix (hardware × scheme × AITER × backend). No GPU is required; it runs in <0.2s.
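
This page does not reproduce the PR's `test_quark_moe_emulate.py`. The sketch below is a dependency-free, table-driven test over the same dimensions (hardware × scheme × AITER × backend), in which each row pairs an input combination with its expected `emulate` value. `Case`, `dispatch_emulate`, and the scheme strings are assumptions. The seven rows are an illustrative subset, not the PR's 14 cases. In a pytest suite, the same table would typically feed `pytest.mark.parametrize`.

```python
from typing import List, NamedTuple, Optional


class Case(NamedTuple):
    supports_mx: bool
    scheme: Optional[str]
    use_aiter_moe: bool
    has_backend: bool
    expected_emulate: bool


def dispatch_emulate(c: Case) -> bool:
    """Hypothetical copy of the fixed dispatch logic under test."""
    can_use_native_ck = (
        c.supports_mx
        and c.scheme is not None
        and c.scheme.startswith("w_mxfp4")
        and c.use_aiter_moe
    )
    return not (can_use_native_ck or c.has_backend)


CASES = [
    Case(True, "w_mxfp4_a_mxfp4", True, False, False),   # native CK path
    Case(True, "w_mxfp4_a_mxfp4", False, False, True),   # AITER off -> emulate
    Case(True, "w_mxfp4_a_mxfp4", False, True, False),   # MXFP4 backend wins
    Case(False, "w_mxfp4_a_mxfp4", True, False, True),   # no MX hardware
    Case(False, "w_mxfp4_a_mxfp4", True, True, False),   # backend on non-MX hw
    Case(True, "w_mxfp6_e3m2_a_mxfp6_e3m2", True, False, True),  # not w_mxfp4
    Case(True, None, True, False, True),                 # no OCP MX scheme
]


def failing_cases(cases: List[Case]) -> List[Case]:
    return [c for c in cases if dispatch_emulate(c) != c.expected_emulate]


print(f"{len(CASES) - len(failing_cases(CASES))}/{len(CASES)} cases pass")
```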

Fixes #36337

Test plan

Unit test test_quark_moe_emulate.py passes (14/14 cases, 0.13s, no GPU needed)
Verify amd/Kimi-K2.5-MXFP4 on MI350X with VLLM_ROCM_USE_AITER_MOE=0 produces coherent output (emulation path)
Verify native CK path still works when AITER version is compatible
Existing MXFP4 tests remain green

mergify · 2026-03-08T23:02:48Z

Hi @ChuanLi1101, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

ChuanLi1101 · 2026-03-08T23:03:01Z

cc @zejunchen-zejun @dllehr-amd @fxmarty-amd @maleksan85 — would appreciate a review since this touches the MXFP4 MoE emulate dispatch logic you've worked on. The fix is a one-line Boolean logic change + unit test.

gemini-code-assist

Code Review

This pull request correctly fixes a boolean logic regression that prevented fallback to emulation mode for MXFP4 MoE on MX-capable hardware. The new logic is clearer and more robust. The addition of comprehensive unit tests is excellent and ensures the correctness of the fix across various configurations. I've added one comment regarding exception handling for improved robustness.

gemini-code-assist · 2026-03-08T23:09:34Z

vllm/model_executor/layers/quantization/quark/quark_moe.py

+        except Exception:
+            aiter_version = "unknown"


Using a broad except Exception: can mask underlying issues. For example, if the aiter module is present but fails to import due to an internal error (other than ImportError), this exception will be silently caught, making debugging more difficult. It's better to be more specific and catch only the expected ImportError.

Suggested change

-        except Exception:
-            aiter_version = "unknown"
+        except ImportError:
+            aiter_version = "unknown"
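
The reviewer's point can be demonstrated without AITER installed. In the sketch below, `read_version` and the two stand-in importers are hypothetical. One mimics a missing package; the other mimics a package whose import fails internally. With `except ImportError`, the first case degrades to `"unknown"` while the second still raises. With `except Exception`, both would have silently become `"unknown"`.

```python
from types import SimpleNamespace
from typing import Callable


def read_version(import_module: Callable[[], object]) -> str:
    """Return the imported module's __version__, or "unknown" if it is absent.

    Only ImportError (including its subclass ModuleNotFoundError) is treated
    as "not installed"; any other failure during import propagates.
    """
    try:
        module = import_module()
    except ImportError:
        return "unknown"
    return str(getattr(module, "__version__", "unknown"))


def missing_module() -> object:
    # Stand-in for importing a package that is not installed.
    raise ModuleNotFoundError("No module named 'aiter'")


def broken_module() -> object:
    # Stand-in for a package that is installed but fails while importing.
    raise RuntimeError("internal initialization error")


print(read_version(lambda: SimpleNamespace(__version__="1.2.3")))  # 1.2.3
print(read_version(missing_module))  # unknown
try:
    read_version(broken_module)
except RuntimeError as exc:
    print("not masked:", exc)  # not masked: internal initialization error
```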

ChuanLi1101 · 2026-03-08T23:28:02Z

Also cc @ganyi1996ppo @wuhuikx for review.

fxmarty-amd

LGTM. I think the test might be slightly overkill

fxmarty-amd · 2026-03-09T09:46:21Z

vllm/model_executor/layers/quantization/quark/quark_moe.py

+            and self.ocp_mx_scheme.startswith("w_mxfp4")
+            and self.use_rocm_aiter_moe


Suggested change

-            and self.ocp_mx_scheme.startswith("w_mxfp4")
-            and self.use_rocm_aiter_moe
+            and self.ocp_mx_scheme.startswith("w_mxfp4")
+            and self.ocp_mx_scheme.endswith("a_mxfp4")
+            and self.use_rocm_aiter_moe

Fixes the same bug as #35855 (comment), which was introduced in #29008 (https://github.com/vllm-project/vllm/pull/29008/changes#r2877732813).
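
The suggested `endswith("a_mxfp4")` check narrows native-CK eligibility to schemes whose activations are also MXFP4. The review comment does not spell out why, so the sketch below only shows which schemes the extra check excludes. `native_ck_eligible` is hypothetical, and the scheme strings are assumed examples of the `w_<weights>_a_<activations>` naming.

```python
from typing import Optional


def native_ck_eligible(scheme: Optional[str], *, supports_mx: bool = True,
                       use_aiter_moe: bool = True,
                       require_mxfp4_activations: bool = False) -> bool:
    """Native-CK eligibility; the flag toggles the suggested endswith check."""
    if not (supports_mx and use_aiter_moe and scheme is not None):
        return False
    if not scheme.startswith("w_mxfp4"):
        return False
    if require_mxfp4_activations and not scheme.endswith("a_mxfp4"):
        return False
    return True


SCHEMES = ["w_mxfp4_a_mxfp4", "w_mxfp4_a_mxfp6_e3m2", "w_mxfp6_e3m2_a_mxfp6_e3m2"]
for scheme in SCHEMES:
    # Columns: scheme, eligible as in the PR, eligible with the suggestion.
    print(scheme, native_ck_eligible(scheme),
          native_ck_eligible(scheme, require_mxfp4_activations=True))
# w_mxfp4_a_mxfp4 True True
# w_mxfp4_a_mxfp6_e3m2 True False
# w_mxfp6_e3m2_a_mxfp6_e3m2 False False
```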

BowenBao

LGTM

BowenBao · 2026-03-09T18:40:17Z

vllm/model_executor/layers/quantization/quark/quark_moe.py

+        )
+        can_use_mxfp4_backend = self.mxfp4_backend is not None
+
+        self.emulate = not (can_use_native_ck or can_use_mxfp4_backend)


Thanks. FYI, I plan to refactor emulate into a backend after #34285 lands, so this logic can be merged and cleaned up.
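
The comment only states the intent. The sketch below shows one hypothetical shape for treating emulation as a backend. It is not the planned design, and it is not taken from #34285. `MoEBackend` and `select_backend` are invented names. The priority order (native CK first, then a dedicated MXFP4 backend) is an assumption, because the current expression does not rank the two. Emulation becomes the fallback when no candidate is available, rather than a separately computed flag.

```python
from enum import Enum
from typing import Callable, Optional, Sequence, Tuple


class MoEBackend(Enum):
    AITER_CK = "aiter_ck"
    MXFP4_BACKEND = "mxfp4_backend"
    EMULATION = "emulation"


def select_backend(
    candidates: Sequence[Tuple[MoEBackend, Callable[[], bool]]],
) -> MoEBackend:
    """Return the first available candidate, falling back to emulation."""
    for backend, is_available in candidates:
        if is_available():
            return backend
    return MoEBackend.EMULATION


# MI350X-like platform, w_mxfp4 scheme, AITER MoE disabled, no MXFP4 backend.
supports_mx, is_w_mxfp4, use_aiter_moe = True, True, False
mxfp4_backend: Optional[object] = None
chosen = select_backend([
    (MoEBackend.AITER_CK, lambda: supports_mx and is_w_mxfp4 and use_aiter_moe),
    (MoEBackend.MXFP4_BACKEND, lambda: mxfp4_backend is not None),
])
print(chosen)  # MoEBackend.EMULATION
```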

…dware

Fix a Boolean logic regression in QuarkOCP_MX_MoEMethod that prevented fallback to emulation mode on MI350X (gfx950) and other MX-capable hardware, causing gibberish output when AITER CK kernels are incompatible (e.g. ROCm version mismatch).

The previous logic:

  emulate = (not supports_mx() or not scheme.startswith("w_mxfp4")) and (backend is None or not use_aiter_moe)

On MI350X with w_mxfp4, the first clause is (False or False) = False, making the entire AND expression always False regardless of whether AITER is available. This silently disabled the emulation fallback and ignored VLLM_ROCM_USE_AITER_MOE=0.

The fix restructures the logic to be explicit:

  can_use_native_ck = supports_mx and w_mxfp4 and aiter_enabled
  can_use_backend = backend is not None
  emulate = not (can_use_native_ck or can_use_backend)

Also adds:
- AITER version logging for easier debugging
- Workaround hint in the emulation warning message
- Parametrized unit test covering the full dispatch matrix (14 cases)

Fixes vllm-project#36337

Made-with: Cursor
Signed-off-by: Li <chuali@amd.com>
Made-with: Cursor

dllehr-amd · 2026-03-10T16:52:51Z

vllm/model_executor/layers/quantization/quark/quark_moe.py


        from aiter.utility.fp4_utils import e8m0_shuffle

+        try:


Can we maybe move this under https://github.com/vllm-project/vllm/blob/main/vllm/_aiter_ops.py? Pulling the version seems like something we probably want to have more generally available.
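
Here is one possible shape for a shared helper, sketched under assumptions. `get_package_version` and `get_aiter_version` are hypothetical names, not existing functions in `vllm/_aiter_ops.py`. The distribution name `"aiter"` passed to `importlib.metadata.version` is a guess and may not match how AITER is actually packaged. Caching with `functools.lru_cache` means the metadata lookup runs once per process, however many layers ask for it.

```python
import functools
import logging
from importlib import metadata


@functools.lru_cache(maxsize=None)
def get_package_version(dist_name: str) -> str:
    """Installed version of `dist_name`, or "unknown" if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return "unknown"


def get_aiter_version() -> str:
    # ASSUMPTION: AITER is installed under the distribution name "aiter";
    # adjust if it is published under a different name.
    return get_package_version("aiter")


logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
logger.info("AITER version: %s", get_aiter_version())
```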

ChuanLi1101 requested a review from tjtanaa as a code owner March 8, 2026 22:58

mergify bot added the rocm (Related to AMD ROCm) and bug (Something isn't working) labels Mar 8, 2026

ChuanLi1101 mentioned this pull request Mar 8, 2026

[Bug]: Kimi-K2.5-MXFP4 produces gibberish output on MI350X (gfx950) with ROCm 7.2 #36337 (Open)

github-project-automation bot added this to AMD Mar 8, 2026

github-project-automation bot moved this to Todo in AMD Mar 8, 2026

gemini-code-assist bot reviewed Mar 8, 2026


ChuanLi1101 force-pushed the fix/mxfp4-moe-emulate-logic branch from e2e0d6f to daa055f March 8, 2026 23:27

fxmarty-amd approved these changes Mar 9, 2026


BowenBao approved these changes Mar 9, 2026


BowenBao mentioned this pull request Mar 9, 2026

[Bugfix] Fix issues in quark emulative logic #36486 (Closed)

ChuanLi1101 force-pushed the fix/mxfp4-moe-emulate-logic branch from daa055f to fcf215e March 9, 2026 23:16

Merge branch 'main' into fix/mxfp4-moe-emulate-logic (a049de2)

dllehr-amd reviewed Mar 10, 2026


ChuanLi1101 wants to merge 2 commits into vllm-project:main from ChuanLi1101:fix/mxfp4-moe-emulate-logic

4 participants
