
[ROCm][Bugfix] Fix MXFP4 MoE emulate fallback logic on MX-capable hardware#36422

Open
ChuanLi1101 wants to merge 2 commits into vllm-project:main from ChuanLi1101:fix/mxfp4-moe-emulate-logic


@ChuanLi1101 (Contributor) commented Mar 8, 2026

Summary

Fix a Boolean logic regression in QuarkOCP_MX_MoEMethod.__init__ that prevented fallback to emulation mode on MI350X (gfx950) and other MX-capable hardware, causing gibberish output when AITER CK kernels are incompatible (e.g., a ROCm version mismatch between quantization time and serving time).

Root Cause

The emulate dispatch logic introduced in PR #29008 had a Boolean expression that always evaluated to False on MX-capable hardware with w_mxfp4 schemes:

# OLD (buggy):
self.emulate = (
    not current_platform.supports_mx()           # False on MI350X
    or not self.ocp_mx_scheme.startswith("w_mxfp4")  # False for w_mxfp4_*
) and (self.mxfp4_backend is None or not self.use_rocm_aiter_moe)
# => (False or False) and (...) => False — always!

This made it impossible to fall back to emulation, even when:

  • AITER CK kernels are incompatible (ROCm version mismatch)
  • The user explicitly sets VLLM_ROCM_USE_AITER_MOE=0
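A minimal standalone sketch of the old expression makes the failure mode concrete. The function and parameter names below are hypothetical stand-ins for `current_platform.supports_mx()`, `self.ocp_mx_scheme`, `self.mxfp4_backend is not None`, and `self.use_rocm_aiter_moe`, and the scheme string `"w_mxfp4_a_mxfp4"` is assumed for illustration; this is not the vLLM code itself.

```python
from itertools import product


def old_emulate(
    supports_mx: bool, scheme: str, has_backend: bool, use_aiter_moe: bool
) -> bool:
    """Hypothetical standalone copy of the pre-fix emulate expression."""
    return (not supports_mx or not scheme.startswith("w_mxfp4")) and (
        not has_backend or not use_aiter_moe
    )


# On MX-capable hardware (e.g. MI350X) with a w_mxfp4 scheme, the first
# clause is (False or False), so the whole AND is False for every
# combination of backend availability and the AITER MoE toggle.
results = {
    (has_backend, use_aiter): old_emulate(
        True, "w_mxfp4_a_mxfp4", has_backend, use_aiter
    )
    for has_backend, use_aiter in product([False, True], repeat=2)
}
print(results)
```

All four entries come out `False`, including the combination with no backend and AITER MoE disabled, which is exactly where emulation was needed.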

Fix

Restructured the logic to be explicit and correct:

# NEW:
can_use_native_ck = (
    current_platform.supports_mx()
    and self.ocp_mx_scheme is not None
    and self.ocp_mx_scheme.startswith("w_mxfp4")
    and self.use_rocm_aiter_moe
)
can_use_mxfp4_backend = self.mxfp4_backend is not None
self.emulate = not (can_use_native_ck or can_use_mxfp4_backend)
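The corrected dispatch can likewise be sketched as a pure function, under the same assumptions (hypothetical parameter names standing in for the platform query, the OCP MX scheme, the backend, and the AITER flag). It mirrors the three assignments above so individual cases can be checked without a GPU.

```python
from typing import Optional


def new_emulate(
    supports_mx: bool,
    scheme: Optional[str],
    has_backend: bool,
    use_aiter_moe: bool,
) -> bool:
    """Hypothetical standalone copy of the fixed emulate logic."""
    # Native AITER CK needs MX hardware, a w_mxfp4 scheme, and AITER MoE on.
    can_use_native_ck = (
        supports_mx
        and scheme is not None
        and scheme.startswith("w_mxfp4")
        and use_aiter_moe
    )
    # Any configured mxfp4 backend is an alternative to emulation.
    can_use_mxfp4_backend = has_backend
    # Emulate only when neither real kernel path is available.
    return not (can_use_native_ck or can_use_mxfp4_backend)


# MI350X-style case with VLLM_ROCM_USE_AITER_MOE=0 and no backend:
# the emulation fallback is reachable again.
print(new_emulate(True, "w_mxfp4_a_mxfp4", has_backend=False, use_aiter_moe=False))
```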

Additional changes

  • AITER version logging: Logs the AITER version during weight processing to help debug version-mismatch issues
  • Improved warning message: Added a workaround hint (VLLM_ROCM_USE_AITER_MOE=0) to the emulation warning
  • Unit test: Added a parametrized test covering 14 cases across the full dispatch matrix (hardware × scheme × AITER × backend). No GPU required; runs in <0.2s.
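The dispatch-matrix idea behind the unit test can be sketched in plain Python by evaluating both expressions over every combination and listing where they disagree. This is not the PR's `test_quark_moe_emulate.py`: it enumerates 16 illustrative combinations rather than the PR's 14 cases, uses two assumed scheme names, and omits the `None`-scheme guard for brevity.

```python
from itertools import product


def old_emulate(supports_mx: bool, scheme: str, has_backend: bool, use_aiter: bool) -> bool:
    # Pre-fix expression (hypothetical standalone form).
    return (not supports_mx or not scheme.startswith("w_mxfp4")) and (
        not has_backend or not use_aiter
    )


def new_emulate(supports_mx: bool, scheme: str, has_backend: bool, use_aiter: bool) -> bool:
    # Post-fix expression (hypothetical standalone form).
    can_use_native_ck = supports_mx and scheme.startswith("w_mxfp4") and use_aiter
    return not (can_use_native_ck or has_backend)


# Illustrative scheme names; only the "w_mxfp4" prefix matters here.
SCHEMES = ["w_mxfp4_a_mxfp4", "w_mxfp6_e3m2_a_mxfp6_e3m2"]

# Each row: (supports_mx, scheme, has_backend, use_aiter, old, new)
diffs = []
for supports_mx, scheme, has_backend, use_aiter in product(
    [False, True], SCHEMES, [False, True], [False, True]
):
    old = old_emulate(supports_mx, scheme, has_backend, use_aiter)
    new = new_emulate(supports_mx, scheme, has_backend, use_aiter)
    if old != new:
        diffs.append((supports_mx, scheme, has_backend, use_aiter, old, new))

for row in diffs:
    print(row)
```

Running it reports four disagreeing combinations: the MX-hardware, w_mxfp4, no-backend, AITER-off case (old `False`, new `True`), plus three combinations with a backend present and AITER MoE disabled, where the old expression emulated and the new one does not.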

Fixes #36337

Test plan

  • Unit test test_quark_moe_emulate.py passes (14/14 cases, 0.13s, no GPU needed)
  • Verify amd/Kimi-K2.5-MXFP4 on MI350X with VLLM_ROCM_USE_AITER_MOE=0 produces coherent output (emulation path)
  • Verify native CK path still works when AITER version is compatible
  • Existing MXFP4 tests remain green

@ChuanLi1101 requested a review from tjtanaa as a code owner March 8, 2026 22:58
@mergify bot added the rocm (Related to AMD ROCm) and bug (Something isn't working) labels Mar 8, 2026
@github-project-automation bot moved this to Todo in AMD Mar 8, 2026
@mergify bot commented Mar 8, 2026

Hi @ChuanLi1101, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@ChuanLi1101 (Contributor, Author) commented:

cc @zejunchen-zejun @dllehr-amd @fxmarty-amd @maleksan85: I'd appreciate a review, since this touches the MXFP4 MoE emulate dispatch logic you've worked on. The fix is a small Boolean logic change plus a unit test.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request correctly fixes a boolean logic regression that prevented fallback to emulation mode for MXFP4 MoE on MX-capable hardware. The new logic is clearer and more robust. The addition of comprehensive unit tests is excellent and ensures the correctness of the fix across various configurations. I've added one comment regarding exception handling for improved robustness.

Comment on lines +983 to +984
    except Exception:
        aiter_version = "unknown"
Severity: high

Using a broad except Exception: can mask underlying issues. For example, if the aiter module is present but fails to import due to an internal error (other than ImportError), this exception will be silently caught, making debugging more difficult. It's better to be more specific and catch only the expected ImportError.

Suggested change
-    except Exception:
-        aiter_version = "unknown"
+    except ImportError:
+        aiter_version = "unknown"
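A sketch of the narrower handling the reviewer suggests, assuming the version is read from a module-level `__version__` attribute (an assumption; the PR's actual lookup code is not shown here). The helper name `lookup_module_version` is hypothetical.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def lookup_module_version(module_name: str = "aiter") -> str:
    """Return module.__version__, or "unknown" only if the module is absent.

    Catching ImportError (rather than Exception) means an internal error
    raised while importing an installed module still propagates.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return "unknown"
    # Not every package defines __version__, so fall back gracefully.
    return getattr(module, "__version__", "unknown")


logger.info("AITER version: %s", lookup_module_version())
```

Because only `ImportError` is caught, a missing `aiter` yields `"unknown"`, while a `RuntimeError` or similar raised inside an installed but broken `aiter` still surfaces instead of being silently swallowed.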

@ChuanLi1101 force-pushed the fix/mxfp4-moe-emulate-logic branch from e2e0d6f to daa055f March 8, 2026 23:27
@ChuanLi1101 (Contributor, Author) commented:

Also cc @ganyi1996ppo @wuhuikx for review.

@fxmarty-amd (Contributor) left a comment

LGTM. I think the test might be slightly overkill.

Comment on lines +740 to +741
    and self.ocp_mx_scheme.startswith("w_mxfp4")
    and self.use_rocm_aiter_moe
Suggested change
-    and self.ocp_mx_scheme.startswith("w_mxfp4")
-    and self.use_rocm_aiter_moe
+    and self.ocp_mx_scheme.startswith("w_mxfp4")
+    and self.ocp_mx_scheme.endswith("a_mxfp4")
+    and self.use_rocm_aiter_moe

This fixes the same bug as #35855, which was introduced in #29008 (https://github.com/vllm-project/vllm/pull/29008/changes#r2877732813).
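The reviewer's stricter condition can be checked in isolation. The helper name is hypothetical, and the scheme strings are illustrative examples of the `w_<weight>_a_<activation>` naming the suggestion implies. With the extra `endswith` clause, only schemes whose activations are also MXFP4 qualify for the native CK path.

```python
from typing import Optional


def scheme_allows_native_ck(scheme: Optional[str]) -> bool:
    """Hypothetical scheme check: MXFP4 weights AND MXFP4 activations."""
    return (
        scheme is not None
        and scheme.startswith("w_mxfp4")  # weights are MXFP4
        and scheme.endswith("a_mxfp4")  # activations are MXFP4 too
    )


for s in ["w_mxfp4_a_mxfp4", "w_mxfp4_a_mxfp6_e3m2", None]:
    print(s, scheme_allows_native_ck(s))
```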

@BowenBao (Contributor) left a comment

LGTM

)
can_use_mxfp4_backend = self.mxfp4_backend is not None

self.emulate = not (can_use_native_ck or can_use_mxfp4_backend)

Thanks. FYI, I plan to refactor emulate into a backend after #34285 lands, so the logic can be merged and cleaned up.
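Purely as an illustration of the refactor mentioned above, one shape an emulate-as-backend design could take is an explicit path selector in which emulation is simply the last option. Everything here is hypothetical: the enum, its member names, and the priority order (native CK before an mxfp4 backend) are assumptions, not the planned vLLM API.

```python
from enum import Enum
from typing import Optional


class MoEKernelPath(Enum):
    """Hypothetical kernel paths; names are illustrative, not vLLM's."""

    NATIVE_CK = "native_ck"
    MXFP4_BACKEND = "mxfp4_backend"
    EMULATE = "emulate"


def select_kernel_path(
    supports_mx: bool,
    scheme: Optional[str],
    has_backend: bool,
    use_aiter_moe: bool,
) -> MoEKernelPath:
    # Assumed priority: native AITER CK first, then an mxfp4 backend,
    # with emulation as the explicit last resort.
    if (
        supports_mx
        and scheme is not None
        and scheme.startswith("w_mxfp4")
        and use_aiter_moe
    ):
        return MoEKernelPath.NATIVE_CK
    if has_backend:
        return MoEKernelPath.MXFP4_BACKEND
    return MoEKernelPath.EMULATE


# The old boolean becomes a derived value of the selected path.
path = select_kernel_path(True, "w_mxfp4_a_mxfp4", has_backend=False, use_aiter_moe=False)
emulate = path is MoEKernelPath.EMULATE
print(path, emulate)
```

Deriving `emulate` from the chosen path keeps a single source of truth, so a later condition cannot silently disable the fallback the way the old Boolean did.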


Fix a Boolean logic regression in QuarkOCP_MX_MoEMethod that prevented
fallback to emulation mode on MI350X (gfx950) and other MX-capable
hardware, causing gibberish output when AITER CK kernels are incompatible
(e.g. ROCm version mismatch).

The previous logic:
    emulate = (not supports_mx() or not scheme.startswith("w_mxfp4"))
              and (backend is None or not use_aiter_moe)

On MI350X with w_mxfp4, the first clause is (False or False) = False,
making the entire AND expression always False regardless of whether
AITER is available. This silently disabled the emulation fallback and
ignored VLLM_ROCM_USE_AITER_MOE=0.

The fix restructures the logic to be explicit:
    can_use_native_ck = supports_mx and w_mxfp4 and aiter_enabled
    can_use_backend = backend is not None
    emulate = not (can_use_native_ck or can_use_backend)

Also adds:
- AITER version logging for easier debugging
- Workaround hint in the emulation warning message
- Parametrized unit test covering the full dispatch matrix (14 cases)

Fixes vllm-project#36337

Made-with: Cursor
Signed-off-by: Li <chuali@amd.com>
@ChuanLi1101 force-pushed the fix/mxfp4-moe-emulate-logic branch from daa055f to fcf215e March 9, 2026 23:16

from aiter.utility.fp4_utils import e8m0_shuffle

try:

Could we move this into https://github.com/vllm-project/vllm/blob/main/vllm/_aiter_ops.py? Pulling the version seems like something we'd want to make more generally available.
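One possible shape for such a shared helper, sketched under assumptions: it reads the version from installed package metadata via `importlib.metadata` (so the module itself is never imported), caches the result, and assumes the distribution is named `aiter`, which may not match how AITER is actually packaged. The function name is hypothetical and is not part of `vllm/_aiter_ops.py`.

```python
import functools
from importlib import metadata


@functools.lru_cache(maxsize=None)
def get_aiter_version(dist_name: str = "aiter") -> str:
    """Hypothetical cached lookup of an installed distribution's version.

    The default distribution name "aiter" is an assumption.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Distribution not installed (or installed under another name).
        return "unknown"


print(get_aiter_version())
```

Caching means repeated callers (weight loading, warnings, bug-report dumps) share one lookup, and reading metadata avoids triggering AITER's import-time side effects just to log a version string.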


Labels

bug (Something isn't working), rocm (Related to AMD ROCm)

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

[Bug]: Kimi-K2.5-MXFP4 produces gibberish output on MI350X (gfx950) with ROCm 7.2

4 participants