[Bugfix] Fix issues in quark emulative logic by wangjiaxin99 · Pull Request #36486 · vllm-project/vllm

wangjiaxin99 · 2026-03-09T11:23:14Z

Issue

When running the model with AITER disabled, the emulation path does not take effect.
This issue was introduced by PR #29008.

Specifically, the self.emulate logic in that PR incorrectly evaluates to False even when AITER is disabled via environment variables, causing the code to use the AITER execution path instead of the expected emulation path.

Fix

The fix modifies the boolean logic for self.emulate to:

not (current_platform.supports_mx() 
     and self.ocp_mx_scheme.startswith("w_mxfp4")
     and self.mxfp4_backend is not None
     and self.use_rocm_aiter_moe)

With this change:

The emulation path is used whenever any of these conditions is not satisfied.
The logic is simpler and more readable.
AITER-related environment variables are respected as intended.
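The condition above can be sketched as a standalone function to make the fallback behavior easy to check. This is a minimal sketch, not the vLLM code: `should_emulate` and its plain parameters are hypothetical stand-ins for `current_platform.supports_mx()`, `self.ocp_mx_scheme`, `self.mxfp4_backend`, and `self.use_rocm_aiter_moe`, and the scheme is assumed to behave like a plain string.

```python
from typing import Optional


def should_emulate(
    supports_mx: bool,
    ocp_mx_scheme: str,
    mxfp4_backend: Optional[object],
    use_rocm_aiter_moe: bool,
) -> bool:
    """Hypothetical stand-in for the self.emulate expression in this PR."""
    native_ok = (
        supports_mx
        and ocp_mx_scheme.startswith("w_mxfp4")
        and mxfp4_backend is not None
        and use_rocm_aiter_moe
    )
    # Emulation is the fallback whenever any native-path condition fails.
    return not native_ok


# AITER disabled: emulation is selected.
print(should_emulate(True, "w_mxfp4_a_mxfp4", object(), False))  # True
# All four conditions hold: the native path is selected.
print(should_emulate(True, "w_mxfp4_a_mxfp4", object(), True))  # False
```

The first call mirrors the configuration reported in the log output of this PR (supports_mx=True, AITER MoE disabled), where emulation is expected.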

Verification

Test Setup

Tested using the Kimi-K2.5-MXFP4 model:

vllm serve /shareddata/amd/Kimi-K2.5-MXFP4 -tp 8 \
  --mm-encoder-tp-mode data \
  --tool-call-parser kimi_k2 \
  --reasoning-parser kimi_k2 \
  --trust-remote-code

Observed Log Output

After the fix, the emulation path is correctly used:

(Worker pid=134790) (Worker_TP6 pid=134790) WARNING 03-09 10:41:12 [quark_moe.py:746] The current mode (supports_mx=True, use_mxfp4_aiter_moe=False, ocp_mx_scheme=OCP_MX_Scheme.w_mxfp4_a_mxfp4) does not support native MXFP4/MXFP6 computation. Simulated weight dequantization and activation QDQ (quantize and dequantize) will be used, with the linear layers computed in high precision.

Functional Test

Using the following curl request:

curl http://localhost:8000/v1/completions \
-H 'Content-Type: application/json' \
-d '{"model": "amd/Kimi-K2.5-MXFP4", "prompt": "The capital of France is", "max_tokens": 256}'

We get the expected text completion result:

{
  "id": "cmpl-a4a3d85eb2aa5781",
  "object": "text_completion",
  "created": 1773052587,
  "model": "/shareddata/amd/Kimi-K2.5-MXFP4",
  "choices": [
    {
      "index": 0,
      "text": " renowned for its dazzling architecture, remarkable culture, and classic landmarks. But did you know the City of Light is also a great place for thrifting? Whether you're looking for vintage clothing, antiques, or a unique gift, Paris has got you covered. Here are the best thrift stores in Paris. ## Vintage and Secondhand Clothing Each one of these stores has its own style and specialty. Whatever your taste, there is a secondhand shop that will call you back time and again. 860 Paris Vintage has two locations in the center of the city and a focus on men's American-style vintage. They're known for their jackets, especially the laid-back ones of the 1970s. Like many of the shops in Paris, no.860 also has an online presence with an Instagram page that updates potential in-person customers about their latest finds. There is always something new waiting for you. Address: 115 Rue Tiquetonne, 75002 Paris, France When you think 70s and 80s fashion in Paris, look no further than Episode. This Dutch chain set up its Paris headquarters in the Latin Quarter. The glitzy designs on display there are just the right touch to bridge the past and present. Address: 29-31 Rue Tiquetonne",
      "finish_reason": "length"
    }
  ]
}
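For scripted checks, the curl request above can also be built with the Python standard library. This is a minimal sketch under the same assumptions as the curl command (server on localhost:8000, same model name and prompt); `build_completion_request` and `fetch_completion_text` are hypothetical helper names, not part of vLLM.

```python
import json
import urllib.request


def build_completion_request(
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build the same POST request as the curl command (hypothetical helper)."""
    payload = {
        "model": "amd/Kimi-K2.5-MXFP4",
        "prompt": "The capital of France is",
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_completion_text(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's text.

    Requires a running server; nothing is sent until this is called.
    """
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server from the test setup running, `fetch_completion_text(build_completion_request())` would return the completion text.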

Summary

Fixed self.emulate logic to respect AITER environment variables.
Simplified boolean condition for clarity.
Verified using Kimi-K2.5-MXFP4 with logs and curl response.

github-actions · 2026-03-09T11:23:25Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

mergify · 2026-03-09T11:27:54Z

Hi @wangjiaxin99, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

gemini-code-assist

Code Review

This pull request fixes a bug in the logic that determines whether to use the emulated path for Quark MoE. The original boolean expression was incorrect, causing the AITER path to be used unintentionally. The new logic correctly ensures that emulation is used as a fallback unless all conditions for the native AITER path are met. The change is correct and improves the clarity of the code.

BowenBao · 2026-03-09T19:12:38Z

dup of #36422

ChuanLi1101 · 2026-03-09T23:13:40Z

@wangjiaxin99 thanks for looking into this — we hit the same issue and have a fix in #36422 (which @BowenBao noted above).

One concern with the logic in this PR: the current expression combines the CK native path and the Triton mxfp4_backend path into a single AND clause:

self.emulate = not (
    supports_mx() and startswith("w_mxfp4")
    and self.mxfp4_backend is not None
    and self.use_rocm_aiter_moe
)

This requires all four conditions to be true for non-emulation, but there are actually two independent paths that should avoid emulation:

CK native path: supports_mx + w_mxfp4 + use_rocm_aiter_moe (does NOT require mxfp4_backend)
Triton mxfp4 backend path: mxfp4_backend is not None (does NOT require supports_mx or aiter_moe)

Concrete scenario that breaks: if hardware doesn't support MX (supports_mx=False) but a Triton mxfp4 backend is available, this PR would incorrectly force emulation instead of using the backend.

The fix should be:

can_use_native_ck = (
    current_platform.supports_mx()
    and self.ocp_mx_scheme is not None
    and self.ocp_mx_scheme.startswith("w_mxfp4")
    and self.use_rocm_aiter_moe
)
can_use_mxfp4_backend = self.mxfp4_backend is not None

self.emulate = not (can_use_native_ck or can_use_mxfp4_backend)

Also worth adding a unit test for these combinations — we have one in #36422 (test_quark_moe_emulate.py) that covers all platform × scheme × env-var cases, including the exact regression from #36337.
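To see where the two expressions discussed above diverge, the flag combinations can be enumerated directly. This is a rough sketch using plain booleans: the function names and the four boolean parameters are hypothetical stand-ins for the real attributes, and it is not the test from #36422.

```python
from itertools import product


def emulate_single_and(supports_mx, is_w_mxfp4, has_backend, use_aiter):
    """Expression from this PR: all four conditions are required."""
    return not (supports_mx and is_w_mxfp4 and has_backend and use_aiter)


def emulate_two_paths(supports_mx, is_w_mxfp4, has_backend, use_aiter):
    """Expression proposed in the review: either path avoids emulation."""
    can_use_native_ck = supports_mx and is_w_mxfp4 and use_aiter
    can_use_mxfp4_backend = has_backend
    return not (can_use_native_ck or can_use_mxfp4_backend)


def disagreements():
    """Return every flag combination where the two expressions differ."""
    return [
        combo
        for combo in product([False, True], repeat=4)
        if emulate_single_and(*combo) != emulate_two_paths(*combo)
    ]


# Columns: supports_mx, is_w_mxfp4, has_backend, use_aiter
for combo in disagreements():
    print(combo)
```

Of the 16 combinations, the two expressions differ on 8. One of them is `(False, True, True, True)`, the case described above where the hardware lacks MX support but an mxfp4 backend is available.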

wangjiaxin99 · 2026-03-10T07:10:28Z

Thanks for the detailed explanation! Since #36422 already covers this with a more robust logic and includes comprehensive unit tests, I’ll close this PR to avoid duplication. Thanks for catching this!

wangjiaxin99 · 2026-03-10T07:12:41Z

dup of #36422

Got it, thanks!

wangjiaxin99 requested a review from tjtanaa as a code owner March 9, 2026 11:23

mergify bot added the bug Something isn't working label Mar 9, 2026

gemini-code-assist bot reviewed Mar 9, 2026

View reviewed changes

root added 2 commits March 9, 2026 11:43

Correcting the emulate logic

7afc862

Signed-off-by: wangjiaxin99 <jiaxwang@amd.com>

Fix pre-commit issues

9f9fe9d

Signed-off-by: <> Signed-off-by: wangjiaxin99 <jiaxwang@amd.com>

wangjiaxin99 force-pushed the jiaxwang/correct_emulate_logic branch from 3111156 to 9f9fe9d Compare March 9, 2026 11:43

Merge branch 'main' into jiaxwang/correct_emulate_logic

2425b9f

wangjiaxin99 closed this Mar 10, 2026

wangjiaxin99 deleted the jiaxwang/correct_emulate_logic branch March 10, 2026 07:11

wangjiaxin99 wants to merge 3 commits into vllm-project:main from wangjiaxin99:jiaxwang/correct_emulate_logic
