[Bugfix] Fix issues in quark emulative logic #36486

Closed
wangjiaxin99 wants to merge 3 commits into vllm-project:main from wangjiaxin99:jiaxwang/correct_emulate_logic

@wangjiaxin99 wangjiaxin99 commented Mar 9, 2026

Issue

When running the model with AITER disabled, the emulation path does not take effect.
This issue was introduced by PR: #29008.

Specifically, the self.emulate logic in that PR incorrectly evaluates to False even when AITER is disabled via environment variables, causing the code to use the AITER execution path instead of the expected emulation path.


Fix

The fix modifies the boolean logic for self.emulate to:

self.emulate = not (
    current_platform.supports_mx()
    and self.ocp_mx_scheme.startswith("w_mxfp4")
    and self.mxfp4_backend is not None
    and self.use_rocm_aiter_moe
)

With this change:

  • The emulation path is used whenever any of these conditions is not satisfied.
  • The logic is simpler and more readable.
  • AITER-related environment variables are respected as intended.
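
The condition can be checked in isolation. The sketch below is a minimal model of the expression above, not the code in quark_moe.py: `should_emulate` is a hypothetical helper, its four arguments stand in for `current_platform.supports_mx()`, `self.ocp_mx_scheme`, `self.mxfp4_backend`, and `self.use_rocm_aiter_moe`, and the scheme is assumed to behave like the string "w_mxfp4_a_mxfp4".

```python
from typing import Optional


def should_emulate(
    supports_mx: bool,
    ocp_mx_scheme: str,
    mxfp4_backend: Optional[object],
    use_rocm_aiter_moe: bool,
) -> bool:
    """Hypothetical stand-in for the proposed self.emulate expression."""
    native_ok = (
        supports_mx
        and ocp_mx_scheme.startswith("w_mxfp4")
        and mxfp4_backend is not None
        and use_rocm_aiter_moe
    )
    # Emulate whenever any one of the four conditions fails.
    return not native_ok


# object() is only a placeholder for "some backend is configured".
backend = object()

# AITER disabled: emulation is selected even on MX-capable hardware.
aiter_off = should_emulate(True, "w_mxfp4_a_mxfp4", backend, False)

# Every condition satisfied: the native path is selected.
all_on = should_emulate(True, "w_mxfp4_a_mxfp4", backend, True)

print(aiter_off, all_on)
```

Modelling the flags as plain arguments keeps the check independent of the platform and environment variables, which makes each combination easy to exercise.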

Verification

Test Setup

Tested using the Kimi-K2.5-MXFP4 model:

vllm serve /shareddata/amd/Kimi-K2.5-MXFP4 -tp 8 \
  --mm-encoder-tp-mode data \
  --tool-call-parser kimi_k2 \
  --reasoning-parser kimi_k2 \
  --trust-remote-code

Observed Log Output

After the fix, the emulation path is correctly used:

(Worker pid=134790) (Worker_TP6 pid=134790) WARNING 03-09 10:41:12 [quark_moe.py:746] The current mode (supports_mx=True, use_mxfp4_aiter_moe=False, ocp_mx_scheme=OCP_MX_Scheme.w_mxfp4_a_mxfp4) does not support native MXFP4/MXFP6 computation. Simulated weight dequantization and activation QDQ (quantize and dequantize) will be used, with the linear layers computed in high precision.

Functional Test

Using the following curl request:

curl http://localhost:8000/v1/completions \
-H 'Content-Type: application/json' \
-d '{"model": "amd/Kimi-K2.5-MXFP4", "prompt": "The capital of France is", "max_tokens": 256}'

We get the expected text completion result:

{
  "id": "cmpl-a4a3d85eb2aa5781",
  "object": "text_completion",
  "created": 1773052587,
  "model": "/shareddata/amd/Kimi-K2.5-MXFP4",
  "choices": [
    {
      "index": 0,
      "text": " renowned for its dazzling architecture, remarkable culture, and classic landmarks. But did you know the City of Light is also a great place for thrifting? Whether you're looking for vintage clothing, antiques, or a unique gift, Paris has got you covered. Here are the best thrift stores in Paris. ## Vintage and Secondhand Clothing Each one of these stores has its own style and specialty. Whatever your taste, there is a secondhand shop that will call you back time and again. 860 Paris Vintage has two locations in the center of the city and a focus on men's American-style vintage. They're known for their jackets, especially the laid-back ones of the 1970s. Like many of the shops in Paris, no.860 also has an online presence with an Instagram page that updates potential in-person customers about their latest finds. There is always something new waiting for you. Address: 115 Rue Tiquetonne, 75002 Paris, France When you think 70s and 80s fashion in Paris, look no further than Episode. This Dutch chain set up its Paris headquarters in the Latin Quarter. The glitzy designs on display there are just the right touch to bridge the past and present. Address: 29-31 Rue Tiquetonne",
      "finish_reason": "length"
    }
  ]
}
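
For scripted checks, the curl request above can also be built with the Python standard library. This is a minimal sketch, assuming the same server address and payload as the curl command; `build_completion_request` and `first_choice_text` are hypothetical helper names, and the network call is left commented out because it needs a running server.

```python
import json
import urllib.request


def build_completion_request(
    base_url: str, model: str, prompt: str, max_tokens: int
) -> urllib.request.Request:
    """Build a POST request equivalent to the curl command above."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_choice_text(response_body: str) -> str:
    """Extract the generated text of the first choice from a JSON response."""
    return json.loads(response_body)["choices"][0]["text"]


request = build_completion_request(
    "http://localhost:8000",
    "amd/Kimi-K2.5-MXFP4",
    "The capital of France is",
    256,
)

# Sending the request requires the server started in the test setup:
# with urllib.request.urlopen(request) as resp:
#     print(first_choice_text(resp.read().decode("utf-8")))
```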

Summary

  • Fixed self.emulate logic to respect AITER environment variables.
  • Simplified boolean condition for clarity.
  • Verified using Kimi-K2.5-MXFP4 with logs and curl response.

@wangjiaxin99 wangjiaxin99 requested a review from tjtanaa as a code owner March 9, 2026 11:23

github-actions bot commented Mar 9, 2026

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify mergify bot added the bug Something isn't working label Mar 9, 2026

mergify bot commented Mar 9, 2026

Hi @wangjiaxin99, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request fixes a bug in the logic that determines whether to use the emulated path for Quark MoE. The original boolean expression was incorrect, causing the AITER path to be used unintentionally. The new logic correctly ensures that emulation is used as a fallback unless all conditions for the native AITER path are met. The change is correct and improves the clarity of the code.

root added 2 commits March 9, 2026 11:43
Signed-off-by: wangjiaxin99 <jiaxwang@amd.com>
Signed-off-by:  <>
Signed-off-by: wangjiaxin99 <jiaxwang@amd.com>
@wangjiaxin99 wangjiaxin99 force-pushed the jiaxwang/correct_emulate_logic branch from 3111156 to 9f9fe9d Compare March 9, 2026 11:43
Contributor

BowenBao commented Mar 9, 2026

dup of #36422

@ChuanLi1101
Contributor

@wangjiaxin99 thanks for looking into this — we hit the same issue and have a fix in #36422 (which @BowenBao noted above).

One concern with the logic in this PR: the current expression combines the CK native path and the Triton mxfp4_backend path into a single AND clause:

self.emulate = not (
    current_platform.supports_mx()
    and self.ocp_mx_scheme.startswith("w_mxfp4")
    and self.mxfp4_backend is not None
    and self.use_rocm_aiter_moe
)

This requires all four conditions to be true for non-emulation, but there are actually two independent paths that should avoid emulation:

  1. CK native path: supports_mx + w_mxfp4 + use_rocm_aiter_moe (does NOT require mxfp4_backend)
  2. Triton mxfp4 backend path: mxfp4_backend is not None (does NOT require supports_mx or aiter_moe)

Concrete scenario that breaks: if hardware doesn't support MX (supports_mx=False) but a Triton mxfp4 backend is available, this PR would incorrectly force emulation instead of using the backend.

The fix should be:

can_use_native_ck = (
    current_platform.supports_mx()
    and self.ocp_mx_scheme is not None
    and self.ocp_mx_scheme.startswith("w_mxfp4")
    and self.use_rocm_aiter_moe
)
can_use_mxfp4_backend = self.mxfp4_backend is not None

self.emulate = not (can_use_native_ck or can_use_mxfp4_backend)
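
To make the difference concrete, the sketch below models both expressions as pure functions and compares them over every flag combination. It is an illustration only, not the unit test from #36422: the function names, the placeholder backend object, and the "other_scheme" string are hypothetical, and only the boolean structure is taken from the two snippets in this comment.

```python
from itertools import product
from typing import Optional


def emulate_single_and(
    supports_mx: bool, scheme: str, backend: Optional[object], use_aiter: bool
) -> bool:
    """Expression from this PR: all four conditions must hold to skip emulation."""
    return not (
        supports_mx
        and scheme.startswith("w_mxfp4")
        and backend is not None
        and use_aiter
    )


def emulate_two_paths(
    supports_mx: bool, scheme: Optional[str], backend: Optional[object], use_aiter: bool
) -> bool:
    """Suggested expression: either independent path is enough to skip emulation."""
    can_use_native_ck = (
        supports_mx
        and scheme is not None
        and scheme.startswith("w_mxfp4")
        and use_aiter
    )
    can_use_mxfp4_backend = backend is not None
    return not (can_use_native_ck or can_use_mxfp4_backend)


backend = object()  # placeholder meaning "an mxfp4 backend is available"

# Scenario from the comment: no MX support, but a backend is available.
no_mx_single = emulate_single_and(False, "w_mxfp4_a_mxfp4", backend, False)
no_mx_two = emulate_two_paths(False, "w_mxfp4_a_mxfp4", backend, False)
print("single AND emulates:", no_mx_single, "| two paths emulates:", no_mx_two)

# Enumerate every combination and collect the ones where the two disagree.
disagreements = [
    combo
    for combo in product(
        (True, False),                         # supports_mx
        ("w_mxfp4_a_mxfp4", "other_scheme"),   # scheme ("other_scheme" is made up)
        (None, backend),                       # mxfp4_backend
        (True, False),                         # use_rocm_aiter_moe
    )
    if emulate_single_and(*combo) != emulate_two_paths(*combo)
]
print(len(disagreements), "combinations differ")
```

The two expressions agree only when the CK conditions and the backend availability are both true or both false, so every mismatch in the list is a case where exactly one of the two paths is usable.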

Also worth adding a unit test for these combinations — we have one in #36422 (test_quark_moe_emulate.py) that covers all platform × scheme × env-var cases, including the exact regression from #36337.

@wangjiaxin99
Author

Thanks for the detailed explanation! Since #36422 already covers this with more robust logic and includes comprehensive unit tests, I’ll close this PR to avoid duplication. Thanks for catching this!

@wangjiaxin99 wangjiaxin99 deleted the jiaxwang/correct_emulate_logic branch March 10, 2026 07:11
@wangjiaxin99
Author

Got it, thanks!
