[Bugfix] Fix issues in quark emulative logic#36486
[Bugfix] Fix issues in quark emulative logic#36486wangjiaxin99 wants to merge 3 commits intovllm-project:mainfrom
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run You ask your reviewers to trigger select CI tests on top of Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀 |
|
Hi @wangjiaxin99, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
There was a problem hiding this comment.
Code Review
This pull request fixes a bug in the logic that determines whether to use the emulated path for Quark MoE. The original boolean expression was incorrect, causing the AITER path to be used unintentionally. The new logic correctly ensures that emulation is used as a fallback unless all conditions for the native AITER path are met. The change is correct and improves the clarity of the code.
Signed-off-by: wangjiaxin99 <jiaxwang@amd.com>
Signed-off-by: <> Signed-off-by: wangjiaxin99 <jiaxwang@amd.com>
3111156 to
9f9fe9d
Compare
|
dup of #36422 |
|
@wangjiaxin99 thanks for looking into this — we hit the same issue and have a fix in #36422 (which @BowenBao noted above). One concern with the logic in this PR: the current expression combines the CK native path and the Triton self.emulate = not (
supports_mx() and startswith("w_mxfp4")
and self.mxfp4_backend is not None
and self.use_rocm_aiter_moe
)This requires all four conditions to be true for non-emulation, but there are actually two independent paths that should avoid emulation:
Concrete scenario that breaks: if hardware doesn't support MX ( The fix should be: can_use_native_ck = (
current_platform.supports_mx()
and self.ocp_mx_scheme is not None
and self.ocp_mx_scheme.startswith("w_mxfp4")
and self.use_rocm_aiter_moe
)
can_use_mxfp4_backend = self.mxfp4_backend is not None
self.emulate = not (can_use_native_ck or can_use_mxfp4_backend)Also worth adding a unit test for these combinations — we have one in #36422 ( |
Thanks for the detailed explanation! Since #36422 already covers this with a more robust logic and includes comprehensive unit tests, I’ll close this PR to avoid duplication. Thanks for catching this! |
Got it, thanks! |
Issue
When running the model with AITER disabled, the emulation path does not take effect.
This issue was introduced by PR: #29008.
Specifically, the
self.emulatelogic in that PR incorrectly evaluates toFalseeven when AITER is disabled via environment variables, causing the code to use the AITER execution path instead of the expected emulation path.Fix
The fix modifies the boolean logic for
self.emulateto:With this change:
Verification
Test Setup
Tested using the Kimi-K2.5-MXFP4 model:
Observed Log Output
After the fix, the emulation path is correctly used:
Functional Test
Using the following curl request:
We get the expected text completion result:
{ "id": "cmpl-a4a3d85eb2aa5781", "object": "text_completion", "created": 1773052587, "model": "/shareddata/amd/Kimi-K2.5-MXFP4", "choices": [ { "index": 0, "text": " renowned for its dazzling architecture, remarkable culture, and classic landmarks. But did you know the City of Light is also a great place for thrifting? Whether you're looking for vintage clothing, antiques, or a unique gift, Paris has got you covered. Here are the best thrift stores in Paris. ## Vintage and Secondhand Clothing Each one of these stores has its own style and specialty. Whatever your taste, there is a secondhand shop that will call you back time and again. 860 Paris Vintage has two locations in the center of the city and a focus on men's American-style vintage. They're known for their jackets, especially the laid-back ones of the 1970s. Like many of the shops in Paris, no.860 also has an online presence with an Instagram page that updates potential in-person customers about their latest finds. There is always something new waiting for you. Address: 115 Rue Tiquetonne, 75002 Paris, France When you think 70s and 80s fashion in Paris, look no further than Episode. This Dutch chain set up its Paris headquarters in the Latin Quarter. The glitzy designs on display there are just the right touch to bridge the past and present. Address: 29-31 Rue Tiquetonne", "finish_reason": "length" } ] }Summary