
Conversation

@phdddd phdddd commented Dec 24, 2025

What does this PR do?

This PR adds Ascend fused operators for the Qwen3VL model.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include misc, ci, config, docs, data, dist, omni, logging, model, optim, ckpt, release, task, perf, ops, parallel
    • If this PR involves multiple modules, separate them with , like [ci, data, model]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][parallel, model] feat: dynamic batching

Test

For changes that cannot be tested by CI (e.g., algorithm implementations, new model support), validate by experiment(s) and show results such as training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes, if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this
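
As a purely illustrative sketch (not this PR's actual code), assuming torch_npu exposes the npu_rotary_mul fused rotary-embedding operator, a patched RoPE path might look like the following, falling back to the eager implementation off-NPU:

import torch

def rotate_half(x):
    # Standard RoPE helper: swap halves and negate the second half
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin):
    try:
        import torch_npu  # present only in Ascend NPU builds
        # Fused kernel computing x * cos + rotate_half(x) * sin in one call
        return (torch_npu.npu_rotary_mul(q, cos, sin),
                torch_npu.npu_rotary_mul(k, cos, sin))
    except ImportError:
        # Eager fallback on non-Ascend devices
        return (q * cos + rotate_half(q) * sin,
                k * cos + rotate_half(k) * sin)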

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review; otherwise, the reviewer may deprioritize this PR.

@github-actions github-actions bot added the ascend everything about Ascend support label Dec 24, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces fused operators for Ascend NPUs, focusing on AdamW and RoPE for Qwen3VL models. While the RoPE implementation appears solid, the fused AdamW implementation has two significant issues. First, it contains hardcoded CUDA device calls, which will fail on NPU-targeted code. Second, the fused AdamW operator loops over parameters in Python, which negates the performance benefit of fusion. I have provided specific suggestions to address these high-priority issues.
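
For illustration only, here is a minimal sketch of the two suggested fixes (a hypothetical fused_adamw_step, not the PR's code), assuming plain PyTorch: take the device implicitly from the parameter tensors instead of calling torch.cuda, and replace the per-parameter Python loop with multi-tensor torch._foreach_* ops so the whole update runs in a handful of batched kernels:

import math
import torch

def fused_adamw_step(params, grads, exp_avgs, exp_avg_sqs, step,
                     lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
                     weight_decay=0.01):
    # Device-agnostic: all math happens on whatever device the tensors
    # already live on (NPU, CUDA, or CPU) -- no torch.cuda calls needed.
    bias_correction1 = 1 - beta1 ** step
    bias_correction2 = 1 - beta2 ** step

    # Decoupled weight decay, applied to every parameter at once
    torch._foreach_mul_(params, 1 - lr * weight_decay)

    # Batched first/second-moment updates instead of a Python loop
    torch._foreach_mul_(exp_avgs, beta1)
    torch._foreach_add_(exp_avgs, grads, alpha=1 - beta1)
    torch._foreach_mul_(exp_avg_sqs, beta2)
    torch._foreach_addcmul_(exp_avg_sqs, grads, grads, value=1 - beta2)

    denom = torch._foreach_sqrt(exp_avg_sqs)
    torch._foreach_div_(denom, math.sqrt(bias_correction2))
    torch._foreach_add_(denom, eps)

    # params -= lr / bias_correction1 * exp_avg / denom
    torch._foreach_addcdiv_(params, exp_avgs, denom,
                            value=-lr / bias_correction1)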

@zhihaofang1017

Could you provide a visual comparison of accuracy and performance?
How much performance improvement does each fused operator provide?

@FoolPlayer FoolPlayer changed the title [Modify]Add ascend fused operators for Qwen3VL [model] feat: Add ascend fused operators for Qwen3VL Dec 30, 2025

phdddd commented Jan 4, 2026

code format

@Crystal-jiang Crystal-jiang merged commit e0db332 into ByteDance-Seed:main Jan 8, 2026
14 of 15 checks passed