Conversation
Change the assignment of unquantized MoE weights when using AITER on ROCm, making weight reloading safe. This fixes the random-output case after wake-up and weight reloading in reinforcement learning.
Signed-off-by: Zhaodong Bing <45478848+aaab8b@users.noreply.github.com>
Purpose
Change the assignment of unquantized MoE weights when using AITER on ROCm so that weight reloading is safe. This fixes the random output produced after wake-up and weight reloading in reinforcement learning.
I've been doing adaptation work to use vLLM with ROLL (https://github.com/alibaba/ROLL/) on ROCm platforms. The vLLM instance generated random characters after it slept, woke up, and reloaded model weights from the training actors. I found that AITER shuffles the original weights into a layout better suited for computation, but assigns the shuffled result to a new tensor instead of updating the original one in place; after CUDA graphs are captured, reloading weights on wake-up therefore writes into a tensor the graphs no longer reference. This pull request fixes the issue for general vLLM usage with AITER on ROCm in reinforcement learning.
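The aliasing problem described above can be sketched in plain Python (all names here are hypothetical, and lists stand in for GPU tensors): a captured CUDA graph keeps a reference to the exact buffer it was captured with, so rebinding a module attribute to a freshly shuffled tensor leaves the graph replaying stale data, while an in-place copy into the original buffer keeps the graph and the reloaded weights in sync.

```python
def shuffle_for_aiter(weight):
    """Stand-in for AITER's weight shuffling: returns a NEW buffer."""
    return list(reversed(weight))

class CapturedGraph:
    """A captured CUDA graph always reads the buffer it was captured against."""
    def __init__(self, weight_ref):
        self.weight_ref = weight_ref  # replay reads this exact object
    def replay(self):
        return sum(self.weight_ref)

# Buggy pattern: rebind the attribute to the shuffled (new) tensor.
class BuggyLayer:
    def __init__(self, weight):
        self.weight = shuffle_for_aiter(weight)          # new buffer
    def reload(self, new_weight):
        self.weight = shuffle_for_aiter(new_weight)      # rebinds; graph keeps old buffer

# Fixed pattern: copy the shuffled data into the ORIGINAL buffer in place.
class FixedLayer:
    def __init__(self, weight):
        self.weight = weight
        self.weight[:] = shuffle_for_aiter(list(weight))  # in-place, same buffer
    def reload(self, new_weight):
        self.weight[:] = shuffle_for_aiter(new_weight)    # graph sees fresh data

buggy = BuggyLayer([1, 2, 3])
graph_b = CapturedGraph(buggy.weight)   # capture against the current buffer
buggy.reload([10, 20, 30])
stale = graph_b.replay()                # still sums the OLD buffer: 6

fixed = FixedLayer([1, 2, 3])
graph_f = CapturedGraph(fixed.weight)
fixed.reload([10, 20, 30])
fresh = graph_f.replay()                # sums the updated buffer: 60
```

In real code the in-place update corresponds to something like `param.data.copy_(shuffled)` rather than `param.data = shuffled`, so the storage that CUDA graphs captured stays valid across reloads.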
Test Plan
Enable VLLM_ROCM_USE_AITER=1 and VLLM_ROCM_USE_AITER_MOE=1, then test Qwen3-30B-A3B (or another MoE model) after wake-up and weight reloading in reinforcement learning.
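For reference, a minimal sketch of the environment setup for this test plan (the two environment variables are the ones named above; how the RL sleep/wake-up cycle is driven depends on the training framework and is not shown):

```shell
# Route MoE execution through AITER on ROCm before launching vLLM.
export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_USE_AITER_MOE=1
```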
Test Result
With the previous assignment, the output is random characters; with this assignment, the output is normal.
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.