Release 2.6 fix: cuda graph dropout accuracy #463
Merged
Micky774 merged 4 commits into release_v2.6_rocm on Mar 2, 2026
Conversation
…version (#354)

* [ROCm] manually pick up fwd native padding support from Meekail's PR
* Initial update
* Updated stride
* Corrected typing in allocation portions
* Applied Ye's patch
* [ROCm] manually pick Meekail's PR to support native padding for bwd
* [ROCm] jax use runtime segment
* [ROCm] get runtime max_seqlen as well
* [ROCm] support v2 bwd native padding
* Updated conversion to include bwd pass
* Added BWD BSHD-->THD conversion and minor logic refactor
* Corrected softmax lse bug
* Updated logic flow and re-calculation
* [ROCm] manually pick Meekail's PR to support native padding for bwd; [ROCm] support v2 bwd native padding
* Added env var guard
* Updated ptr variables and streamlined dispatch
* Added env guard
* Corrected bshd_to_thd conversion arguments
* Corrected logical flow
* Guarded memset and corrected allocation
* Remove V3 API check and guard memsets
* PR comments
* Updated documentation
* PR review reconciliation
  - Updated debug message for BSHD-->THD conversion
  - Added env variable to gate FWD output memset for padding
  - Removed guards on memsets for d{Q,K,V} matrices
* Added explicit test
* Formatting for bwd debug
* Resolved error when using mixed formats e.g. sbhd_2bshd
* Updated guard on flash-attention forced support
* Added check for SBHD_2BSHD
* Added guard on dk/dv memset
* Removed env var gating for dk/dv zero padding, formatting
* Added inline comment to test
* Corrected Softmax LSE buffer allocation
* Correct Softmax LSE buffer memory allocation
* Adjusted fwd pass softmax lse allocation
* Adjusted bwd pass softmax conversion allocation
* Minor reversions
* [ROCm] fix the aiter fwd v3 cu_seqlen/cu_seqlen_padded api issue
* Update README.rst to fix formatting
* [ROCm] update aiter commit with swa fix

Co-authored-by: Ye Wang <yewang12@amd.com>
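Several of the commits above deal with converting padded BSHD tensors ([batch, seq, heads, dim]) into the packed THD layout ([total_tokens, heads, dim]) driven by cumulative sequence lengths. A minimal NumPy sketch of that packing, assuming the standard cu_seqlens convention (the function name is illustrative, not the actual ROCm kernel):

```python
import numpy as np

def bshd_to_thd(x, cu_seqlens):
    """Pack a padded BSHD tensor [batch, seq, heads, dim] into a THD
    tensor [total_tokens, heads, dim], where cu_seqlens[b+1] - cu_seqlens[b]
    is the valid (unpadded) length of sequence b."""
    batch, max_seq, heads, dim = x.shape
    total_tokens = int(cu_seqlens[-1])
    out = np.empty((total_tokens, heads, dim), dtype=x.dtype)
    for b in range(batch):
        start, end = int(cu_seqlens[b]), int(cu_seqlens[b + 1])
        # Copy only the valid prefix of each sequence; padding is dropped.
        out[start:end] = x[b, : end - start]
    return out

# Example: batch of 2, max_seq 4, with valid lengths 3 and 2.
x = np.arange(2 * 4 * 1 * 2).reshape(2, 4, 1, 2)
cu_seqlens = np.array([0, 3, 5])
packed = bshd_to_thd(x, cu_seqlens)
```

The backward-pass direction (THD back to padded BSHD) is the inverse scatter, which is where the memset guards on the padded regions discussed in the commits come in.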
Collaborator
What is the justification for pushing a new feature to 2.6?
Contributor
Author
https://ontrack-internal.amd.com/browse/SWDEV-580545

Edit: to clarify, it's not about introducing the native padding, but rather introducing a fix that got incorporated in that same commit (corresponding to an AITER update). The feature comes "for free" with it, in a sense.
Collaborator
Since it is an AITER update, please run level 3 tests.
Contributor
Author
Sure, here's the link to the job for ease of access: https://github.com/ROCm/TransformerEngine/actions/runs/22460578919
Collaborator
Please put the ticket number in the PR description.
wangye805
approved these changes
Feb 27, 2026
Collaborator
wangye805
left a comment
LGTM
BTW, please fill in the questions in the PR description
Description
Fixes the noted QA regression, a test_numerics::test_gpt_cuda_graph failure when using dropout, by cherry-picking an AITER sub-commit update. As a side effect of the cherry-pick, this also enables native padding kernels for the AITER backend.

Fixes: https://github.com/ROCm/frameworks-internal/issues/15639
Tracking: https://ontrack-internal.amd.com/browse/SWDEV-580545
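The failure class here is dropout producing a different mask when a captured kernel is replayed than when it runs eagerly, which shows up as an accuracy mismatch. A minimal NumPy sketch of the underlying principle, hypothetical and not the actual test: a replay is only bit-identical if it starts from the same RNG state, analogous to a CUDA graph replaying with correctly restored (rather than stale) RNG state.

```python
import numpy as np

def dropout(x, p, rng):
    """Inverted dropout: zero elements with probability p, rescale the rest."""
    mask = rng.random(x.shape) >= p
    return np.where(mask, x / (1.0 - p), 0.0)

x = np.ones((4, 4))
# Reference eager run with a freshly seeded generator.
ref = dropout(x, 0.5, np.random.default_rng(0))
# A "replay" that restores the same RNG state reproduces the mask exactly.
replay = dropout(x, 0.5, np.random.default_rng(0))
assert np.array_equal(ref, replay)
```

A replay that instead continues from an advanced generator would draw a different mask, which is the kind of divergence the regression test is checking for.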
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: