
Release 2.6 fix: cuda graph dropout accuracy #463

Merged
Micky774 merged 4 commits into release_v2.6_rocm from zain/cuda-graph-dropout-hotfix
Mar 2, 2026
Conversation

@Micky774
Contributor

@Micky774 Micky774 commented Feb 25, 2026

Description

Fixes the QA regression noted in the test_numerics::test_gpt_cuda_graph failure when using dropout, by cherry-picking an AITER sub-commit update. As a side effect of the cherry-pick, this also enables native padding kernels for the AITER backend.

Fixes: https://github.com/ROCm/frameworks-internal/issues/15639, https://ontrack-internal.amd.com/browse/SWDEV-580545

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Updates AITER subcommit
  • Includes infrastructure for native-padding kernel support

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Micky774 and others added 2 commits February 25, 2026 14:14
…version (#354)

* [ROCm] manually pick up fwd native padding support from Meekail's PR

* Initial update

* Updated stride

* Corrected typing in allocation portions

* Applied Ye's patch

* [ROCm] manually pick Meekail's PR to support native padding for bwd

* [ROCm] jax use runtime segment

* [ROCm] get runtime max_seqlen as well

* [ROCm] support v2 bwd native padding

* Updated conversion to include bwd pass

* Added BWD BSHD-->THD conversion and minor logic refactor

* Corrected softmax lse bug

* Updated logic flow and re-calculation

* [ROCm] manually pick Meekail's PR to support native padding for bwd

[ROCm] support v2 bwd native padding

* Added env var guard

* Updated ptr variables and streamlined dispatch

* Added env guard

* Corrected bshd_to_thd conversion arguments

* Corrected logical flow

* Guarded memset and corrected allocation

* Remove V3 API check and guard memsets

* PR comments

* Updated documentation

* PR review reconciliation

- Updated debug message for BSHD-->THD conversion
- Added env variable to gate FWD output memset for padding
- Removed guards on memsets for d{Q,K,V} matrices

* Added explicit test

* Formatting for bwd debug

* Resolved error when using mixed formats e.g. sbhd_2bshd

* Updated guard on flash-attention forced support

* Added check for SBHD_2BSHD

* Added guard on dk/dv memset

* Removed env var gating for dk/dv zero padding, formatting

* Added inline comment to test

* Corrected Softmax LSE buffer allocation

* Correct Softmax LSE buffer memory allocation

* Adjusted fwd pass softmax lse allocation

* Adjusted bwd pass softmax conversion allocation

* Minor reversions

* [ROCm] fix the aiter fwd v3 cu_seqlen/cu_seqlen_padded api issue

* Update README.rst to fix formatting

* [ROCm] update aiter commit with swa fix

---------

Co-authored-by: Ye Wang <yewang12@amd.com>
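The commit log above repeatedly mentions a BSHD→THD conversion for native padding support. As a rough illustration only (not the actual TransformerEngine/AITER kernel, and `bshd_to_thd` here is a hypothetical helper name), packing a padded batch-major tensor of shape [B, S, H, D] into a token-major THD tensor [T, H, D] using cumulative sequence lengths might look like:

```python
import numpy as np

def bshd_to_thd(x, cu_seqlens):
    """Hypothetical sketch: pack a padded [B, S, H, D] tensor into a
    contiguous [T, H, D] tensor, where T = cu_seqlens[-1] is the total
    number of real (unpadded) tokens across the batch."""
    b, s, h, d = x.shape
    total = int(cu_seqlens[-1])
    out = np.empty((total, h, d), dtype=x.dtype)
    for i in range(b):
        start, end = int(cu_seqlens[i]), int(cu_seqlens[i + 1])
        # Copy only the real tokens of sequence i; padding is dropped.
        out[start:end] = x[i, : end - start]
    return out

# Two sequences of lengths 3 and 1, padded to S = 4.
x = np.arange(2 * 4 * 1 * 2, dtype=np.float32).reshape(2, 4, 1, 2)
cu_seqlens = np.array([0, 3, 4])
thd = bshd_to_thd(x, cu_seqlens)
print(thd.shape)  # (4, 1, 2)
```

The real kernels operate on device memory and also handle the backward (THD→BSHD) direction, but the cu_seqlens-indexed packing shown here is the underlying idea.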
@Micky774 Micky774 changed the title from "Cuda Graph Dropout Accuracy Hotfix" to "Release 2.6 hotfix: cuda graph dropout accuracy" Feb 25, 2026
@Micky774 Micky774 changed the title from "Release 2.6 hotfix: cuda graph dropout accuracy" to "Release 2.6 fix: cuda graph dropout accuracy" Feb 25, 2026
@ipanfilo
Collaborator

What is the justification for pushing a new feature to 2.6?

@Micky774
Contributor Author

Micky774 commented Feb 26, 2026

What is the justification for pushing a new feature to 2.6?

https://ontrack-internal.amd.com/browse/SWDEV-580545

Edit: to clarify, the point is not introducing native padding, but rather introducing a fix that was incorporated in that same commit (corresponding to an AITER update). The feature comes "for free" with it, in a sense.

@ipanfilo
Collaborator

Since it is an AITER update, run level 3 tests.

@Micky774
Contributor Author

Since it is an AITER update, run level 3 tests.

Sure, here's the link to the job for ease of access: https://github.com/ROCm/TransformerEngine/actions/runs/22460578919

@ipanfilo
Collaborator

Please put the ticket number in the PR description.


@wangye805 wangye805 left a comment


LGTM

BTW, please answer the questions in the PR description.

@Micky774 Micky774 merged commit 87a8fc8 into release_v2.6_rocm Mar 2, 2026
4 checks passed
@Micky774 Micky774 deleted the zain/cuda-graph-dropout-hotfix branch March 2, 2026 15:20