
Release 2.6 fix: cuda graph dropout accuracy #463

Merged
Micky774 merged 4 commits into release_v2.6_rocm from zain/cuda-graph-dropout-hotfix
Mar 2, 2026
Conversation

@Micky774
Contributor

@Micky774 Micky774 commented Feb 25, 2026

Description

Fixes the QA regression noted in the test_numerics::test_gpt_cuda_graph failure when using dropout, by cherry-picking an AITER sub-commit update. As a side effect of the cherry-pick, this also enables native padding kernels for the AITER backend.

Fixes: https://github.com/ROCm/frameworks-internal/issues/15639, https://ontrack-internal.amd.com/browse/SWDEV-580545

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Updates AITER subcommit
  • Includes infrastructure for native-padding kernel support

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Micky774 and others added 2 commits February 25, 2026 14:14
…version (#354)

* [ROCm] manually pick up fwd native padding support from Meekail's PR

* Initial update

* Updated stride

* Corrected typing in allocation portions

* Applied Ye's patch

* [ROCm] manually pick Meekail's PR to support native padding for bwd

* [ROCm] jax use runtime segment

* [ROCm] get runtime max_seqlen as well

* [ROCm] support v2 bwd native padding

* Updated conversion to include bwd pass

* Added BWD BSHD-->THD conversion and minor logic refactor

* Corrected softmax lse bug

* Updated logic flow and re-calculation

* [ROCm] manually pick Meekail's PR to support native padding for bwd

[ROCm] support v2 bwd native padding

* Added env var guard

* Updated ptr variables and streamlined dispatch

* Added env guard

* Corrected bshd_to_thd conversion arguments

* Corrected logical flow

* Guarded memset and corrected allocation

* Remove V3 API check and guard memsets

* PR comments

* Updated documentation

* PR review reconciliation

- Updated debug message for BSHD-->THD conversion
- Added env variable to gate FWD output memset for padding
- Removed guards on memsets for d{Q,K,V} matrices

* Added explicit test

* Formatting for bwd debug

* Resolved error when using mixed formats e.g. sbhd_2bshd

* Updated guard on flash-attention forced support

* Added check for SBHD_2BSHD

* Added guard on dk/dv memset

* Removed env var gating for dk/dv zero padding, formatting

* Added inline comment to test

* Corrected Softmax LSE buffer allocation

* Correct Softmax LSE buffer memory allocation

* Adjusted fwd pass softmax lse allocation

* Adjusted bwd pass softmax conversion allocation

* Minor reversions

* [ROCm] fix the aiter fwd v3 cu_seqlen/cu_seqlen_padded api issue

* Update README.rst to fix formatting

* [ROCm] update aiter commit with swa fix

---------

Co-authored-by: Ye Wang <yewang12@amd.com>
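The commit log above repeatedly mentions a BSHD→THD conversion for native padding support. As a rough illustration only (not the actual TransformerEngine/AITER kernel, and `bshd_to_thd` here is a hypothetical helper name), packing a padded batch-major tensor of shape [B, S, H, D] into a token-major THD tensor [T, H, D] using cumulative sequence lengths might look like:

```python
import numpy as np

def bshd_to_thd(x, cu_seqlens):
    """Hypothetical sketch: pack a padded [B, S, H, D] tensor into a
    contiguous [T, H, D] tensor, where T = cu_seqlens[-1] is the total
    number of real (unpadded) tokens across the batch."""
    b, s, h, d = x.shape
    total = int(cu_seqlens[-1])
    out = np.empty((total, h, d), dtype=x.dtype)
    for i in range(b):
        start, end = int(cu_seqlens[i]), int(cu_seqlens[i + 1])
        # Copy only the real tokens of sequence i; padding is dropped.
        out[start:end] = x[i, : end - start]
    return out

# Two sequences of lengths 3 and 1, padded to S = 4.
x = np.arange(2 * 4 * 1 * 2, dtype=np.float32).reshape(2, 4, 1, 2)
cu_seqlens = np.array([0, 3, 4])
thd = bshd_to_thd(x, cu_seqlens)
print(thd.shape)  # (4, 1, 2)
```

The real kernels operate on device memory and also handle the backward (THD→BSHD) direction, but the cu_seqlens-indexed packing shown here is the underlying idea.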
@Micky774 Micky774 changed the title from "Cuda Graph Dropout Accuracy Hotfix" to "Release 2.6 hotfix: cuda graph dropout accuracy" Feb 25, 2026
@Micky774 Micky774 changed the title from "Release 2.6 hotfix: cuda graph dropout accuracy" to "Release 2.6 fix: cuda graph dropout accuracy" Feb 25, 2026
@ipanfilo
Collaborator

What is the justification for pushing a new feature to 2.6?

@Micky774
Contributor Author

Micky774 commented Feb 26, 2026

What is the justification for pushing a new feature to 2.6?

https://ontrack-internal.amd.com/browse/SWDEV-580545

Edit: to clarify, the point is not introducing native padding, but rather introducing a fix that was incorporated in that same commit (corresponding to an AITER update). The feature comes "for free" with it, in a sense.

@ipanfilo
Collaborator

Since it is an AITER update, run level 3 tests.

@Micky774
Contributor Author

Since it is an AITER update, run level 3 tests.

Sure, here's the link to the job for ease of access: https://github.com/ROCm/TransformerEngine/actions/runs/22460578919

@ipanfilo
Collaborator

Please put the ticket number in the PR description.


@wangye805 wangye805 left a comment


LGTM

BTW, please answer the questions in the PR description.

@Micky774 Micky774 merged commit 87a8fc8 into release_v2.6_rocm Mar 2, 2026
4 checks passed
@Micky774 Micky774 deleted the zain/cuda-graph-dropout-hotfix branch March 2, 2026 15:20