[Navi] Add support with Infinity Cache (LLC) awareness for improved performance#169
Closed
Conversation
Signed-off-by: loscrossos <165311345+loscrossos@users.noreply.github.com>
- Implement block-sparse attention in flash_fwd_sm100.py
- Update interface.py to handle SM100 block size calculations (2x multiplier for m_block_size, since 1 CTA handles 2*tile_m rows)
- Add mask_mod parameter support in mask.py for block-sparse masking
- Add SM100 test fixtures and tile size handling in test_mask_mod.py

This enables block-sparsity on the SM 10.0 architecture, including mask_mod support and proper block size accounting.
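To make the block size accounting and the mask_mod hook concrete, here is a minimal Python sketch. The function names and the (b, h, q_idx, kv_idx) signature are illustrative assumptions in the style of FlexAttention-like mask mods, not the code added by this commit.

```python
import math

def causal_mask_mod(b, h, q_idx, kv_idx):
    """Return True where attention is allowed (illustrative signature)."""
    return kv_idx <= q_idx

def sm100_m_block_size(tile_m: int) -> int:
    # On SM 10.0 one CTA covers two M tiles, so block-sparse bookkeeping
    # must count 2 * tile_m query rows per block (the "2x multiplier").
    return 2 * tile_m

def build_block_mask(seq_q, seq_k, m_block, n_block, mask_mod):
    """Mark which (m_block, n_block) tiles contain any unmasked element."""
    n_m, n_k = math.ceil(seq_q / m_block), math.ceil(seq_k / n_block)
    keep = [[False] * n_k for _ in range(n_m)]
    for mi in range(n_m):
        for ki in range(n_k):
            # Test the block's most permissive corner (largest q, smallest kv);
            # this is exact for monotone masks such as causal.
            q_hi = min((mi + 1) * m_block, seq_q) - 1
            k_lo = ki * n_block
            keep[mi][ki] = bool(mask_mod(0, 0, q_hi, k_lo))
    return keep

# Example: 128 query rows with tile_m = 64 map to a single 128-row block on SM100.
blocks = build_block_mask(128, 128, sm100_m_block_size(64), 128, causal_mask_mod)
```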
…-AILab#2014)
* use correction warps for epi when varlen (non tma O)
* properly enable fallback epilogue for varlen q
* fix rebase errors
* update tests
* add fastdivmod for oob reads in mask_mods
* Updates for h100
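For background, a fastdivmod replaces division and modulo by a runtime-constant divisor (e.g. recovering row/column coordinates from a flat index when guarding out-of-bounds reads in mask_mods) with a precomputed multiply-and-shift. A minimal illustrative sketch of the idea in Python, not the kernel's implementation:

```python
class FastDivmod:
    """Precomputed divide-and-modulo by a fixed positive divisor.

    Illustrative sketch of magic-number (Granlund-Montgomery style) division,
    valid for 32-bit unsigned dividends; not the kernel's actual code.
    """

    def __init__(self, divisor: int):
        assert divisor > 0
        self.divisor = divisor
        self.shift = (divisor - 1).bit_length()                     # smallest l with 2**l >= divisor
        self.multiplier = -(-(1 << (32 + self.shift)) // divisor)   # ceil(2**(32+l) / divisor)

    def divmod(self, x: int):
        q = (x * self.multiplier) >> (32 + self.shift)
        return q, x - q * self.divisor


fd = FastDivmod(7)
assert all(fd.divmod(x) == divmod(x, 7) for x in range(1 << 16))
```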
… conditions (Dao-AILab#2033)
* enable deterministic mode for sm100 bwd and fix race conditions
* turn off lpt scheduler for causal
* use more regs for reduce when deterministic
* make a src for tiled mma dK toggleable parameter, remove smem async fence for lse release
* use 100k iterations for default
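On the user-facing side, the deterministic backward pass is normally opted into per call. A hedged usage sketch, assuming the flash_attn_interface module, a `deterministic` flag, and a version-dependent return convention:

```python
import torch
from flash_attn_interface import flash_attn_func  # module name assumed, varies by install

q = torch.randn(1, 512, 8, 128, device="cuda", dtype=torch.bfloat16, requires_grad=True)
k = torch.randn_like(q).requires_grad_()
v = torch.randn_like(q).requires_grad_()

out = flash_attn_func(q, k, v, causal=True, deterministic=True)
if isinstance(out, tuple):   # some versions also return the softmax log-sum-exp
    out = out[0]
out.sum().backward()         # gradients accumulated in a fixed order, run-to-run reproducible
```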
Not much to see here, but this causes linter noise
* Bump pin
* Switch to new fastdivmod
* cleanup varlen on blackwell
* Allow for only cute install
…s/fa3-compile Add torch.compile support to flash attention 3
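A hedged usage sketch of what torch.compile support enables: wrapping an attention call so it can be traced without graph breaks. The module name and the exact return convention are assumptions and vary by version:

```python
import torch
from flash_attn_interface import flash_attn_func  # module name assumed, varies by install

@torch.compile(fullgraph=True)
def attention(q, k, v):
    # With compile support the attention call is traceable end to end;
    # depending on the version it may also return the log-sum-exp.
    return flash_attn_func(q, k, v, causal=True)

q = torch.randn(2, 1024, 8, 128, device="cuda", dtype=torch.bfloat16)
k, v = torch.randn_like(q), torch.randn_like(q)
out = attention(q, k, v)
```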
* add local for sm100 bwd
* add deterministic
* update tests
* ruff files
* remove old code
* move comment
* override window_size = None for causal
* revert to fwd test defaults
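For reference, "local" here means sliding-window attention: each query attends only to keys within a fixed window, and an unbounded window degenerates to plain causal attention (which is presumably why window_size is overridden to None on the pure-causal path). A small illustrative sketch of the masking rule, independent of the kernels:

```python
import torch

def local_causal_mask(seq_q: int, seq_k: int, window_left: int) -> torch.Tensor:
    """Boolean mask where True = attend. Causal plus a left sliding window:
    query i sees keys j with i - window_left <= j <= i (illustrative only)."""
    q_idx = torch.arange(seq_q).unsqueeze(1)
    k_idx = torch.arange(seq_k).unsqueeze(0)
    return (k_idx <= q_idx) & (k_idx >= q_idx - window_left)

# A window at least as long as the sequence is identical to plain causal,
# which is why the causal path can simply disable the window.
assert torch.equal(local_causal_mask(8, 8, 8), local_causal_mask(8, 8, 10**9))
```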
Motivation
This PR enables Flash Attention Triton support for AMD RDNA3 (Navi) GPUs, specifically targeting the gfx1100 architecture. The goal is to bring Flash Attention's performance optimizations to consumer-grade AMD GPUs while leveraging RDNA3's Infinity Cache (last-level cache, LLC) for improved memory throughput.
Technical Details
New Architecture Support:
Performance Optimizations:
Code Cleanup:
Test Plan
Test Result
Submission Checklist