
Fix MXFP4 8w PP kernel correctness #1059

Open
adedespirlet wants to merge 1 commit into iree-org:main from adedespirlet:final2


@adedespirlet
Contributor

@adedespirlet adedespirlet commented Mar 6, 2026

This PR:

  • adds _get_8wave_shape_from_block(), which adjusts wave_shape based on the tile size: if the tile size along a dimension is 32, the wave shape along that dimension is capped at 2

  • adds an extra barrier (double barrier) to the 8-wave PP kernel, both with and without B preshuffling. This ensures there is never a memory dependency issue between the two clusters. In theory, a race is then not possible, so this works on all macro tile sizes.

  • adds dynamic flat support for the 8 wave PP without B preshuffling
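
For reviewers, the wave-shape rule from the first bullet can be sketched roughly as below. This is only an illustration, not the code in this PR: the function name `pick_8wave_shape`, the `(waves_m, waves_n)` tuple layout, the `(4, 2)` default, and the error raised when no shape fits are all assumptions made for the example.

```python
def pick_8wave_shape(block_m, block_n, default=(4, 2)):
    """Hypothetical sketch: choose (waves_m, waves_n) with waves_m * waves_n == 8.

    Assumed rule (paraphrasing the PR description): when the tile size along
    a dimension is 32, use at most 2 waves along that dimension.
    """

    def max_waves(tile):
        # Cap of 2 for a 32-wide tile; otherwise any 8-wave split is allowed.
        return 2 if tile == 32 else 8

    # Try the preferred shape first, then the two balanced 8-wave splits.
    for waves_m, waves_n in (default, (2, 4), (4, 2)):
        if waves_m <= max_waves(block_m) and waves_n <= max_waves(block_n):
            return (waves_m, waves_n)

    # Assumption: a 32x32 tile cannot host 8 waves under this rule.
    raise ValueError(
        f"no 8-wave shape satisfies the cap for block {block_m}x{block_n}"
    )
```

Under these assumptions, `pick_8wave_shape(32, 256)` falls back to `(2, 4)`, while `pick_8wave_shape(256, 32)` keeps the default `(4, 2)`.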

Signed-off-by: Aurore De Spirlet <aurore.despirlet@amd.com>
@adedespirlet adedespirlet requested a review from harsh-nod March 6, 2026 02:12
@adedespirlet adedespirlet changed the title MXFP4 8w kernel Fix MXFP4 8w PP kernel correctness across different tiles Mar 6, 2026
@adedespirlet adedespirlet changed the title Fix MXFP4 8w PP kernel correctness across different tiles Fix MXFP4 8w PP kernel correctness Mar 6, 2026
@harsh-nod
Collaborator

Can you also add it to an existing test, or add a new test, so this is tracked on the CI? 7.1_schedule is not run on the CI.
