dynamic workgroup reordering for MXFP4 by adedespirlet · Pull Request #1001 · iree-org/wave

adedespirlet · 2026-02-27T19:13:59Z

This PR adds compute_best_group_size_n(), a function that dynamically determines the optimal N-dimension workgroup reordering based on the GEMM shape. The goal of this optimization is to minimize total DRAM fetches by maximizing data reuse within a "batch" of concurrent workgroups.

How it works: Score Model

On MI350, the hardware assigns flat workgroup indices round-robin across 8 XCDs.
Each XCD runs 32 CUs in parallel, forming a "batch" of 32 concurrent workgroups per XCD. Each batch of in flight workgroups covers an "area" of unique A tiles (U_A) and unique B tiles (U_B).
We model the cost of DRAM traffic as:

Cost_score = U_A + U_B
Subject to the constraint: U_A x U_B = 32 (batch size per XCD)

That sum is minimized when U_A and U_B are balanced: U_A = U_B = sqrt(32) ~= 5.66.
Since tile counts must be integers, we target the closest factor pair of 32: 4 and 8.
An optimal balance of (4,8) or (8,4) therefore gives a minimal score of 12. This means that each XCD has 4 WGs mapped along one of M or N and 8 WGs along the other.
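
As a quick check of the score model, the following Python sketch enumerates every integer factor pair of the batch size together with its score. The function name `factor_pair_scores` is hypothetical and not part of this PR.

```python
import math


def factor_pair_scores(batch_size=32):
    """Map each (U_A, U_B) with U_A * U_B == batch_size to its score U_A + U_B."""
    return {
        (u_a, batch_size // u_a): u_a + batch_size // u_a
        for u_a in range(1, batch_size + 1)
        if batch_size % u_a == 0
    }


scores = factor_pair_scores(32)
best = min(scores.values())
best_pairs = [pair for pair, score in scores.items() if score == best]

print(f"continuous optimum: U_A = U_B = {math.sqrt(32):.2f}")
print(f"best integer score: {best} at {best_pairs}")
```

The remaining factor pairs, (2,16) and (1,32) along with their mirrors, score 18 and 33 respectively.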

Dynamic Selection:
compute_best_group_size_n compares the score of the default hardware dispatch (which can be as high as 18 for a 2x16 batch or 33 for a 1x32 batch on imbalanced shapes) against the score of the reordered dispatch.

If the balance is already optimal (sum = 12): reordering is skipped to avoid the overhead of needless indexing logic (which harms performance).
If imbalanced: reordering is enabled to "reshape" the dispatch into the optimal 4x8 or 8x4 layout.
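
The comparison can be sketched in Python as follows. This is an illustration only, not the PR's compute_best_group_size_n: the helper names, the row-major mapping from flat workgroup index to (M tile, N tile) with N varying fastest, and the tile-grid sizes in the loop are all assumptions. The guard for fewer workgroups than XCDs mirrors the commit in this PR that disables reordering for small GEMM shapes.

```python
NUM_XCDS = 8        # XCDs on MI350, per the description above
CUS_PER_XCD = 32    # concurrent workgroups per XCD
OPTIMAL_SCORE = 12  # 4 + 8, the best integer factor pair of 32


def default_footprint(num_m_tiles, num_n_tiles, xcd=0):
    """Return (U_A, U_B) for the first batch on one XCD under default dispatch.

    Assumption: flat index = m_tile * num_n_tiles + n_tile (N varies fastest).
    """
    total = num_m_tiles * num_n_tiles
    # Round-robin: XCD k receives flat indices k, k + 8, k + 16, ...
    flat_ids = [xcd + i * NUM_XCDS for i in range(CUS_PER_XCD)]
    flat_ids = [f for f in flat_ids if f < total]
    unique_a = {f // num_n_tiles for f in flat_ids}  # distinct M tiles -> A tiles
    unique_b = {f % num_n_tiles for f in flat_ids}   # distinct N tiles -> B tiles
    return len(unique_a), len(unique_b)


def should_reorder(num_m_tiles, num_n_tiles):
    """Reorder only when the default batch footprint is worse than optimal."""
    if num_m_tiles * num_n_tiles < NUM_XCDS:
        return False  # too few workgroups to occupy every XCD
    u_a, u_b = default_footprint(num_m_tiles, num_n_tiles)
    return u_a + u_b > OPTIMAL_SCORE


for grid in [(64, 8), (64, 16), (64, 32)]:
    print(grid, default_footprint(*grid), should_reorder(*grid))
```

Under these assumptions, a grid 8 tiles wide in N yields a 32x1 footprint (score 33), a grid 16 tiles wide yields 16x2 (score 18), and a grid 32 tiles wide already lands on 8x4 (score 12), so only the first two would be reordered.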

Signed-off-by: Aurore De Spirlet <aurore.despirlet@gmail.com>

Aurore De Spirlet added 3 commits February 27, 2026 19:07

add workgroup reordering logic

27780c6

Signed-off-by: Aurore De Spirlet <aurore.despirlet@gmail.com>

cleaning

ca71ca2

Signed-off-by: Aurore De Spirlet <aurore.despirlet@gmail.com>

cleaning

f3d0a61

Signed-off-by: Aurore De Spirlet <aurore.despirlet@gmail.com>

adedespirlet changed the title ~~Workgroup reordering prediction~~ dynamic workgroup reordering for MXFP4 Feb 27, 2026

disable reordering for small gemm shapes where num_WG < XCD

b7a4ef1

Signed-off-by: Aurore De Spirlet <aurore.despirlet@gmail.com>

dynamic workgroup reordering for MXFP4 #1001
adedespirlet wants to merge 4 commits into iree-org:main from adedespirlet:workgroup_reordering