Fix bounds expressions to respect workgroup reordering#1072
Merged
willghatch merged 2 commits intomainfrom Mar 6, 2026
Merged
Conversation
Contributor
Author
|
Well, the top commit that is mine is the thing that this is about. I assume that as the other PRs get merged I'll rebase this until it is actually just one commit. But for now if you want to view, use the commit view. |
060fb6c to
868cb1c
Compare
`generate_bounds_exprs` was not aware of ReorderingConstraints, so when a bound expression contained a raw workgroup symbol (e.g. WORKGROUP_1) it was emitted as-is, mapping directly to block_id_y. With workgroup reordering active, the actual tile position depends on both block_id_x and block_id_y (via the flattened/swizzled index), so the raw symbol gives an incorrect mask. This happens when WaveConstraint.get_index_bound() fires -- i.e. when the wave tile size is not divisible by the MMA vector shape (e.g. BLOCK_N=224 with 4 waves gives wave_tile=56, and 56 % 16 != 0). The returned bound WORKGROUP_1 * BLOCK_N + wave_id * wave_tile + wave_tile contains WORKGROUP_1. The fix passes reordering_constraints into generate_bounds_exprs and substitutes each workgroup symbol with its reordered expression before storing bounds on the node. This ensures the emitted affine.apply for masking references both block IDs when reordering is active. Fixes the 256x224x256 block size for dynamic-shape GEMM. (More specifically the 7.1 dynamic preshuffle with LLVM backend.) Signed-off-by: William G Hatch <william@hatch.uno>
868cb1c to
71ddcd1
Compare
xintin
reviewed
Mar 6, 2026
Signed-off-by: William G Hatch <william@hatch.uno>
7fc95a8 to
9c38eb0
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix bounds expressions to respect workgroup reordering
generate_bounds_exprswas not aware of ReorderingConstraints, so when a bound expression contained a raw workgroup symbol (e.g. WORKGROUP_1) it was emitted as-is, mapping directly to block_id_y.With workgroup reordering active, the actual tile position depends on both block_id_x and block_id_y (via the flattened/swizzled index), so the raw symbol gives an incorrect mask.
This happens when WaveConstraint.get_index_bound() fires -- i.e. when the wave tile size is not divisible by the MMA vector shape (e.g. BLOCK_N=224 with 4 waves gives wave_tile=56, and 56 % 16 != 0).
The returned bound WORKGROUP_1 * BLOCK_N + wave_id * wave_tile + wave_tile contains WORKGROUP_1.
The fix passes reordering_constraints into generate_bounds_exprs and substitutes each workgroup symbol with its reordered expression before storing bounds on the node.
This ensures the emitted affine.apply for masking references both block IDs when reordering is active.
Fixes the 256x224x256 block size for dynamic-shape GEMM. (More specifically the 7.1 dynamic preshuffle with LLVM backend.)