Fix bounds expressions to respect workgroup reordering #1072

Merged: willghatch merged 2 commits into main from users/willghatch/bs-256x224x256 on Mar 6, 2026

@willghatch willghatch commented Mar 6, 2026

Fix bounds expressions to respect workgroup reordering

`generate_bounds_exprs` was not aware of ReorderingConstraints, so when a bound expression contained a raw workgroup symbol (e.g. `WORKGROUP_1`), the symbol was emitted as-is and mapped directly to `block_id_y`.
With workgroup reordering active, the actual tile position depends on both `block_id_x` and `block_id_y` (via the flattened/swizzled index), so the raw symbol produces an incorrect mask.
This happens when `WaveConstraint.get_index_bound()` fires, i.e. when the wave tile size is not divisible by the MMA vector shape (e.g. `BLOCK_N=224` with 4 waves gives `wave_tile=56`, and `56 % 16 != 0`).
The returned bound, `WORKGROUP_1 * BLOCK_N + wave_id * wave_tile + wave_tile`, contains `WORKGROUP_1`.
The fix passes `reordering_constraints` into `generate_bounds_exprs` and substitutes each workgroup symbol with its reordered expression before storing bounds on the node.
This ensures that the emitted `affine.apply` for masking references both block IDs when reordering is active.
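
To make the failure mode concrete, here is a small plain-Python sketch of the arithmetic. It is only an illustration: the `reorder` formula (flatten the launch grid, then re-split it column-major) and every helper name are hypothetical stand-ins, not the reordering or the code used in this repository. The block size, wave count, and MMA shape are the ones from the example above.

```python
# Illustrative sketch; the reordering formula and all helper names are
# hypothetical, not the project's actual implementation.

BLOCK_N = 224   # block size along N, from the example above
NUM_WAVES = 4   # waves along N, from the example above
MMA_N = 16      # MMA vector shape along N, from the example above

wave_tile = BLOCK_N // NUM_WAVES        # 224 // 4 == 56
needs_bound = wave_tile % MMA_N != 0    # 56 % 16 == 8, so a mask bound is needed


def reorder(block_id_x, block_id_y, grid_x, grid_y):
    """Hypothetical reordering: flatten the launch grid row-major, then
    re-split column-major. Both block IDs feed each tile coordinate."""
    flat = block_id_y * grid_x + block_id_x
    return flat // grid_y, flat % grid_y   # (workgroup_0, workgroup_1)


def index_bound(workgroup_1, wave_id):
    """The bound quoted above, as a function of the logical workgroup index."""
    return workgroup_1 * BLOCK_N + wave_id * wave_tile + wave_tile


def raw_bound(block_id_x, block_id_y, wave_id):
    """Old behavior: WORKGROUP_1 is taken to be block_id_y directly."""
    return index_bound(block_id_y, wave_id)


def reordered_bound(block_id_x, block_id_y, wave_id, grid_x, grid_y):
    """Fixed behavior: substitute the reordered expression for WORKGROUP_1."""
    _, workgroup_1 = reorder(block_id_x, block_id_y, grid_x, grid_y)
    return index_bound(workgroup_1, wave_id)


print(raw_bound(1, 0, 0))              # 56
print(reordered_bound(1, 0, 0, 2, 3))  # 280
```

For a 2x3 launch grid under this assumed reordering, the block at `(block_id_x, block_id_y) = (1, 0)` handles logical column 1, so its mask should end at 280 rather than 56. The raw symbol only gives the right answer when the reordering happens to be the identity.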

Fixes the 256x224x256 block size for dynamic-shape GEMM (more specifically, the 7.1 dynamic preshuffle with the LLVM backend).

@willghatch

Well, the top commit that is mine is the thing that this is about. I assume that as the other PRs get merged I'll rebase this until it is actually just one commit. But for now if you want to view, use the commit view.

@willghatch willghatch force-pushed the users/willghatch/bs-256x224x256 branch 2 times, most recently from 060fb6c to 868cb1c Compare March 6, 2026 21:13
@willghatch willghatch requested review from harsh-nod and xintin March 6, 2026 21:42
Signed-off-by: William G Hatch <william@hatch.uno>
@willghatch willghatch force-pushed the users/willghatch/bs-256x224x256 branch from 868cb1c to 71ddcd1 Compare March 6, 2026 22:25
@harsh-nod harsh-nod changed the title Gets the 7.1 dynamic preshuffle working for LLVM backend for 256x224x256 Fix bounds expressions to respect workgroup reordering Mar 6, 2026
@harsh-nod harsh-nod left a comment

lgtm!

@willghatch willghatch force-pushed the users/willghatch/bs-256x224x256 branch from 7fc95a8 to 9c38eb0 Compare March 6, 2026 22:37
@willghatch willghatch merged commit 775987d into main Mar 6, 2026
17 checks passed
@willghatch willghatch deleted the users/willghatch/bs-256x224x256 branch March 6, 2026 23:15
3 participants