Fix workgroup reordering for large shapes#1063

Merged
xintin merged 7 commits into main from
xintin/fix_wg_reordering_for_large_shapes
Mar 6, 2026

@xintin
Contributor

@xintin xintin commented Mar 6, 2026

In read_write.py, the split_index function splits an index expression into a thread-independent workgroup (wg) part and a thread-dependent (th) part for linearize_memref. The wg part becomes the reinterpret_cast base offset, and the th part becomes the store index.

When reorder_workgroups=True with dynamic dimensions, the output index contains Piecewise(tail_new_wg0, main_new_wg0) expressions.
After substituting WORKGROUP_0/1=0, the difference diff = src - thread_dependent_index still contains dynamic symbols (M, N, K).
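
A toy sympy sketch of why residual symbols survive the substitution. The symbol names, the tile size of 64, and the reordering formula are all hypothetical stand-ins, not the expressions the compiler actually generates:

```python
import sympy

# Hypothetical stand-ins for the workgroup ids, a thread id, and a dynamic dimension.
WG0, WG1, T, M = sympy.symbols("WG0 WG1 T M", integer=True, nonnegative=True)

# Toy reordering: workgroups in the tail region are remapped differently from
# those in the main region, and the boundary between the two depends on M.
num_wg0 = sympy.ceiling(M / 64)
main_new_wg0 = WG0
tail_new_wg0 = num_wg0 - 1
new_wg0 = sympy.Piecewise(
    (tail_new_wg0, WG0 >= num_wg0 - 1),
    (main_new_wg0, True),
)

src = new_wg0 * 64 + T

# Zero out the workgroup symbols to get the thread-dependent part.
thread_dependent_index = src.subs({WG0: 0, WG1: 0})
diff = src - thread_dependent_index

# Anything left besides workgroup symbols is a residual dynamic symbol.
residual = diff.free_symbols - {WG0, WG1}
print(residual)  # M is among the leftover symbols
```

Because M appears inside the Piecewise condition and in the tail branch, setting the workgroup symbols to zero cannot remove it, so the difference is not a pure function of the workgroup ids.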

The guard detects these residual symbols and falls back:

thread_independent_index = sympy.sympify(0)   # wg offset = 0
thread_dependent_index = src                  # full address as thread index

This makes the full linear address (M_row * N + N_col) the store index, which overflows a 32-bit index once the output buffer exceeds ~4 GB (M*N*4 > 2**32 bytes).
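
The arithmetic can be checked with a short sketch. The shape, the 4-byte element size, and the 64x64 tile are assumptions chosen for illustration, and the sketch assumes the index is eventually scaled to a byte offset:

```python
UINT32_MAX = 2**32 - 1
ELEM_BYTES = 4   # assumed element size (e.g. f32)
TILE = 64        # assumed tile size per workgroup

# Hypothetical large output shape: M*N = 2**31 elements, 8 GiB of 4-byte data.
M, N = 65536, 32768

def linear_index(row, col, n):
    """Row-major linear element index."""
    return row * n + col

# Fallback path: the whole address is carried by the per-thread index.
full_index = linear_index(M - 1, N - 1, N)
full_byte_offset = full_index * ELEM_BYTES
wrapped = full_byte_offset & UINT32_MAX   # what a 32-bit offset would hold

# Split path: the workgroup base is applied separately, so the thread only
# indexes within its own tile.
thread_index = linear_index(TILE - 1, TILE - 1, N)
thread_byte_offset = thread_index * ELEM_BYTES

print(M * N * ELEM_BYTES > 2**32)         # True
print(full_byte_offset > UINT32_MAX)      # True
print(thread_byte_offset <= UINT32_MAX)   # True
```

With these numbers the full byte offset is 2**33 - 4, which wraps to 2**32 - 4 in 32 bits, while the per-tile thread offset stays around 8 MB.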

In the reorder_workgroups=False case, diff = block_id_x * 64 has no residual symbols (block_id_x is a wg symbol), so the split works correctly, embedding the large wg offset into the reinterpret_cast base and keeping only a small thread offset.
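
A minimal sketch of the split-with-guard behavior described above, contrasting the two cases. `split_index_sketch` and the symbols are hypothetical and are not the real split_index in read_write.py:

```python
import sympy

# Hypothetical stand-ins for the workgroup ids, a thread id, and a dynamic dimension.
WG0, WG1, T, M = sympy.symbols("WG0 WG1 T M", integer=True, nonnegative=True)
WG_SYMBOLS = {WG0, WG1}

def split_index_sketch(src):
    """Return (thread_independent, thread_dependent) for a toy index expression."""
    thread_dependent = src.subs({s: 0 for s in WG_SYMBOLS})
    diff = src - thread_dependent
    # Guard: if anything other than workgroup symbols remains, give up on the
    # split and push the whole expression into the thread-dependent part.
    if diff.free_symbols - WG_SYMBOLS:
        return sympy.Integer(0), src
    return diff, thread_dependent

# No reordering: the difference is a pure workgroup expression, so the split succeeds.
plain_src = WG0 * 64 + T
wg_part, th_part = split_index_sketch(plain_src)

# Reordering-like case: a Piecewise whose condition depends on M trips the guard.
reordered_src = sympy.Piecewise((WG0 * 64, WG0 * 64 < M), (0, True)) + T
fallback_wg, fallback_th = split_index_sketch(reordered_src)

print(wg_part, th_part)   # 64*WG0 T
print(fallback_wg)        # 0
```

In the first case the large workgroup term lands in the base offset and the thread keeps only T; in the second the guard returns a zero base and the full expression as the thread index, which is the situation this PR addresses.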

xintin added 4 commits March 6, 2026 05:18
Signed-off-by: xintin <gaurav.verma@amd.com>
Signed-off-by: xintin <gaurav.verma@amd.com>
Signed-off-by: xintin <gaurav.verma@amd.com>
Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin xintin requested review from Hardcode84 and harsh-nod March 6, 2026 08:40
@harsh-nod
Collaborator

Could you add a lit test for this that shows the IR?

Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin xintin changed the base branch from xintin/fix_overflow_and_dynamic_pipeline_remainder_loop_start to main March 6, 2026 17:33
@xintin xintin force-pushed the xintin/fix_wg_reordering_for_large_shapes branch from 4047564 to 0bd999d Compare March 6, 2026 17:37
Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin xintin force-pushed the xintin/fix_wg_reordering_for_large_shapes branch from 06a9196 to 1705b86 Compare March 6, 2026 17:49
@xintin xintin merged commit 9177fe2 into main Mar 6, 2026
17 checks passed
@xintin xintin deleted the xintin/fix_wg_reordering_for_large_shapes branch March 6, 2026 18:49
willghatch added a commit that referenced this pull request Mar 6, 2026
This builds on PRs #1061, #1063, and #1067 to get the block size 256x224x256 working for the list of shapes we were looking at today.

Signed-off-by: William G Hatch <william@hatch.uno>