Fix workgroup reordering for large shapes by xintin · Pull Request #1063 · iree-org/wave

xintin · 2026-03-06T08:37:55Z

In read_write.py, the split_index function splits an index expression into workgroup-independent (wg) and thread-dependent (th) parts for linearize_memref. The wg part becomes the reinterpret_cast base offset, while the th part becomes the store index.

When reorder_workgroups=True with dynamic dimensions, the output index contains Piecewise(tail_new_wg0, main_new_wg0) expressions.
After substituting WORKGROUP_0/1=0, the difference diff = src - thread_dependent_index still contains dynamic symbols (M, N, K).

The guard detects these residual symbols and falls back:

thread_independent_index = sympy.sympify(0)   # wg offset = 0
thread_dependent_index = src                  # full address as thread index

This makes the full linear address (M_row * N + N_col) the store index, which overflows a 32-bit index once the output buffer exceeds ~4 GB (M*N*4 > 2**32 bytes).
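To make the size threshold concrete, here is a small arithmetic sketch. It assumes a row-major M x N output with 4-byte elements (taken from the M*N*4 term above) and uses the 2**32 limit quoted in this description; the helper names are hypothetical and do not come from read_write.py.

```python
ELEM_BYTES = 4  # assumption: 4-byte elements, as implied by M*N*4


def max_byte_offset(m, n, elem_bytes=ELEM_BYTES):
    """Byte offset of the last element of a row-major m x n buffer."""
    last_linear_index = (m - 1) * n + (n - 1)  # M_row * N + N_col at its maximum
    return last_linear_index * elem_bytes


def fits_in_32_bits(value):
    """True if value is representable in 32 bits (the limit quoted above)."""
    return 0 <= value < 2**32


small = max_byte_offset(4096, 4096)    # 64 MiB buffer
large = max_byte_offset(65536, 32768)  # 8 GiB buffer

print(fits_in_32_bits(small), fits_in_32_bits(large))
```

With the fallback, every store carries an offset of this magnitude, so the second shape cannot be addressed through a 32-bit thread index.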

In the reorder_workgroups=False case, diff = block_id_x * 64 has no residual symbols (block_id_x is a wg symbol), so the split works correctly, embedding the large wg offset into the reinterpret_cast base and keeping only a small thread offset.
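The two cases can be reproduced with a simplified SymPy sketch. This is not the split_index implementation from read_write.py: the symbol names, the guard, and the Piecewise remapping below are hypothetical stand-ins chosen only to show why a residual dynamic symbol triggers the fallback.

```python
import sympy

# Hypothetical symbols for illustration (not the names used in read_write.py).
WG0 = sympy.Symbol("WG0", integer=True, nonnegative=True)  # workgroup id
T = sympy.Symbol("T", integer=True, nonnegative=True)      # thread-dependent offset
M = sympy.Symbol("M", integer=True, positive=True)         # dynamic dimension


def split_index_sketch(src, wg_symbols):
    """Return (wg_part, th_part) for src, falling back to (0, src)."""
    # Thread-dependent part: the index with every workgroup symbol set to 0.
    th_part = src.subs({s: 0 for s in wg_symbols})
    diff = src - th_part
    # Guard: the wg part may only depend on workgroup symbols.
    residual = diff.free_symbols - set(wg_symbols)
    if residual:
        return sympy.sympify(0), src  # wg offset = 0, full address per thread
    return diff, th_part


# Without reordering: diff is 64*WG0, so the split succeeds.
plain = 64 * WG0 + T
plain_wg, plain_th = split_index_sketch(plain, [WG0])

# With a made-up Piecewise remapping that depends on M: diff keeps M,
# so the guard falls back.
reordered_wg0 = sympy.Piecewise((WG0 - M, WG0 >= M), (WG0, True))
reordered = 64 * reordered_wg0 + T
reordered_wg, reordered_th = split_index_sketch(reordered, [WG0])

print(plain_wg, plain_th)
print(reordered_wg, reordered_th)
```

In the first case the large term lands in the wg part and the thread index stays small; in the second, the whole expression becomes the thread index, which is the situation this PR addresses.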

Signed-off-by: xintin <gaurav.verma@amd.com>

harsh-nod · 2026-03-06T16:20:22Z

Could you add a lit test for this that shows the IR?

Signed-off-by: xintin <gaurav.verma@amd.com>

This builds on PRs #1061, #1063, and #1067 to get the block size 256x224x256 working for the list of shapes we were looking at today.

Signed-off-by: William G Hatch <william@hatch.uno>

xintin added 4 commits March 6, 2026 05:18

fix more dyn shapes

7236862

Signed-off-by: xintin <gaurav.verma@amd.com>

fix lit tests: buffer size

b26d3b3

Signed-off-by: xintin <gaurav.verma@amd.com>

rebase

7660835

Signed-off-by: xintin <gaurav.verma@amd.com>

fix read write

d2dcc38

Signed-off-by: xintin <gaurav.verma@amd.com>

xintin requested review from Hardcode84 and harsh-nod March 6, 2026 08:40

added lit test

0bd999d

Signed-off-by: xintin <gaurav.verma@amd.com>

harsh-nod approved these changes Mar 6, 2026

xintin changed the base branch from xintin/fix_overflow_and_dynamic_pipeline_remainder_loop_start to main March 6, 2026 17:33

xintin force-pushed the xintin/fix_wg_reordering_for_large_shapes branch from 4047564 to 0bd999d Compare March 6, 2026 17:37

final commit

1705b86

Signed-off-by: xintin <gaurav.verma@amd.com>

xintin force-pushed the xintin/fix_wg_reordering_for_large_shapes branch from 06a9196 to 1705b86 Compare March 6, 2026 17:49

Merge branch 'main' into xintin/fix_wg_reordering_for_large_shapes

5e4d978

xintin merged commit 9177fe2 into main Mar 6, 2026
17 checks passed

xintin deleted the xintin/fix_wg_reordering_for_large_shapes branch March 6, 2026 18:49

willghatch mentioned this pull request Mar 6, 2026

Fix bounds expressions to respect workgroup reordering #1072

Merged

xintin merged 7 commits into main from xintin/fix_wg_reordering_for_large_shapes
