
Zero in-loop VALU for MXFP4 preshuffle GEMM via IV-split addressing #1029

Open
panditsa wants to merge 21 commits into iree-org:main from panditsa:sanket/mxfp4_no_valu

@panditsa commented on Mar 3, 2026

  • Hoist per-lane byte offsets before the K-loop and advance K via scalar soffset, eliminating VALU address ops from the loop body
  • Prove stride constancy symbolically: substitute IV = step * j, simplify until the floor/Mod terms collapse, and extract coeff(j). The result holds for all thread/wave/workgroup values, with no numeric probing
  • Apply to both handle_read (buffer_load) and handle_gather_to_lds (buffer_load_dword_lds) with shared linearization tail
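
For readers unfamiliar with the split, here is a minimal Python sketch of the idea in the bullets above. It is not the code in this PR. The layout, the constants (`TILE_K`, `TILE_STRIDE`, `ROW_STRIDE`) and the helper names are all hypothetical, chosen only so that the floor/Mod collapse is visible. The address is evaluated once per lane at IV = 0 (the hoisted per-lane offset), and the loop only adds a constant stride to a scalar offset. The stride comes from the divisibility rule `floor((step*j + r) / T) = (step/T)*j + floor(r/T)` when `T` divides `step`, not from probing values.

```python
"""Toy model of IV-split addressing. Layout and constants are hypothetical."""

TILE_K = 32         # assumed K-tile width in bytes
TILE_STRIDE = 2048  # assumed byte distance between consecutive K tiles
ROW_STRIDE = 32     # assumed byte distance between rows inside a tile


def full_offset(lane: int, iv: int) -> int:
    """Byte offset recomputed from scratch (what the loop body would do)."""
    k = iv + (lane % 4) * 8  # K index: loop IV plus a per-lane term
    row = lane // 4
    # floor/Mod form of a shuffled layout
    return (k // TILE_K) * TILE_STRIDE + row * ROW_STRIDE + (k % TILE_K)


def stride_per_iteration(step: int):
    """coeff(j) after substituting iv = step * j.

    If TILE_K divides step, floor((step*j + r) / TILE_K) collapses to
    (step / TILE_K) * j + floor(r / TILE_K) and Mod(step*j + r, TILE_K)
    collapses to Mod(r, TILE_K), so only the tile term depends on j.
    Returns None when this rule cannot show a constant stride.
    """
    if step % TILE_K != 0:
        return None
    return (step // TILE_K) * TILE_STRIDE


def run_loop(lane: int, step: int, trip_count: int) -> list:
    """Addresses produced with a hoisted per-lane offset and a scalar offset."""
    stride = stride_per_iteration(step)
    if stride is None:
        raise ValueError("stride is not provably constant for this step")
    voffset = full_offset(lane, 0)  # hoisted before the K-loop
    soffset = 0                     # scalar, identical for every lane
    addrs = []
    for _ in range(trip_count):
        addrs.append(voffset + soffset)
        soffset += stride           # the only address update in the loop
    return addrs


if __name__ == "__main__":
    # Sanity check of the toy model only: split form equals full recompute.
    for lane in range(64):
        expected = [full_offset(lane, 32 * j) for j in range(4)]
        assert run_loop(lane, 32, 4) == expected
    print(run_loop(5, 32, 3))
```

In this toy layout a step of 16 does not divide evenly into `TILE_K`, so `stride_per_iteration(16)` returns `None` and the sketch refuses to split the address rather than guessing a stride.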

@panditsa force-pushed the sanket/mxfp4_no_valu branch from 3cb2ebd to 8bdf94a on March 3, 2026 18:42
@panditsa marked this pull request as ready for review on March 3, 2026 21:25
@panditsa force-pushed the sanket/mxfp4_no_valu branch from 71f963c to 4e00fae on March 4, 2026 18:47
panditsa added 17 commits March 4, 2026 17:08
Signed-off-by: Sanket Pandit <sanket.pandit@amd.com>
@panditsa force-pushed the sanket/mxfp4_no_valu branch from e1709f4 to 8744a2b on March 4, 2026 23:09
@panditsa changed the title from "Zero in-loop VALU for MXFP4 preshuffle GEMM via IV-split addressing" to "WIP: Zero in-loop VALU for MXFP4 preshuffle GEMM via IV-split addressing" on Mar 4, 2026
@panditsa force-pushed the sanket/mxfp4_no_valu branch from 95fe0a0 to 091aab7 on March 5, 2026 04:46
panditsa added 4 commits March 4, 2026 22:46
Signed-off-by: Sanket Pandit <sanket.pandit@amd.com>
@panditsa changed the title from "WIP: Zero in-loop VALU for MXFP4 preshuffle GEMM via IV-split addressing" to "Zero in-loop VALU for MXFP4 preshuffle GEMM via IV-split addressing" on Mar 6, 2026
