[compiler] Fix Rational codegen and pipeline unroll bug for dynamic shapes #1060

Closed
xintin wants to merge 2 commits into main from xintin/fix_overflow_and_dynamic_pipeline_remainder_loop_start

xintin commented Mar 6, 2026

Summary

  • Handle _Rational values in gen_sympy_index for Max, Min, Piecewise, and comparison operators (StrictGreaterThan, GreaterThan, Eq, Ne) that previously threw CodegenError with non-standard block sizes.
  • Fix pipeline unroll interaction for dynamic shapes where the kernel loop count was not divisible by the unroll factor, causing the scf.for loop to execute extra iterations that read invalid pipeline state. Thread unroll_factor through PipelinedLoop to build_guarded_pipeline_with_remainder so pipelined_iterations is rounded to guarantee divisibility.
  • Extend testScaledGemmMXFP4PreshuffleB with a new block shape (32,64,256) at problem shape (1024,2048,16384), and with wave_shape and dynamic_shapes parameters.
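
To illustrate the second bullet, here is a minimal sketch of the divisibility problem and the rounding fix. It is not the code from this PR: `unrolled_body_count` and `split_iterations` are hypothetical names, and the sketch assumes that "rounded" means rounding down to a multiple of the unroll factor, with the leftover iterations handled by the remainder loop.

```python
def unrolled_body_count(total_iterations: int, unroll_factor: int) -> int:
    """Bodies executed by a loop that steps by unroll_factor with no guard.

    Models an unrolled loop whose trip count is ceil(total / unroll_factor);
    each trip runs unroll_factor copies of the body.
    """
    trips = len(range(0, total_iterations, unroll_factor))
    return trips * unroll_factor


def split_iterations(total_iterations: int, unroll_factor: int) -> tuple[int, int]:
    """Split a trip count into (pipelined, remainder).

    Assumption: pipelined is rounded DOWN to a multiple of unroll_factor,
    and the remainder is executed by a separate, non-unrolled loop.
    """
    if unroll_factor < 1:
        raise ValueError("unroll_factor must be >= 1")
    if total_iterations < 0:
        raise ValueError("total_iterations must be >= 0")
    pipelined = (total_iterations // unroll_factor) * unroll_factor
    remainder = total_iterations - pipelined
    return pipelined, remainder


# 10 iterations unrolled by 4 without rounding: 3 trips * 4 bodies = 12,
# i.e. two extra bodies beyond the intended 10.
extra = unrolled_body_count(10, 4) - 10

# With rounding: 8 iterations go through the unrolled loop, 2 to the remainder.
pipelined, remainder = split_iterations(10, 4)
```

With a dynamic shape the trip count is only known at run time, so the split has to be computed from the runtime value rather than folded at compile time.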

Test plan

  • CI passes on existing tests
  • New test case testScaledGemmMXFP4PreshuffleB with block shape (32,64,256) passes
  • Dynamic shape scenarios with non-divisible unroll factors produce correct results

Made with Cursor

harsh-nod and others added 2 commits March 6, 2026 00:28
…hapes

Handle _Rational values in gen_sympy_index for Max, Min, Piecewise,
and comparison operators (StrictGreaterThan, GreaterThan, Eq, Ne) that
previously threw CodegenError with non-standard block sizes.
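
As a rough illustration of what handling rational operands in comparisons and Max/Min can look like, the sketch below uses a hypothetical `Rational` pair and cross-multiplication. It is not the `_Rational` class or the `gen_sympy_index` logic from this change; the names and the positive-denominator assumption are introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rational:
    """Hypothetical numerator/denominator pair; den is assumed positive."""
    num: int
    den: int


def as_rational(x):
    # Promote plain integers so every operator sees a uniform operand type.
    return x if isinstance(x, Rational) else Rational(x, 1)


def greater_than(a, b, strict=True):
    # a.num/a.den > b.num/b.den  <=>  a.num*b.den > b.num*a.den  (dens > 0)
    a, b = as_rational(a), as_rational(b)
    lhs, rhs = a.num * b.den, b.num * a.den
    return lhs > rhs if strict else lhs >= rhs


def equal(a, b):
    a, b = as_rational(a), as_rational(b)
    return a.num * b.den == b.num * a.den


def rational_max(a, b):
    return as_rational(a) if greater_than(a, b, strict=False) else as_rational(b)


def rational_min(a, b):
    return as_rational(b) if greater_than(a, b, strict=False) else as_rational(a)
```

Cross-multiplying keeps the comparison in integer arithmetic, so no division is needed until the final value is materialized.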

Fix pipeline unroll interaction for dynamic shapes where the kernel
loop count was not divisible by the unroll factor, causing the scf.for
loop to execute extra iterations that read invalid pipeline state.
Thread unroll_factor through PipelinedLoop to
build_guarded_pipeline_with_remainder so pipelined_iterations is
rounded to guarantee divisibility.

Extend testScaledGemmMXFP4PreshuffleB with a new block shape
(32,64,256) at shape (1024,2048,16384), wave_shape and dynamic_shapes
parameters.

Signed-off-by: Harsh Menon <harsh.menon@amd.com>
Signed-off-by: xintin <gaurav.verma@amd.com>
xintin closed this Mar 6, 2026
xintin deleted the xintin/fix_overflow_and_dynamic_pipeline_remainder_loop_start branch March 6, 2026 03:36
xintin restored the xintin/fix_overflow_and_dynamic_pipeline_remainder_loop_start branch March 6, 2026 03:36