[WIP] Schedule 7.1 dynamic size#991

Draft
Hardcode84 wants to merge 12 commits into iree-org:main from Hardcode84:dynamic-size
@Hardcode84
Contributor

No description provided.

panditsa and others added 12 commits March 2, 2026 15:15
Signed-off-by: Sanket Pandit <sanket.pandit@amd.com>
Signed-off-by: Sanket Pandit <sanket.pandit@amd.com>
Signed-off-by: Sanket Pandit <sanket.pandit@amd.com>
Signed-off-by: Sanket Pandit <sanket.pandit@amd.com>
Six infrastructure fixes needed for dynamic dims with the wave_asm backend:

1. scheduling/schedule.py: translate node_mapping keys back to original
   graph nodes after graph_copy in the dynamic pipelining path, fixing
   identity mismatch in _update_kernel_node_mapping (0/165 → 165/165).

2. wave_schedule_ops.py: use iterate's owning graph for subgraph
   reordering instead of hardcoding get_root_graph() (pipelined iterate
   lives inside a conditional subgraph with dynamic shapes).

3. unrolling.py: guard unroll count validation with is_literal() so
   symbolic counts don't raise TypeError.

4. emitter.py: handle Rational operands in Mod via the identity
   Mod(a/b,c)=Mod(a,b*c)/b; resolve terminal rationals with divsi.

5. read_write.py: initialize offset_th=None before masked code path
   (pre-existing bug only triggered by dynamic shapes).

6. host_codegen.py: resolve derived buffer shapes (K/2, K/32) for
   dynamic symbol recovery using infer_dim + sympy.solve, and evaluate
   dimension expressions via gen_sympy_index.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
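
Item 4 in the commit message above relies on the identity Mod(a/b, c) = Mod(a, b*c)/b. The following is a minimal sketch that checks the identity numerically with exact rationals; it assumes integer `a` and positive integers `b` and `c`, and uses floor-style modulo (the convention of both Python and sympy). The function names are hypothetical and are not taken from emitter.py.

```python
from fractions import Fraction


def mod_rational_direct(a: int, b: int, c: int) -> Fraction:
    """Compute Mod(a/b, c) directly on an exact rational."""
    return Fraction(a, b) % c


def mod_rational_rewritten(a: int, b: int, c: int) -> Fraction:
    """Compute Mod(a, b*c)/b: an integer Mod followed by one division.

    The numerator a % (b*c) needs only integer arithmetic, so the single
    division by b can be deferred to the very end of the computation.
    """
    return Fraction(a % (b * c), b)


# Exhaustive check over a small range, including negative numerators.
for a in range(-50, 51):
    for b in range(1, 7):
        for c in range(1, 7):
            assert mod_rational_direct(a, b, c) == mod_rational_rewritten(a, b, c)
```

The rewritten form keeps the modulo on integers only, which is presumably why the remaining rational can be resolved with a single signed division at the end.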
Uses V_ASHRREV_I32 (arithmetic right shift) for signed division by
power-of-2 constants, mirroring the existing arith.divui handler
which uses V_LSHRREV_B32 (logical right shift). Required for dynamic
M/N/K support where K/2 and K/32 appear as arith.divsi operations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
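
To illustrate the difference between the two shift flavours mentioned in the commit message above, here is a small Python model of 32-bit logical and arithmetic right shifts. It is a sketch, not the backend's code: the helper names are hypothetical, and the shifts are modelled on Python integers rather than on the actual instructions. It also shows that an arithmetic shift rounds toward negative infinity while a truncating signed division rounds toward zero, so the two agree for non-negative dividends (such as dimension sizes like K) and for exact multiples.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def lshr32(x: int, shift: int) -> int:
    """Logical right shift: zero bits are shifted in from the top."""
    return (x & MASK32) >> shift


def ashr32(x: int, shift: int) -> int:
    """Arithmetic right shift: the sign bit is replicated from the top."""
    return to_signed32(x) >> shift  # Python's >> on ints is arithmetic


def div_trunc(x: int, d: int) -> int:
    """Signed division rounding toward zero."""
    q = abs(x) // abs(d)
    return q if (x >= 0) == (d > 0) else -q


# Non-negative dividends: the arithmetic shift matches signed division.
for k in (0, 1, 31, 32, 64, 4096):
    assert ashr32(k, 1) == div_trunc(k, 2)
    assert ashr32(k, 5) == div_trunc(k, 32)

# Negative dividend: a logical shift gives a large positive number,
# while the arithmetic shift preserves the sign.
assert lshr32(-64, 5) == 0x07FFFFFE
assert ashr32(-64, 5) == -2

# Negative, non-multiple dividend: the shift floors, the division truncates.
assert ashr32(-7, 1) == -4
assert div_trunc(-7, 2) == -3
```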
When use_buffer_ops is enabled, avoid emitting vector<Nxindex> arith
ops for bounds-check masks and OOB index selection. Instead:

- _build_mask: new scalarize option builds per-element scalar cmpi
  and assembles the mask with vector.from_elements.
- _create_vec_read_write: replace vector broadcast/addi/select with
  a scalar loop computing offset_th+i per element.
- Enable use_buffer_ops in the dynamic-dims preshuffle-B test.

This eliminates all vector<16xindex> ops from the dynamic-dims MLIR,
which the WaveASM backend cannot translate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
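
The scalarized approach described in the commit message above can be modelled in plain Python. This is only a sketch of the logic, not the MLIR that is actually emitted: the function names, the `bound` parameter, and the `oob_index` sentinel are assumptions made for illustration.

```python
def build_mask_scalarized(base_index: int, num_elements: int, bound: int) -> list[bool]:
    """Build a bounds-check mask one element at a time.

    Each entry is the result of a single scalar comparison, standing in
    for a per-element scalar cmpi; the list itself stands in for the
    final vector.from_elements assembly.
    """
    return [base_index + i < bound for i in range(num_elements)]


def select_offsets(
    offset_th: int, mask: list[bool], oob_index: int
) -> list[int]:
    """Compute offset_th + i per element, substituting oob_index where masked off.

    This replaces a vector broadcast/add/select sequence with a scalar loop.
    """
    return [offset_th + i if in_bounds else oob_index
            for i, in_bounds in enumerate(mask)]


# Four elements starting at index 14 against a bound of 16:
# only the first two are in bounds.
mask = build_mask_scalarized(14, 4, 16)
offsets = select_offsets(14, mask, oob_index=2**31 - 1)
```

Because every comparison and addition operates on a scalar, no vector-of-index arithmetic is needed in this formulation.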
- Add handleVectorFromElements: packs sub-dword scalars into VGPRs
  using V_LSHL_OR_B32 chains, combines DWORDs with PackOp.
- Fix handleMemRefLoad to emit buffer_load_ubyte / ds_read_u8 for
  i8 element types instead of unconditionally emitting dword loads.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
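
The packing step described in the commit message above can be sketched as a chain of shift-and-OR operations. The Python model below assumes that element 0 lands in the least significant bits of each dword; that ordering, the function names, and the list-of-dwords return type are assumptions for illustration rather than a description of handleVectorFromElements.

```python
MASK32 = 0xFFFFFFFF


def lshl_or_b32(src: int, shift: int, acc: int) -> int:
    """Model of a fused shift-left-then-OR on 32-bit values: (src << shift) | acc."""
    return ((src << (shift & 31)) | acc) & MASK32


def pack_subdword_elements(elems: list[int], bits: int = 8) -> list[int]:
    """Pack sub-dword scalars (for example i8 or i16) into 32-bit words.

    Each dword is built by chaining lshl_or_b32 over its elements;
    the resulting dwords are returned in order.
    """
    per_dword = 32 // bits
    elem_mask = (1 << bits) - 1
    dwords = []
    for start in range(0, len(elems), per_dword):
        acc = 0
        for lane, value in enumerate(elems[start:start + per_dword]):
            # Mask first so negative (sign-extended) inputs do not leak
            # into neighbouring lanes.
            acc = lshl_or_b32(value & elem_mask, lane * bits, acc)
        dwords.append(acc)
    return dwords
```

For example, under the ordering assumed here, `pack_subdword_elements([0x11, 0x22, 0x33, 0x44])` yields the single dword `0x44332211`.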
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
2 participants