Description
After set_node_indices (pass 12), the attention kernel's FX graph contains permute operations, which the water emitter serializes as wave.permute ops in MLIR. The FX importer's _convert_ops dispatcher in fx_emitter.py has no handler for wave.permute, so it raises ValueError("Unsupported op in MLIR-to-FX conversion: wave.permute").
Permute ops appear in attention because the online softmax computes reductions along a dimension that differs from the matmul accumulation dimension, requiring dimension reordering between the two matmuls. GEMM does not use permute ops.
This blocks passes 12-13 for attention.
Probable Fix:
Add a _handle_permute_op function to fx_emitter.py that creates the corresponding Permute FX node, and register it in the _convert_ops match block. The handler needs to extract the source value and the target dimension ordering from the MLIR op's operands and attributes, then reconstruct the FX node accordingly.