Use llvm.fma for tt.dot lowering
#254
This is a draft for now until we can discuss what to do about the fastmath flags.
// Multiply and accumulate.
auto mul = LLVM::FMulOp::create(builder, loc, tgtTy, aElem, bElem);
accum = LLVM::FAddOp::create(builder, loc, tgtTy, accum, mul);
auto flags = LLVM::FastmathFlagsAttr::get(builder.getContext(),
`tl.dot_scaled` has a fast math flag, but Triton typically prefers fast math to be off.
This change replaces the `llvm.fmul` and `llvm.fadd` instructions with the fused `llvm.fma` operation. This should have no downstream impact on the emitted machine code which, due to auto-vectorization and other LLVM magic, already ends up using `VFMADD213PS`. What _is_ unclear about this change is that we materialize some fastmath flags from thin air: it seems like we should be able to configure this somewhere at the user level (TODO).
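For context on why the flags matter: a fused multiply-add rounds once, while a separate multiply and add round twice, so the two forms are not bit-identical in general. In LLVM, the `contract` fastmath flag is what permits fusing an `fmul`/`fadd` pair. The sketch below is a standalone C++ illustration of that rounding difference, not code from this PR; the helper names `mul_then_add` and `fused` are hypothetical, and the `volatile` is only there to stop the compiler from contracting the unfused version on its own.

```cpp
#include <cmath>

// Unfused: the product is rounded to float, then the sum is rounded again.
float mul_then_add(float a, float b, float c) {
  // volatile forces the product to be materialized as a float, so the
  // compiler cannot contract the two operations into a single FMA.
  volatile float prod = a * b;
  return prod + c;
}

// Fused: a * b + c is evaluated as if exactly, then rounded once.
float fused(float a, float b, float c) {
  return std::fma(a, b, c);
}
```

With `a = b = 1 + 2^-12` and `c = -(1 + 2^-11)`, the exact product is `1 + 2^-11 + 2^-24`. That value sits halfway between two adjacent floats and rounds to `1 + 2^-11`, so the unfused version returns `0`, while the fused version keeps the low bit and returns `2^-24`.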
This has no effect on performance. I still see