Conversation

@abrown (Contributor) commented Jan 26, 2026

The "expand to target type" logic added in #231 works for `bfloat16` types, not just `float16`. This change adds lit tests to show the general form of the lowering performed by our `convert-triton-cpu-to-llvm` pass.
@abrown (Contributor, Author) commented Jan 26, 2026

I will note that performance for `bfloat16` is quite bad compared to `float16` (slower by an order of magnitude): e.g., when `bfloat16` support is patched into the `03-matrix-multiplication.py` tutorial, I observe that each `bfloat16` value is individually converted using `__truncsfbf2`, which delegates to some expensive logic (`float2bfloat`).
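For a rough sense of why the scalar fallback hurts: `bfloat16` is essentially the high 16 bits of a `float32`, so a single conversion amounts to round-to-nearest-even plus a shift. The sketch below is illustrative only; the actual compiler-rt `__truncsfbf2` path differs (and handles NaN specially), but paying even this much per element through a libcall instead of a vectorized truncation adds up fast.

```python
import struct

def f32_to_bf16(x: float) -> int:
    """Truncate a float32 to its bfloat16 bit pattern (round to nearest even).

    Sketch only; does not preserve NaN payloads the way compiler-rt does.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1      # low bit of the would-be bfloat16 mantissa
    bits += 0x7FFF + lsb        # round to nearest, ties to even
    return (bits >> 16) & 0xFFFF

# 1.0f is 0x3F800000 as float32, so its bfloat16 encoding is 0x3F80.
```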

@abrown abrown changed the title Add lit tests; shows initial bfloat16 support Add lit tests; shows initial bfloat16 support for tt.dot Jan 26, 2026
@abrown abrown requested a review from alexbaden January 27, 2026 18:45
@alexbaden (Contributor) left a comment

These look fine, though once #254 lands I imagine you would have to update the `fmul` and `fadd` bits. Might be easier to just check for promotion and assume the promoted value is used (better yet, canonicalize in the RUN command so an unused fp extension gets dropped).

@abrown (Contributor, Author) commented Jan 28, 2026

I tried to additionally add:

    // COM: And at least 8 multiplications:
    // CHECK-COUNT-8: llvm.fmul

But I think the matches have to be sequential? I guess I can figure that out in a follow-on PR.

@alexbaden (Contributor) left a comment

I'm fine with this for now, with the caveats that, long term, (1) `test_core.py::test_dot` is preferred for correctness and (2) we should strive to make lit tests as focused as possible (e.g., only test promotion to `fp32` during the LLVM lowering).

@abrown abrown requested a review from alexbaden January 29, 2026 17:13
@abrown abrown merged commit d63fd59 into kernelize-ai:main Jan 29, 2026
4 checks passed
@abrown abrown deleted the bf16 branch January 29, 2026 17:14