Add CC register capabilities in Rust and in the CUDA builder and optional emulation bf16 fp8 #2704
base: main
|
Shouldn't the new code be in some |
|
You are certainly right. I do not hit the CUDA_ARCH >= 800 condition in this control flow in my case, but it may well cause a CUDA error about a function already being present. I will review this. |
|
I think the current code is likely to result in lots of compile failures with cuda compute cap >= 8.0. |
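
To make the concern concrete, here is a minimal Rust sketch of how a build step could pick a bf16 code path from the compute capability, so that devices at 8.0 and above (where `__CUDA_ARCH__` is 800 or more) keep the native path and only older devices get the fallback. The names `Bf16Path`, `bf16_path`, `nvcc_args` and the `-DEMULATE_BF16` define are hypothetical and are not taken from this PR; only `--gpu-architecture` is a real nvcc option.

```rust
/// Which bf16 implementation a kernel build should use.
/// Hypothetical type, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Bf16Path {
    /// Compute capability 8.0 or higher: use the hardware bf16 path.
    Native,
    /// Older devices: fall back to emulation.
    Emulated,
}

/// Maps a compute capability written as major * 10 + minor
/// (for example 75 for 7.5, 80 for 8.0) to a bf16 code path.
fn bf16_path(compute_cap: u32) -> Bf16Path {
    if compute_cap >= 80 {
        Bf16Path::Native
    } else {
        Bf16Path::Emulated
    }
}

/// Builds the nvcc arguments for one target.
/// `-DEMULATE_BF16` is an assumed, made-up define; the real builder may
/// use a different mechanism entirely.
fn nvcc_args(compute_cap: u32) -> Vec<String> {
    let mut args = vec![format!("--gpu-architecture=sm_{}", compute_cap)];
    if bf16_path(compute_cap) == Bf16Path::Emulated {
        args.push("-DEMULATE_BF16".to_string());
    }
    args
}
```

With this shape, the emulation define is never passed for compute cap >= 8.0, so fallback definitions cannot collide with the native ones on those targets.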
|
I hope that the latest additions will allow to work on |
ec25c81 to
09e3d0b
|
With this, at a similar token/s rate in f16 and bf16 (for short sentences), the bf16 results are identical to those of the original bf16 model despite the fallbacks: numerical fidelity is preserved and model behavior is unchanged. Tests:
Issue:
While resolving conflicts, I added the fp8 fallback related to #2989 |
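
As an illustration of why a bf16 fallback can preserve results, here is a minimal host-side Rust sketch of bf16 emulation through f32: widen both operands, compute in f32, then round back to bf16 with round-to-nearest-even. This is an assumption about the general technique, not the code in this PR (whose fallbacks live in the CUDA kernels), and the function names are hypothetical.

```rust
/// Widens a bf16 value (stored as raw bits) to f32.
/// bf16 is the upper 16 bits of an f32, so this conversion is exact.
fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Narrows an f32 to bf16 bits using round-to-nearest-even.
fn f32_to_bf16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Keep the sign and exponent, and force a quiet-NaN mantissa bit
        // so the result cannot collapse to infinity.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Add 0x7FFF plus the lowest kept bit, so exact ties round to even.
    let rounding_bias = 0x7FFF + ((bits >> 16) & 1);
    (bits.wrapping_add(rounding_bias) >> 16) as u16
}

/// Emulated bf16 addition: compute in f32, then round once to bf16.
fn bf16_add(a: u16, b: u16) -> u16 {
    f32_to_bf16_bits(bf16_to_f32(a) + bf16_to_f32(b))
}

/// Emulated bf16 multiplication, same scheme as `bf16_add`.
fn bf16_mul(a: u16, b: u16) -> u16 {
    f32_to_bf16_bits(bf16_to_f32(a) * bf16_to_f32(b))
}
```

For example, `bf16_add(0x3F80, 0x4000)` adds 1.0 and 2.0 and yields `0x4040`, the bf16 encoding of 3.0. Because the inputs are exactly representable in f32 and the result is rounded once, a single emulated operation of this kind matches what a correctly rounded bf16 operation would produce.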
ba56fc9 to
d5f31ea
… and enhance bfloat16/fp8 compatibility.
tested and works with:
related EricLBuehler#57