| Algorithms | Variants |
|---|---|
| Random | bernoulli normal uniform |
| Quantization | symmetric per-block per-tensor q2 q4 q8 fp4 |
| Reduction | mean sum prod max min arg[max\|min] per-cube per-plane |
| Matmul | mma unit tma multi-stage specialization ordered multi-rows |
| Convolution | mma unit tma multi-stage im2col |
| Attention | mma unit multi-rows |

If you want to contribute new kernels, please read `GUIDE.md`.
Note: this applies to most kernels, but `reduce` works slightly differently for now; see its README.

Three test suites are available:
- Smoke test suite: a tractable subset of representative tests that run on the CI.
- Extended test suite: usually auto-generated combinatorial tests covering many configurations. Good to run when developing kernels. Normally kept tractable.
- Full test suite: all generable test combinations; may be too large to compile or run practically.
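
To see why the full suite can outgrow what is practical to compile, here is a minimal sketch of combinatorial generation. It is not the project's generator: the `combinations` function and the axis values used in the note below are hypothetical, chosen only to show how the number of tests multiplies with each added axis.

```rust
/// Builds every combination of one value per axis (a Cartesian product).
/// Hypothetical helper for illustration; the real suites are generated
/// by the project's own tooling.
fn combinations<'a>(axes: &[Vec<&'a str>]) -> Vec<Vec<&'a str>> {
    // Start with a single empty combination and extend it axis by axis.
    let mut result: Vec<Vec<&'a str>> = vec![Vec::new()];
    for axis in axes {
        let mut next = Vec::with_capacity(result.len() * axis.len());
        for prefix in &result {
            for value in axis {
                let mut combo = prefix.clone();
                combo.push(*value);
                next.push(combo);
            }
        }
        result = next;
    }
    result
}
```

With three made-up axes of sizes 2, 2 and 3, `combinations` returns 12 configurations; each further axis multiplies that count again, which is why the extended suite is normally kept to a tractable subset.
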

Run tests with:

```sh
# Replace <runtime> with cpu, cuda, rocm, wgpu, vulkan or metal

# Smoke test suite
cargo test-<runtime>

# Extended test suite
cargo test-<runtime>-extended

# Full test suite
cargo test-<runtime>-full
```

You can control test behavior by setting the `CUBE_TEST_MODE` environment variable.
For more details, see Test Mode.

- `CUBE_TEST_MODE=correct` (default)

  Tests pass if results are numerically correct or if the kernel was launched with an invalid configuration.
  - Useful when tests are auto-generated from multiple parameter combinations, where some invalid configurations are expected.
  - Failing tests display only the first index with a discrepancy.
- `CUBE_TEST_MODE=strict`

  Tests pass only if they compile, run, and produce numerically accurate results.
  - Ideal for debugging to avoid false positives that can occur in `correct` mode.
- `CUBE_TEST_MODE=printfail`

  Similar to `correct` mode: tests pass if results are correct or if the kernel is invalid.
  - Failing tests show all tensor discrepancies.
  - Supports filtering, e.g. `CUBE_TEST_MODE=printfail:0,.,10-20` shows elements at index 0 of the first dimension, all of the second, and elements 10–20 in the third.
- `CUBE_TEST_MODE=printall`

  All tests fail, displaying all tensor discrepancies.
  - Filtering works the same as in `printfail`.
- `CUBE_TEST_MODE=failifrun`

  Only tests that compile and run will fail; others succeed.
  - Useful for tracking critical tests in large suites.
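
The pass/fail rules of the modes can be summarized in a few lines of Rust. This is a sketch based only on the descriptions in this README, not the project's implementation: `Mode`, `Outcome`, `Select`, `parse_mode`, `passes` and `mode_from_env` are hypothetical names, and the filter grammar is inferred from the single `0,.,10-20` example.

```rust
/// Hypothetical model of the documented modes; not the project's actual types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode { Correct, Strict, PrintFail, PrintAll, FailIfRun }

/// What happened when a test tried to run its kernel.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    /// The configuration was rejected as invalid, so the kernel never ran.
    Invalid,
    /// The kernel ran; `true` means the results were numerically correct.
    Ran(bool),
}

/// One dimension of a filter such as `0,.,10-20`.
/// Bounds are kept as written; whether the upper bound is inclusive
/// is not stated in the README.
#[derive(Debug, PartialEq)]
enum Select { All, Index(usize), Range(usize, usize) }

/// Splits a value like `printfail:0,.,10-20` into a mode and its filter.
/// Assumption: the filter is comma-separated, one entry per dimension.
fn parse_mode(value: &str) -> Option<(Mode, Vec<Select>)> {
    let (name, filter) = match value.split_once(':') {
        Some((name, filter)) => (name, Some(filter)),
        None => (value, None),
    };
    let mode = match name {
        "correct" => Mode::Correct,
        "strict" => Mode::Strict,
        "printfail" => Mode::PrintFail,
        "printall" => Mode::PrintAll,
        "failifrun" => Mode::FailIfRun,
        _ => return None,
    };
    let mut selects = Vec::new();
    for part in filter.into_iter().flat_map(|f| f.split(',')) {
        let select = if part == "." {
            Select::All
        } else if let Some((lo, hi)) = part.split_once('-') {
            Select::Range(lo.parse::<usize>().ok()?, hi.parse::<usize>().ok()?)
        } else {
            Select::Index(part.parse::<usize>().ok()?)
        };
        selects.push(select);
    }
    Some((mode, selects))
}

/// Returns `true` when a test passes under `mode`, following the list above.
fn passes(mode: Mode, outcome: Outcome) -> bool {
    match (mode, outcome) {
        // printall fails every test so that all tensors get displayed.
        (Mode::PrintAll, _) => false,
        // failifrun fails exactly the tests that compiled and ran.
        (Mode::FailIfRun, Outcome::Ran(_)) => false,
        (Mode::FailIfRun, Outcome::Invalid) => true,
        // strict rejects invalid configurations as well as wrong results.
        (Mode::Strict, Outcome::Invalid) => false,
        // correct and printfail tolerate invalid configurations.
        (Mode::Correct | Mode::PrintFail, Outcome::Invalid) => true,
        // Otherwise the test passes only if the results are correct.
        (_, Outcome::Ran(correct)) => correct,
    }
}

/// Reads the variable, falling back to `correct`, the documented default.
fn mode_from_env() -> Option<(Mode, Vec<Select>)> {
    let value = std::env::var("CUBE_TEST_MODE").unwrap_or_else(|_| "correct".to_string());
    parse_mode(&value)
}
```

Under this model, an invalid configuration passes in `correct` and `printfail`, fails in `strict` and `printall`, and passes in `failifrun`, which is what makes `strict` the safer choice when debugging a single kernel.
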
