CubeK: high-performance multi-platform kernels in CubeCL

Algorithms

| Algorithms | Variants |
| --- | --- |
| Random | `bernoulli` `normal` `uniform` |
| Quantization | `symmetric` `per-block` `per-tensor` `q2` `q4` `q8` `fp4` |
| Reduction | `mean` `sum` `prod` `max` `min` `arg[max\|min]` `per-cube` `per-plane` |
| Matmul | `mma` `unit` `tma` `multi-stage` `specialization` `ordered` `multi-rows` |
| Convolution | `mma` `unit` `tma` `multi-stage` `im2col` |
| Attention | `mma` `unit` `multi-rows` |

Contributing

If you want to contribute new kernels, please read GUIDE.md first.

Running tests

Note: This applies to most kernels. Reduce works slightly differently for now; see its README.

Command

Three test suites are available:

- Smoke test suite: a tractable subset of representative tests that runs in CI.
- Extended test suite: usually auto-generated combinatorial tests covering many configurations. Good to run when developing kernels. Normally kept tractable.
- Full test suite: all generable test combinations; may be too large to compile or run practically.

Run tests with:

```sh
# Replace <runtime> with cpu, cuda, rocm, wgpu, vulkan or metal

# Smoke test suite
cargo test-<runtime>

# Extended test suite
cargo test-<runtime>-extended

# Full test suite
cargo test-<runtime>-full
```

Cube test mode

You can control test behavior by setting the CUBE_TEST_MODE environment variable.
For more details, see Test Mode.

Modes

CUBE_TEST_MODE=correct (default)
Tests pass if results are numerically correct or if the kernel was launched with an invalid configuration.
- Useful when tests are auto-generated from multiple parameter combinations, where some invalid configurations are expected.
- Failing tests display only the first index with a discrepancy.
CUBE_TEST_MODE=strict
Tests pass only if they compile, run, and produce numerically accurate results.
- Ideal for debugging to avoid false positives that can occur in correct mode.
CUBE_TEST_MODE=printfail
Similar to correct mode: tests pass if results are correct or if the kernel is invalid.
- Failing tests show all tensor discrepancies.
- Supports filtering, e.g. CUBE_TEST_MODE=printfail:0,.,10-20 shows index 0 of the first dimension, every index of the second dimension, and elements 10–20 of the third.
CUBE_TEST_MODE=printall
All tests fail, displaying all tensor discrepancies.
- Filtering works the same as in printfail.
CUBE_TEST_MODE=failifrun
Only tests that compile and run will fail; others succeed.
- Useful for tracking critical tests in large suites.
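
The mode descriptions above can be summarized as a small decision function. The sketch below is only an illustration of the rules as stated in this README, not CubeCL's implementation: `Outcome`, `DimFilter`, `passes` and `parse_filter` are hypothetical names, and treating a range such as `10-20` as inclusive on both ends is an assumption.

```rust
/// Hypothetical model of what happened to a test; not a CubeCL type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Outcome {
    /// The kernel was rejected as an invalid configuration and never ran.
    Invalid,
    /// The kernel ran and the results matched the reference.
    Correct,
    /// The kernel ran and the results did not match the reference.
    Incorrect,
}

/// Returns whether a test passes under `mode`, following the descriptions
/// above. Returns `None` for a mode name this sketch does not know.
fn passes(mode: &str, outcome: Outcome) -> Option<bool> {
    // Drop an optional ":filter" suffix, e.g. "printfail:0,.,10-20".
    let name = mode.split(':').next().unwrap_or(mode);
    match name {
        // Invalid configurations are tolerated; only wrong results fail.
        "correct" | "printfail" => Some(outcome != Outcome::Incorrect),
        // The kernel must run and be numerically correct.
        "strict" => Some(outcome == Outcome::Correct),
        // Every test fails so that all tensors get printed.
        "printall" => Some(false),
        // Anything that actually ran fails; everything else succeeds.
        "failifrun" => Some(outcome == Outcome::Invalid),
        _ => None,
    }
}

/// Hypothetical per-dimension filter parsed from a spec like "0,.,10-20".
#[derive(Debug, PartialEq)]
enum DimFilter {
    /// "." selects every index of the dimension.
    All,
    /// A single index or "lo-hi" range (assumed inclusive).
    Range(usize, usize),
}

/// Parses a comma-separated filter spec, one entry per dimension.
/// Returns `None` if any entry is malformed.
fn parse_filter(spec: &str) -> Option<Vec<DimFilter>> {
    spec.split(',')
        .map(|part| {
            let part = part.trim();
            if part == "." {
                Some(DimFilter::All)
            } else if let Some((lo, hi)) = part.split_once('-') {
                Some(DimFilter::Range(lo.parse().ok()?, hi.parse().ok()?))
            } else {
                let index: usize = part.parse().ok()?;
                Some(DimFilter::Range(index, index))
            }
        })
        .collect()
}
```

Under these assumptions, `passes("strict", Outcome::Invalid)` is `Some(false)` while `passes("correct", Outcome::Invalid)` is `Some(true)`, which is the false-positive case that strict mode is meant to expose.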
