delveopers/Axon

Milestones

v0.0.3 release
Better parallelized CPU kernels (with SIMD and BLAS), better scalar value handling, and more bug fixes in both the C/C++ and Python code. Adds a direct C API and optional support for CUDA ops.
Due by February 1, 2026
1/3 issues closed
33% complete, 2 open, 1 closed
More efficient & faster CPU matmul using `transpose -> dot-product`
The current version needs parallelized kernels for faster matmul. It does not use naive matmul; it transposes first and then takes dot products, which is 6-7x faster than the naive version, but it needs to be faster still.
No due date
0% complete, 0 open, 0 closed
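
For readers unfamiliar with the `transpose -> dot-product` approach named in this milestone, here is a minimal single-threaded sketch in C. It is not Axon's actual kernel: the function name `matmul_transposed`, its signature, and the row-major `float` layout are all assumptions made for illustration. The idea is that after transposing the right-hand matrix, every output element is a dot product of two contiguous rows, which is friendlier to the cache than striding down a column.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not part of Axon's API).
 * Computes out = a * b for row-major matrices:
 *   a is rows x inner, b is inner x cols, out is rows x cols.
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int matmul_transposed(const float *a, const float *b, float *out,
                      size_t rows, size_t inner, size_t cols)
{
    size_t n = inner * cols;
    /* Allocate at least one element so a zero-sized input is not
     * mistaken for an allocation failure. */
    float *bt = malloc((n ? n : 1) * sizeof *bt);
    if (bt == NULL)
        return -1;

    /* Step 1: transpose b (inner x cols) into bt (cols x inner). */
    for (size_t k = 0; k < inner; ++k)
        for (size_t j = 0; j < cols; ++j)
            bt[j * inner + k] = b[k * cols + j];

    /* Step 2: each output element is a dot product of a row of a
     * with a row of bt; both are contiguous in memory. */
    for (size_t i = 0; i < rows; ++i) {
        const float *a_row = a + i * inner;
        for (size_t j = 0; j < cols; ++j) {
            const float *bt_row = bt + j * inner;
            float acc = 0.0f;
            for (size_t k = 0; k < inner; ++k)
                acc += a_row[k] * bt_row[k];
            out[i * cols + j] = acc;
        }
    }

    free(bt);
    return 0;
}
```

In this sketch the outer loop over `i` writes disjoint rows of `out`, so it is a natural place to split work across threads or to vectorize the inner dot product with SIMD, which is the kind of parallelization this milestone asks for. How Axon will actually do that is not stated here.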