Milestones

  • Better parallelized CPU kernels (with SIMD and BLAS), better scalar value handling, and more bug fixes in both the C/C++ and Python code. Adds a direct C API and optional support for CUDA ops.

    Due by February 1, 2026
    1/3 issues closed
  • Implement parallelized kernels for faster matmul in the current version. The current implementation does not use naive matmul; it transposes first and then takes dot products, which is 6-7x faster than the naive version, but it needs to be faster still.

    No due date
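
The transpose-then-dot-product approach mentioned in the matmul milestone can be sketched as follows. This is a minimal illustration in pure Python, not the project's actual kernel: the function names (`transpose`, `dot`, `matmul_transposed`) are hypothetical, and it assumes the right-hand matrix is the one being transposed. In a C kernel, the point of the transpose is that both operands of each dot product are then read contiguously; pure Python only shows the structure, not the speedup.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def matmul_transposed(a, b):
    """Multiply a (n x k) by b (k x m) by transposing b first.

    Hypothetical helper: after the transpose, each output element is
    the dot product of a row of `a` with a row of `bt`, so both
    operands are walked row-wise.
    """
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("shape mismatch: a is n x k, b must be k x m")
    bt = transpose(b)  # bt has shape m x k
    return [[dot(row, col) for col in bt] for row in a]
```

Each output row depends only on one row of `a` and the shared transposed matrix, so the outer loop is the natural place to split work across threads in a parallel kernel.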