- Estimated release date:
  - Public preview (alpha): 9/1
  - Public preview (beta): 9/30
  - Refactor kernels & TeSA: 10/15
P0
- Tune more steps to show the speedup gain of the PyTorch sparse modules
- Support the OpenAI kernel/template
- Code review
- Usage interface (8/19); update one version on 8/26
- Fix Triton speed (8/19)
- Sparse Softmax Kernel
- Biased OpenAI MatMul Kernel
- Fine-grained 99% + block size 8x8 95% + block size 32x32
- Documentation (test)
- Package data (test)
- sparta.tune(): hook, set search space (see the usage sketch after this list)
- Fix sparse softmax
- Integration test/example: Linear, Softmax
- Fix JIT latency
- Read the docs
- SparTA DDS MatMul kernel
- Batch MatMul & Softmax
- Sparse Attention
- Add sparse matmul kernel: transpose_A
- Functional
- Support backward
- Add performance test: compare with Triton 1.1.2 (upload test scripts)
- Test current tuner
- Test Sparse Attention
- Update kernel PyCUDA interface
- Profile layout conversion
- Construct sparse attention op with linear & softmax ops
- Beta version: docs, docstrings & examples
- Test on V100; backward
- Fix kernel output
- Module tuner: get combined search space of connected ops automatically
- Connect to NNI's new tuner
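
A minimal sketch of the usage interface that the tuning items above aim at, assuming a `SparseLinear` wrapper and a `sparta.tune()` entry point; the class name, keyword arguments, and return value are assumptions for illustration, not the finalized API.

```python
import torch
import sparta  # the SparTA package this roadmap tracks

# Dense module to be sparsified and a block-sparse weight mask
# (32x32 blocks, roughly 95% of blocks pruned).
dense_linear = torch.nn.Linear(1024, 1024).cuda()
block_mask = torch.rand(1024 // 32, 1024 // 32) > 0.95
weight_mask = block_mask.repeat_interleave(32, 0).repeat_interleave(32, 1).cuda()

# Wrap the dense op into a sparse operator; the class name is an assumption.
sparse_linear = sparta.nn.SparseLinear(dense_linear, weight_mask=weight_mask)

# sparta.tune() hooks into the op, builds the kernel search space
# (block sizes, thread tiling, kernel template) and returns the best config.
sample_input = torch.rand(4096, 1024).cuda()
best_config = sparta.tune(sparse_linear, sample_inputs=[sample_input])
```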
P1
- Apply Roller's rules
- Support multi-process tuning
- BCSR kernel: convert(), inverse(), swapaxes(), sum(); rebuild the TeSA converter when set_mask() is called (see the BCSR sketch after this list)
- Auto converter: support value mask in matmul kernels
- PyCUDA device context register & operator.to() (multiple cards)
- Support multiple sparse formats: SDD, DSD, DDS for linear
- Support the block quantization kernel/fp16/bf16
- Compare Sparse Softmax with Triton's Sparse Softmax and keep improving.
- Unit tests
- Model tuning interface / documents / examples
- Common mask patterns
- Refactor TeSA (Meta, linter)
- Fuse layout conversion into kernels
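
For the BCSR kernel item above, a minimal, framework-free sketch of what convert() could do for a block-sparse matrix; the function name and the row_ptr/col_idx/values layout are assumptions, and inverse()/swapaxes()/sum()/set_mask() are left out.

```python
import torch

def bcsr_convert(dense: torch.Tensor, mask: torch.Tensor, block: int = 32):
    """Pack the masked blocks of `dense` into a BCSR-style layout (sketch only)."""
    H, W = dense.shape
    bh, bw = H // block, W // block
    # A block is kept if any element of its mask is nonzero.
    block_mask = mask.reshape(bh, block, bw, block).sum(dim=(1, 3)) > 0

    row_ptr, col_idx, blocks = [0], [], []
    for i in range(bh):
        for j in range(bw):
            if block_mask[i, j]:
                col_idx.append(j)
                blocks.append(dense[i * block:(i + 1) * block,
                                    j * block:(j + 1) * block])
        row_ptr.append(len(col_idx))  # prefix sum of kept blocks per block row

    values = torch.stack(blocks) if blocks else dense.new_zeros(0, block, block)
    return torch.tensor(row_ptr), torch.tensor(col_idx), values
```

Usage would be along the lines of `row_ptr, col_idx, values = bcsr_convert(weight, weight_mask)`, with the three tensors handed to the kernel.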
P2
- Support the offline LUT or the kernel cache/DB (see the sketch below)
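
The offline LUT / kernel cache item could look roughly like the following: tuned kernel configurations keyed by operator, shapes, and sparsity config, persisted to disk so later runs can skip tuning. The cache location, key format, and JSON payload are all assumptions.

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "sparta_kernels"  # assumed location

def _cache_key(op_name: str, shapes, sparsity_config) -> str:
    # Stable hash over the operator name, tensor shapes, and sparsity config.
    payload = json.dumps([op_name, shapes, sparsity_config], sort_keys=True)
    return hashlib.sha1(payload.encode()).hexdigest()

def load_best_config(op_name, shapes, sparsity_config):
    path = CACHE_DIR / f"{_cache_key(op_name, shapes, sparsity_config)}.json"
    return json.loads(path.read_text()) if path.exists() else None

def save_best_config(op_name, shapes, sparsity_config, best_config):
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    path = CACHE_DIR / f"{_cache_key(op_name, shapes, sparsity_config)}.json"
    path.write_text(json.dumps(best_config))
```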