
Version 1.0-alpha Iteration Plan #3

@mydmdm

  • Estimated release date:
    • public preview (alpha): 9/1
    • public preview (beta): 9/30
    • refactor kernels & TeSA: 10/15

P0

  • Tune for more steps to show the speedup of the PyTorch sparse modules
  • Support the OpenAI kernel/template
  • code review
  • Usage Interface (8.19) (update one version on 8.26)
  • Fix Triton speed (8.19)
  • Sparse Softmax Kernel
  • Biased OpenAI MatMul Kernel
  • Fine-grained 99% + block size 8x8 95% + block size 32x32
  • Documentation (test)
  • package data (test)
  • sparta.tune(): hook, set search space
  • Fix sparse softmax
  • Integration test/example: Linear, Softmax
  • Fix JIT latency
  • Read the Docs
  • SparTA DDS MatMul kernel
  • Batch MatMul & Softmax
  • Sparse Attention
  • Add sparse matmul kernel: transpose_A
  • Functional
  • Support backward
  • Add performance test: compare with Triton 1.1.2 (upload test scripts)
  • Test current tuner
  • Test Sparse Attention
  • Update kernel pycuda interface
  • Profile Layout converting
  • Construct sparse attention op with linear & softmax ops
  • Beta version: docs, docstrings & examples
  • Test on V100; backward
  • Fix kernel output
  • Module tuner: get combined search space of connected ops automatically
  • Connect to NNI's new tuner
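
Several P0 items (Sparse Softmax Kernel, Fix sparse softmax, Integration test/example) need a trusted baseline to check kernel output against. A minimal pure-Python sketch of such a reference is below. The function name `masked_softmax` and the choice to return zeros for a fully masked row are assumptions made for illustration, not SparTA's actual API or behavior.

```python
import math


def masked_softmax(scores, mask):
    """Row-wise softmax restricted to positions where mask is truthy.

    Hypothetical reference helper (not part of SparTA). Masked positions
    get probability 0.0. A fully masked row returns all zeros, which is an
    assumption; a real kernel may define this case differently.
    """
    out = []
    for row, mask_row in zip(scores, mask):
        kept = [s for s, m in zip(row, mask_row) if m]
        if not kept:
            out.append([0.0] * len(row))
            continue
        peak = max(kept)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(row, mask_row)]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out
```

A kernel test could compare its output element-wise against this reference within a tolerance, using the same mask for both.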

P1

  • Apply Roller's rules
  • Support multi-process tuning
  • BCSR kernel: convert(), inverse(), swapaxes(), sum(), rebuild TeSA Converter when set_mask()
  • Auto converter: support value mask in matmul kernels
  • PyCUDA device context register & operator.to() (multiple cards)
  • Support multiple sparse formats for linear: sdd, dsd, dds
  • Support the block quantization kernel / fp16 / bf16
  • Compare Sparse Softmax with Triton's Sparse Softmax and keep improving.
  • unit tests
  • Model tuning interface / documents / examples
  • Common mask patterns
  • Refactor TeSA (Meta, linter)
  • Fuse layout converting into kernels
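
For the BCSR item above, the round trip between a dense matrix and a block compressed sparse row layout can be sketched in plain Python. The names `dense_to_bcsr` and `bcsr_to_dense` are hypothetical and do not reflect the signatures of SparTA's `convert()` or `inverse()`. The sketch also assumes a block is kept whenever it contains any nonzero value, whereas the plan derives layouts from a mask.

```python
def dense_to_bcsr(dense, block_h, block_w):
    """Convert a dense 2-D list into (row_ptr, col_idx, blocks).

    Hypothetical helper: a block is stored if any of its values is nonzero.
    """
    n_rows, n_cols = len(dense), len(dense[0])
    assert n_rows % block_h == 0 and n_cols % block_w == 0
    row_ptr, col_idx, blocks = [0], [], []
    for br in range(n_rows // block_h):
        for bc in range(n_cols // block_w):
            block = [
                dense[br * block_h + i][bc * block_w:(bc + 1) * block_w]
                for i in range(block_h)
            ]
            if any(v != 0 for block_row in block for v in block_row):
                col_idx.append(bc)
                blocks.append(block)
        row_ptr.append(len(col_idx))  # end offset of this block row
    return row_ptr, col_idx, blocks


def bcsr_to_dense(row_ptr, col_idx, blocks, n_cols, block_h, block_w):
    """Inverse of dense_to_bcsr: scatter stored blocks back into zeros."""
    n_rows = (len(row_ptr) - 1) * block_h
    dense = [[0] * n_cols for _ in range(n_rows)]
    for br in range(len(row_ptr) - 1):
        for k in range(row_ptr[br], row_ptr[br + 1]):
            bc = col_idx[k]
            for i in range(block_h):
                for j in range(block_w):
                    dense[br * block_h + i][bc * block_w + j] = blocks[k][i][j]
    return dense
```

A unit test for the real converter could follow the same pattern: convert, invert, and check that the result equals the masked input.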

P2

  • Support the offline LUT or the kernel cache/DB
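
One possible shape for the offline LUT / kernel cache is a JSON file keyed by operator, shape, block size, and sparsity, so that tuning runs only on a cache miss. Everything in this sketch (the class `KernelConfigCache`, the key layout, the JSON storage) is a hypothetical illustration, not a design taken from the plan.

```python
import json
from pathlib import Path


class KernelConfigCache:
    """Hypothetical on-disk lookup table for tuned kernel configurations."""

    def __init__(self, path):
        self.path = Path(path)
        # Load previously tuned entries if the file already exists.
        self.table = json.loads(self.path.read_text()) if self.path.exists() else {}

    @staticmethod
    def make_key(op, shape, block, sparsity):
        # Example: "matmul|1024x1024|32x32|0.95"
        shape_s = "x".join(map(str, shape))
        block_s = "x".join(map(str, block))
        return f"{op}|{shape_s}|{block_s}|{sparsity:.2f}"

    def get_or_tune(self, key, tune_fn):
        # Run the (expensive) tuner only on a miss, then persist the result.
        if key not in self.table:
            self.table[key] = tune_fn()
            self.path.write_text(json.dumps(self.table, indent=2))
        return self.table[key]
```

In this sketch `tune_fn` stands in for whatever tuning entry point is used; it must return a JSON-serializable configuration.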
