performance question

NVIDIA TensorRT supports a sparse mode similar to the one in the author's paper on the latest Ampere architecture, but its actual speedup is very poor and is only observable at sizes larger than 1024*1024.

The author's paper does not use hardware instruction-set support like NVIDIA's. It relies only on a handwritten kernel, yet it reports a greater speedup ratio than NVIDIA's paper. I therefore think the author should release the source code to address everyone's doubts.


performance question #3
