
Optimize SGEMM step by step

This project is a step-by-step guide to optimizing an SGEMM (single-precision general matrix multiplication) kernel.

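Note: matrices in cuBLAS are stored in column-major order. To make comparison easier, the macros used by the kernels below are also column-major, so their results can be compared directly against the matrices computed by cuBLAS.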
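To illustrate what column-major indexing means in practice, here is a minimal C sketch of a naive CPU reference SGEMM. It uses a hypothetical index macro `IDX2C`, modeled on the one in the cuBLAS documentation examples; the project's own macro names may differ. It assumes the cuBLAS/BLAS convention C = alpha * A * B + beta * C, where A is M x K, B is K x N, C is M x N, and `lda`, `ldb`, `ldc` are the leading dimensions (the stored number of rows). This is one possible way to produce reference results, not the project's method.

```c
#include <assert.h>

/* Hypothetical helper (modeled on the cuBLAS docs): offset of element
   (i, j) in a column-major matrix whose leading dimension is ld.
   Consecutive rows of one column are adjacent in memory. */
#define IDX2C(i, j, ld) (((j) * (ld)) + (i))

/* Hypothetical naive reference: C = alpha * A * B + beta * C,
   with all matrices stored column-major. */
void sgemm_colmajor(int M, int N, int K, float alpha,
                    const float *A, int lda,
                    const float *B, int ldb,
                    float beta, float *C, int ldc) {
    /* BLAS requires each leading dimension to be at least the row count. */
    assert(lda >= M && ldb >= K && ldc >= M);
    for (int j = 0; j < N; ++j) {          /* column of C */
        for (int i = 0; i < M; ++i) {      /* row of C */
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {
                acc += A[IDX2C(i, k, lda)] * B[IDX2C(k, j, ldb)];
            }
            C[IDX2C(i, j, ldc)] = alpha * acc + beta * C[IDX2C(i, j, ldc)];
        }
    }
}
```

For example, with A = [1 2 3; 4 5 6] stored as {1, 4, 2, 5, 3, 6} and B = [7 8; 9 10; 11 12] stored as {7, 9, 11, 8, 10, 12}, calling `sgemm_colmajor(2, 2, 3, 1.0f, A, 2, B, 3, 0.0f, C, 2)` leaves C = {58, 139, 64, 154}, which is the matrix [58 64; 139 154] stored column by column.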

Kernel 1