This repository was archived by the owner on Feb 27, 2024. It is now read-only.
I run into a performance issue when working with matrices of this specific size: M=4096, N=4096, K=16. If I change N to 4097 or 4095, performance changes dramatically:
I also noticed a somewhat similar performance drop in other libraries such as clBLAS (171.722 vs 245.325 GFLOPS), but to a lesser extent. I'm trying to understand the root cause of the issue: why is there a 4x performance drop?
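For reference, here is a minimal sketch of how the numbers above can be compared, assuming the usual convention of counting 2*M*N*K floating-point operations per GEMM call. The helper names `gemm_flops` and `gflops` are only illustrative and do not come from any library.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Floating-point operations for C = A*B with A (m x k) and B (k x n).

    Assumes the common convention of one multiply and one add per inner
    product term, i.e. 2*m*n*k operations.
    """
    return 2 * m * n * k


def gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput in GFLOPS for a GEMM of the given size and measured wall time."""
    return gemm_flops(m, n, k) / seconds / 1e9


# Work for the problematic size and its two neighbours (nearly identical).
for n in (4095, 4096, 4097):
    print(n, gemm_flops(4096, n, 16))

# Ratio of the two clBLAS figures quoted above.
clblas_ratio = 245.325 / 171.722
print(f"clBLAS ratio: {clblas_ratio:.2f}x")
```

Since the amount of work differs by less than 0.1% between N=4095, 4096 and 4097, the measured difference cannot be explained by the operation count alone.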