This repository was archived by the owner on Feb 27, 2024. It is now read-only.

Performance drop for specific tile size M=4096 N=4096 K=16 #48

@artyom-beilis

I'm using an RX 560 (16 CUs, 4 GB, gfx803).

I ran into a performance issue when working with matrices of this specific size: M=4096, N=4096, K=16. If I change N to 4097 or 4095, performance changes dramatically:

./test_gemm_miopengemm -m 4096 -n 4096 -k 16
70.1651 GFLOPS 7.41242 ms
./test_gemm_miopengemm -m 4096 -n 4095 -k 16
254.339 GFLOPS 2.04438 ms
./test_gemm_miopengemm -m 4096 -n 4097 -k 16
290.907 GFLOPS 1.78827 ms

This is on ROCm 3.7.

I also noticed a somewhat similar performance drop in other libraries such as clBLAS (171.722 vs. 245.325 GFLOPS), but to a lesser extent. I'm trying to understand the root cause of the issue: why is there a 4x performance drop?
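
For reference, the reported figures can be cross-checked with a few lines of Python. This is only a sketch: it assumes the test tool counts M*N*(2K-1) floating-point operations per GEMM (K multiplies and K-1 adds per output element). That formula is inferred from the numbers above, not taken from the tool's source. The `gflops` helper and the `runs` list are illustrative names.

```python
# Timings copied from the three runs above: (M, N, K, milliseconds).
runs = [
    (4096, 4096, 16, 7.41242),
    (4096, 4095, 16, 2.04438),
    (4096, 4097, 16, 1.78827),
]


def gflops(m, n, k, ms):
    """GFLOPS under the ASSUMED count of M*N*(2K-1) operations per GEMM."""
    flops = m * n * (2 * k - 1)
    return flops / (ms * 1e-3) / 1e9


for m, n, k, ms in runs:
    print(f"N={n}: {gflops(m, n, k, ms):.3f} GFLOPS in {ms} ms")

# Wall-time slowdown of the N=4096 case relative to each neighbouring size.
slow_ms = runs[0][3]
slowdown_vs_4095 = slow_ms / runs[1][3]
slowdown_vs_4097 = slow_ms / runs[2][3]
print(f"{slowdown_vs_4095:.2f}x slower than N=4095, "
      f"{slowdown_vs_4097:.2f}x slower than N=4097")
```

Under that assumption the computed GFLOPS match the printed values, and the N=4096 case comes out roughly 3.6x slower than N=4095 and 4.1x slower than N=4097 in wall time, even though the amount of work differs by less than 0.03%.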
