Open
Labels
Hacktoberfest, good first issue, help wanted
Description
Currently the lane id is read (here) by accessing the %laneid register. An alternative is to compute it as i % WARPSIZE, where i is the thread index.
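For reference, the two approaches could look roughly like the sketch below. The helper names laneid_register and laneid_modulo are hypothetical and not taken from the repository, WARPSIZE is assumed to be 32, and the modulo version assumes a 1D launch whose block size is a multiple of WARPSIZE, so that the global index i and the lane id stay aligned.

```cuda
// Assumption: warp size is 32, as on current NVIDIA GPUs.
constexpr unsigned WARPSIZE = 32;

// Hypothetical helper: read the lane id from the %laneid special register
// using inline PTX.
__device__ __forceinline__ unsigned laneid_register() {
    unsigned lane;
    asm volatile("mov.u32 %0, %%laneid;" : "=r"(lane));
    return lane;
}

// Hypothetical helper: compute the lane id arithmetically from the global
// thread index i. Assumes a 1D grid and a block size that is a multiple of
// WARPSIZE; otherwise i % WARPSIZE does not match the hardware lane id.
__device__ __forceinline__ unsigned laneid_modulo() {
    unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    return i % WARPSIZE;
}
```

Under those assumptions both helpers return the same value for every thread, so a benchmark can swap one for the other without changing results.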
According to this and this, reading from the %laneid register is more costly than i % WARPSIZE. It would be good to have some benchmark results that show the difference between the two versions.
First create a benchmarks/laneid/ directory to hold all your files. You can start with some simple kernels, for instance one where each thread i reads its lane id and writes it to the i-th index of an array. You may subsequently move on to more involved kernels. For each benchmark, include appropriate plot(s) and information about your GPU device and CUDA version.
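A possible starting point for the simple kernel described above is sketched here. It is only an illustration: the kernel names, the time_kernel helper, the array size, the block size of 256, and the repetition count are all assumptions chosen for the example, and WARPSIZE is assumed to be 32. Timing uses CUDA events and averages over several launches after one warm-up launch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr unsigned WARPSIZE = 32;  // assumption: warp size of the target GPU

// Each thread i reads %laneid via inline PTX and writes it to out[i].
__global__ void laneid_from_register(unsigned *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        unsigned lane;
        asm volatile("mov.u32 %0, %%laneid;" : "=r"(lane));
        out[i] = lane;
    }
}

// Each thread i computes i % WARPSIZE and writes it to out[i].
__global__ void laneid_from_modulo(unsigned *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i % WARPSIZE;
}

using Kernel = void (*)(unsigned *, int);

// Hypothetical helper: average time in milliseconds over `reps` launches.
float time_kernel(Kernel k, unsigned *d_out, int n, int reps) {
    const int threads = 256;  // multiple of WARPSIZE, so both kernels agree
    const int blocks = (n + threads - 1) / threads;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    k<<<blocks, threads>>>(d_out, n);  // warm-up launch, not timed
    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r) k<<<blocks, threads>>>(d_out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / reps;
}

int main() {
    const int n = 1 << 24;   // illustrative problem size
    const int reps = 100;    // illustrative repetition count

    // Report the device and CUDA runtime version alongside the timings.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    int runtime_version = 0;
    cudaRuntimeGetVersion(&runtime_version);
    printf("Device: %s, CUDA runtime version: %d\n", prop.name, runtime_version);

    unsigned *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(unsigned)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    float ms_reg = time_kernel(laneid_from_register, d_out, n, reps);
    float ms_mod = time_kernel(laneid_from_modulo, d_out, n, reps);
    printf("%%laneid register: %f ms per launch\n", ms_reg);
    printf("i %% WARPSIZE:     %f ms per launch\n", ms_mod);

    cudaFree(d_out);
    return 0;
}
```

A kernel this small is likely dominated by the global memory write, so any difference between the two versions may be hard to see; that is one reason to move on to more involved kernels afterwards. The file can be compiled with nvcc, for example `nvcc -O3 laneid_simple.cu -o laneid_simple`, where the file name is again just a placeholder.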