Measure the performance of computing the lane id  #2

@gkrls

Currently the lane id is read (here) by accessing the %laneid special register. An alternative is to compute it as i % WARPSIZE, where i is the thread index.

According to this and this, reading the %laneid register is more costly than computing i % WARPSIZE. It would be good to have benchmark results that show the difference between the two versions.
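For reference, the two versions could look roughly like the sketch below. The function names (`laneid_register`, `laneid_modulo`, `count_mismatches`) are hypothetical and not taken from the repository, and the built-in `warpSize` stands in for `WARPSIZE`. The modulo version assumes 1-D thread blocks.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Version A: read the %laneid special register through inline PTX.
__device__ __forceinline__ unsigned laneid_register() {
    unsigned lane;
    asm volatile("mov.u32 %0, %%laneid;" : "=r"(lane));
    return lane;
}

// Version B: derive the lane id from the thread index.
// Assumption: 1-D blocks, so threadIdx.x alone identifies the thread.
__device__ __forceinline__ unsigned laneid_modulo() {
    return threadIdx.x % warpSize;
}

// Counts the threads for which the two versions disagree.
__global__ void compare_versions(int *mismatches) {
    if (laneid_register() != laneid_modulo()) atomicAdd(mismatches, 1);
}

// Hypothetical host helper: launches the comparison and returns the count.
int count_mismatches(int blocks, int threads) {
    int *d_mismatches = nullptr;
    int h_mismatches = -1;
    cudaMalloc(&d_mismatches, sizeof(int));
    cudaMemset(d_mismatches, 0, sizeof(int));
    compare_versions<<<blocks, threads>>>(d_mismatches);
    cudaMemcpy(&h_mismatches, d_mismatches, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_mismatches);
    return h_mismatches;
}

int main() {
    printf("mismatches: %d\n", count_mismatches(4, 256));
    return 0;
}
```

Checking that both versions agree before timing them avoids comparing kernels that compute different things.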

First, create a benchamarks/laneid/ directory to hold all your files. Start with some simple kernels, for instance one where each thread i reads its lane id and writes it to the i-th element of an array, then move on to more involved kernels. For each benchmark, include appropriate plot(s) along with your GPU model and CUDA version.
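A minimal sketch of the simple starting kernel could look like the following. All names here (`write_laneid_register`, `write_laneid_modulo`, `average_ms`, `count_wrong_entries`) are hypothetical, and the array size and repetition count are arbitrary choices. It assumes 1-D blocks whose size is a multiple of the warp size, so that the global index i satisfies i % warpSize == lane id. Timing uses CUDA events, and the device name and runtime version are printed so they can be reported with the results.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread i writes the lane id read from %laneid to out[i].
__global__ void write_laneid_register(unsigned *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        unsigned lane;
        asm volatile("mov.u32 %0, %%laneid;" : "=r"(lane));
        out[i] = lane;
    }
}

// Each thread i writes i % warpSize to out[i].
__global__ void write_laneid_modulo(unsigned *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i % warpSize;
}

using Kernel = void (*)(unsigned *, int);
constexpr int kThreads = 256;  // block size, a multiple of the warp size

// Average time per launch in milliseconds, measured with CUDA events.
float average_ms(Kernel kernel, unsigned *d_out, int n, int reps) {
    const int blocks = (n + kThreads - 1) / kThreads;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    kernel<<<blocks, kThreads>>>(d_out, n);  // warm-up launch, not timed
    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r) kernel<<<blocks, kThreads>>>(d_out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / reps;
}

// Runs a kernel once and counts entries that differ from i % 32
// (assumes a warp size of 32 on the host side).
int count_wrong_entries(Kernel kernel, int n) {
    unsigned *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(unsigned));
    kernel<<<(n + kThreads - 1) / kThreads, kThreads>>>(d_out, n);
    std::vector<unsigned> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(unsigned), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    int wrong = 0;
    for (int i = 0; i < n; ++i) wrong += (h_out[i] != unsigned(i % 32));
    return wrong;
}

int main() {
    const int n = 1 << 24, reps = 100;
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    int runtime_version = 0;
    cudaRuntimeGetVersion(&runtime_version);
    printf("GPU: %s, CUDA runtime: %d\n", prop.name, runtime_version);

    unsigned *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(unsigned));
    printf("%%laneid register: %f ms\n", average_ms(write_laneid_register, d_out, n, reps));
    printf("i %% warpSize:     %f ms\n", average_ms(write_laneid_modulo, d_out, n, reps));
    cudaFree(d_out);
    return 0;
}
```

A kernel this small is likely dominated by the global-memory store rather than by how the lane id is obtained, which is one reason to follow it with kernels that use the lane id many times per thread.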
