A model for analyzing the weaknesses of your program!
The TopDown tool analyzes the bottlenecks of applications running on NVIDIA GPUs. The tool supports ANY TYPE of application running on your GPU.
The goal of the tool is to detect the parts of the architecture that suffered the most during the execution of an application, that is, the parts that were stalled while the application ran. It identifies which parts of your GPU architecture caused a loss of IPC (Instructions Per Cycle) with respect to the ideal IPC of your GPU (the maximum IPC your GPU can achieve). To do so, it performs an exhaustive analysis of the different parts of the architecture in order to explain where the losses come from.
To carry out these tasks, the tool has several levels of execution. The deeper (lower) the level, the more detailed the results, pointing to more specific parts of the architecture. The highest level offers the least detail, grouping several parts of the architecture into one.
This section describes how to use the application correctly, as well as the utilities it needs in order to work.
The application is written in Python, so a Python interpreter is required to run it. A reasonably recent version is advisable, since the code uses features that are not available in older versions (prior to Python 3). For this reason, it is essential to have at least Python 3 installed.
The following summarizes the commands necessary to install a stable version of python3 for this application:
- python3 (version 3.6)
# update repositories
sudo add-apt-repository ppa:jonathonf/python-3.6
# update
sudo apt-get update
# install
sudo apt-get install python3.6
# this command must return the version (3.6) without errors
python3.6 -V
Likewise, remember that you also need the appropriate CUDA toolkit to be able to run the analysis. The program automatically detects the Compute Capability (CC) of your GPU and performs the analysis with the appropriate tool; you only need to have installed the one that matches your GPU. The two analysis tools are as follows:
- Nvprof (CC < 7.2)
- Nsight Compute (CC >= 7.2)
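As an optional sanity check (not a step required by the tool), you can confirm which of the two profilers is available on your system by querying their versions; ncu is the command-line interface of Nsight Compute:
# for GPUs with CC < 7.2
nvprof --version
# for GPUs with CC >= 7.2
ncu --version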
The provided tool can determine the CC of your GPU. To do this, you must have a version of CUDA installed. If you do not have it installed, section 3 of the following link from NVIDIA describes the necessary steps:
- install CUDA (NVIDIA reference)
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
It is very important that you follow the steps in detail, without skipping any. Likewise, once the installation is complete, you must carry out the "post-installation" actions described in the following link (section 9):
- post-installation CUDA (NVIDIA reference)
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions
Finally, the last task you must perform is to grant the tool permission to measure results. To do so, follow the steps described in the following link:
- grant permission to measure results (NVIDIA reference)
https://developer.nvidia.com/nvidia-development-tools-solutions-err_nvgpuctrperm-permission-issue-performance-counters
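For reference, on most Linux systems the approach described on that NVIDIA page amounts to allowing non-admin users to access the GPU performance counters via a kernel module option; the configuration file name below is only an example, follow the linked page for your exact setup:
# allow non-admin users to access GPU performance counters (requires a reboot)
echo 'options nvidia NVreg_RestrictProfilingToAdminUsers=0' | sudo tee /etc/modprobe.d/nvidia-profiling.conf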
Once you have completed the CUDA installation, you are ready to use the tool to check your CC. To do this, it is recommended that you go to the next section and install the tool.
- Download the application
git clone https://github.com/asf174/TopDownNvidia.git
- Check Compute Capability [OPTIONAL]
cd TopDownNvidia/src/measure_parts
# run the program. It returns the CC.
# Otherwise, it reports an error. Correct the error
# in order to use the tool.
nvcc compute_capability.cu --run
- Add program to PATH [OPTIONAL]
# <PATH_UNTIL_TOPDOWN_REPOSITORY>: path until repository echo "PATH=<PATH_UNTIL_TOPDOWN_REPOSITORY>/TopDownNvidia/src:$PATH" >> $HOME/.bashrc
- Define the TopDown environment variable, i.e., the path to the repository
# <PATH_UNTIL_TOPDOWN_REPOSITORY>: path until repository echo "export DIR_UNTIL_TOPDOWN="<PATH_UNTIL_TOPDOWN_REPOSITORY>" >> $HOME/.bashrc
- Update environment variable
source $HOME/.bashrc
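As an optional sanity check (not one of the original steps), you can verify that the PATH entry and the DIR_UNTIL_TOPDOWN variable defined above were picked up by your shell:
# should print the path to the repository defined above
echo $DIR_UNTIL_TOPDOWN
# should locate topdown.py if the PATH entry was added
which topdown.py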
- Install the tool dependencies
# update pip
python -m pip install --upgrade pip
# install graph dependencies
pip install matplotlib
pip install plotly
- Check Options [OPTIONAL]
python3 topdown.py -h
The command syntax is as follows:
topdown.py [OPTIONS] -f [PROGRAM] -l [NUM]
where [PROGRAM] is the path to the program to be analyzed and [NUM] is the TopDown level. You can also run topdown.py with the '-h' option to see ALL program features. Below is an example showing all the available options:
$ topdown.py -h
usage: topdown.py [OPTIONS] -f [PROGRAM] -l [NUM]
TopDown methodology on NVIDIA's GPUs
Optional arguments:
-h, --help show this help message and exit
-f [PROGRAM [PROGRAM ...]], --file [PROGRAM [PROGRAM ...]] run file. Path to file.
-o [FILE], --output [FILE] output file. Path to file.
-v, --verbose long description of results.
-dc, --delete-content If '-o/--output' is set, delete the output file's contents before writing results.
-nd, --no-desc don't show description of results.
-m, --metrics show metrics computed by NVIDIA scan tool.
-e, --events show events computed by NVIDIA scan tool.
-am, --all-measurements show all measures computed by NVIDIA scan tool.
-g, --graph show graph with description of results.
-og [OUTPUT_GRAPH_FILE], --output-graph [OUTPUT_GRAPH_FILE] output graph file. Path to file.
-os [OUTPUT_SCAN_FILE], --output-scan [OUTPUT_SCAN_FILE] output scan file. Path to file.
-is [INPUT_SCAN_FILE], --input-scan [INPUT_SCAN_FILE] input scan file. Path to file.
Required arguments:
-l [NUM], --level [NUM] level of execution.
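To tie the options together, here is a hypothetical invocation based on the syntax above (the program path and output file name are placeholders): it runs a level 2 analysis with verbose descriptions and writes the results to a file.
# analyze a program at TopDown level 2, with verbose results written to results.txt
topdown.py -f ./my_cuda_app -l 2 -v -o results.txt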
Check options to run program