# Profiling and Monitoring Deep Learning Training Tasks

This repository contains the data from the experiments in our paper "Profiling and Monitoring Deep Learning Training Tasks", which measure the overhead of profiling and monitoring tools. It also contains the simple CUDA benchmark we used to show how the GRACT, SMACT, and SMOCC metrics change with the number of thread blocks and the number of threads per thread block.

1. CUDA benchmark, the script for running it with different configurations, and the related data
2. Experiments measuring the overhead of profiling and monitoring tools
   - Profiling tools
   - Monitoring tools
3. Models we trained while measuring the tools' overhead
   - Simple CNN-based MNIST classifier
   - ResNet50 on ImageNet
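
For context, the sketch below illustrates the general shape of such a benchmark: a kernel whose launch configuration (number of thread blocks and threads per block) is taken from the command line, so an outer script can sweep over configurations while a monitoring tool records GRACT, SMACT, and SMOCC. This is a minimal, hypothetical sketch, not the repository's actual benchmark; the kernel name `busy_kernel` and its floating-point workload are illustrative choices of ours.

```cuda
// Hypothetical sketch, not the repository's benchmark code.
// Each thread runs a dependent FMA chain so the kernel stays resident
// long enough for activity/occupancy metrics to be sampled.
#include <cstdio>
#include <cstdlib>

__global__ void busy_kernel(float *out, int iters) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float v = 0.0f;
    for (int i = 0; i < iters; ++i) {
        v = fmaf(v, 1.000001f, 1.0f);  // serial dependency keeps the SM busy
    }
    out[idx] = v;  // write back so the loop is not optimized away
}

int main(int argc, char **argv) {
    // Launch shape comes from the command line so a script can sweep it:
    //   ./busy <blocks> <threads_per_block> [iters]
    int blocks  = (argc > 1) ? atoi(argv[1]) : 1;
    int threads = (argc > 2) ? atoi(argv[2]) : 32;
    int iters   = (argc > 3) ? atoi(argv[3]) : 1 << 22;

    float *out;
    cudaMalloc(&out, (size_t)blocks * threads * sizeof(float));

    busy_kernel<<<blocks, threads>>>(out, iters);
    cudaDeviceSynchronize();

    printf("ran %d blocks x %d threads\n", blocks, threads);
    cudaFree(out);
    return 0;
}
```

While such a kernel runs, a monitor like `dcgmi dmon` or `nvidia-smi` can sample the metrics: launching fewer blocks than the GPU has SMs keeps SMACT well below GRACT, and small thread blocks keep SMOCC low even when every SM is active.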

## Paper presentation video

Watch the video

## Cite our paper

```latex
@inproceedings{10.1145/3578356.3592589,
  author = {Yousefzadeh-Asl-Miandoab, Ehsan and Robroek, Ties and Tozun, Pinar},
  title = {Profiling and Monitoring Deep Learning Training Tasks},
  year = {2023},
  isbn = {9798400700842},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3578356.3592589},
  doi = {10.1145/3578356.3592589},
  abstract = {The embarrassingly parallel nature of deep learning training tasks makes CPU-GPU co-processors the primary commodity hardware for them. The computing and memory requirements of these tasks, however, do not always align well with the available GPU resources. It is, therefore, important to monitor and profile the behavior of training tasks on co-processors to understand better the requirements of different use cases. In this paper, our goal is to shed more light on the variety of tools for profiling and monitoring deep learning training tasks on server-grade NVIDIA GPUs. In addition to surveying the main characteristics of the tools, we analyze the functional limitations and overheads of each tool by using a both light and heavy training scenario. Our results show that monitoring tools like nvidia-smi and dcgm can be integrated with resource managers for online decision making thanks to their low overheads. On the other hand, one has to be careful about the set of metrics to correctly reason about the GPU utilization. When it comes to profiling, each tool has its time to shine; a framework-based or system-wide GPU profiler can first detect the frequent kernels or bottlenecks, and then, a lower-level GPU profiler can focus on particular kernels at the micro-architectural-level.},
  booktitle = {Proceedings of the 3rd Workshop on Machine Learning and Systems},
  pages = {18–25},
  numpages = {8},
  keywords = {co-processors, monitoring, optimization, deep learning, profiling},
  location = {Rome, Italy},
  series = {EuroMLSys '23}
}
```
