GPUMemNet: GPU Memory estimator and Neural Network training dataset

GPUMemNet logo

This repository contains the artifacts for our work on building a deep learning–based GPU memory estimator for training deep learning models. Since data is central to this effort, we structured the workflow in several key stages:

  • Data Generation: We developed scripts to automatically generate diverse deep learning training configurations and monitor GPU behavior during training.

  • Data Cleaning: After collecting raw logs, we processed and cleaned the data using dedicated scripts included here.

  • Analysis & Modeling: With the cleaned data, we performed exploratory analysis and trained various models to estimate GPU memory usage.

  • Ensembles & Overheads: We explored ensemble methods, reviewed related work, and analyzed the overhead introduced by both the data parsers and model inference.

How to Use GPUMemNet

TODO: add a clear description and an easy, fast-to-use script for running estimations

Data Generation Scripts

For each neural network type (MLP, CNN, Transformer), we provide two key files: one that defines the network architecture, and a launcher script that spawns multiple training instances with varying architectural parameters. During training, GPU usage (alongside other metrics) is monitored using dcgmi and nvidia-smi, while system metrics are tracked with top.

Note 1: Each deep learning configuration is trained for one minute, one at a time. This sequential execution avoids interference from simultaneous training jobs, which could affect system performance due to shared CPU and DRAM usage.
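As an illustration of this workflow, the sketch below launches one training configuration for about 60 seconds while logging GPU memory usage with nvidia-smi in the background. The script name (train_mlp.py), its arguments, and the log file naming are assumptions for illustration, not the repository's actual launcher.

```python
# Minimal launcher sketch (assumed script names and arguments, not the repo's actual files).
import subprocess
import time

def run_config(depth: int, width: int, batch_size: int, duration_s: int = 60) -> None:
    # Start a background GPU memory logger (1-second sampling) using nvidia-smi.
    log_name = f"gpu_mem_d{depth}_w{width}_b{batch_size}.csv"
    with open(log_name, "w") as log:
        monitor = subprocess.Popen(
            ["nvidia-smi", "--query-gpu=timestamp,memory.used,utilization.gpu",
             "--format=csv", "-l", "1"],
            stdout=log,
        )
        # Launch one training instance (hypothetical script and flags).
        trainer = subprocess.Popen(
            ["python", "train_mlp.py", "--depth", str(depth),
             "--width", str(width), "--batch-size", str(batch_size)],
        )
        time.sleep(duration_s)   # each configuration runs for ~1 minute
        trainer.terminate()
        trainer.wait()
        monitor.terminate()      # stop sampling once training ends
        monitor.wait()

# Sequential execution: one configuration at a time to avoid interference.
for depth in (2, 4, 8):
    for width in (256, 1024):
        run_config(depth, width, batch_size=64)
```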

Future/Possible Contributions at This Level

  1. Refactoring the launcher script to read parameters from a YAML configuration file.
  2. Extending the Transformer model to support architectures with 1D convolutional layers (e.g., GPT-style models), as it currently supports only linear-layer-based designs.

Data Cleaning Script

Future/Possible Contributions at This Level

  • Extend the Transformer data cleaning script to support models that include Conv1D and other types of layers

Data

Visualization, Analysis, and Training Notebooks

We explored the cleaned data by examining its distribution across selected features, visualized through PCA and t-SNE projections. We also trained MLP- and Transformer-based models on the data to validate the idea of using deep learning to estimate GPU memory usage. To dive deeper into this, check here.
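A minimal sketch of that kind of projection is shown below, assuming the cleaned data is a CSV with numeric architecture features and a gpu_mem_used column; the file name and column names are placeholders, not the repository's actual schema.

```python
# Illustrative PCA / t-SNE projection of the cleaned dataset (assumed file and column names).
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

df = pd.read_csv("cleaned_mlp_dataset.csv")            # hypothetical path
features = df.drop(columns=["gpu_mem_used"])           # hypothetical target column
X = StandardScaler().fit_transform(features)

pca_2d = PCA(n_components=2).fit_transform(X)
tsne_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, proj, title in zip(axes, (pca_2d, tsne_2d), ("PCA", "t-SNE")):
    sc = ax.scatter(proj[:, 0], proj[:, 1], c=df["gpu_mem_used"], s=5, cmap="viridis")
    ax.set_title(title)
fig.colorbar(sc, ax=axes, label="GPU memory used (MiB)")
plt.show()
```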

Training, Validation, and Testing with Ensemble Models

To train and test ensemble models, ensure that you are using the correct dataset. When running training or evaluation, specify both the dataset and the model type using the appropriate command-line arguments.

Training:

python train.py --d [mlp, cnn, transformer] --m [mlp, transformer]

Validation:

python kfold_cross_validation.py --d [mlp, cnn, transformer] --m [mlp, transformer]

Testing:

python test.py --d [mlp, cnn, transformer] --m [mlp, transformer]

To visualize the results, including the confusion matrix and other statistics, see the visualization notebook.
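For reference, a confusion matrix like the one in the notebook can be produced with scikit-learn as in the sketch below; the label arrays here are placeholder values standing in for the estimator's outputs on the test split.

```python
# Illustrative confusion matrix plot (labels are placeholders for real test outputs).
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, classification_report

y_true = [0, 2, 1, 3, 2, 0, 1, 3]   # true memory-bucket classes (example values)
y_pred = [0, 2, 1, 2, 2, 0, 1, 3]   # estimator predictions (example values)

ConfusionMatrixDisplay.from_predictions(y_true, y_pred)
print(classification_report(y_true, y_pred))
plt.show()
```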

Overheads of the parser and the models' inference

We also characterized the overheads of the parsers and the estimator models' inference, since one of the primary purposes of these estimators is to inform schedulers/resource managers so they can make more efficient decisions.
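As an illustration, the latency of a parser and of one estimator forward pass can be characterized with a simple wall-clock loop like the one below; parse_config and the estimator call are placeholders for the actual components, not functions from this repository.

```python
# Illustrative overhead measurement (parse_config and estimator are placeholders).
import time
import statistics

def time_it(fn, repeats: int = 100):
    """Return mean and p95 wall-clock latency of fn() in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    samples.sort()
    return statistics.mean(samples), samples[int(0.95 * (len(samples) - 1))]

# Example usage (hypothetical call sites):
# mean_ms, p95_ms = time_it(lambda: parse_config("model.py"))
# mean_ms, p95_ms = time_it(lambda: estimator(features_tensor))
```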

Related Work data and sources

We designed experiments to evaluate the effectiveness of the Horus formula and the Fake Tensor library in estimating the GPU memory requirements of deep learning training tasks. Read more here.
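As a rough illustration of what a formula-based baseline computes, the sketch below sums parameter, gradient, and Adam-style optimizer-state memory for a PyTorch model. This is a generic analytic estimate under stated assumptions, not the Horus formula or the Fake Tensor mechanism themselves, and it deliberately ignores activations and framework overhead.

```python
# Generic analytic memory estimate (illustrative only; not the Horus formula itself).
import torch
import torch.nn as nn

def static_memory_estimate_mib(model: nn.Module, optimizer_states_per_param: int = 2) -> float:
    """Estimate weights + gradients + Adam-style optimizer state, in MiB.

    Ignores activations, CUDA context, and allocator overhead, which are exactly
    the parts that learned estimators such as GPUMemNet try to capture.
    """
    param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    total_bytes = param_bytes * (1 + 1 + optimizer_states_per_param)  # weights + grads + states
    return total_bytes / (1024 ** 2)

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10))
print(f"~{static_memory_estimate_mib(model):.1f} MiB of static training state")
```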

Vision

In the discussion section of our paper, we draw a roadmap for how contributors can contribute. As GPUMemNet is a deep learning-based estimator, potential contributions and improvements include more data points, data points from different GPU models and with a broader range of arguments, and innovations in how the GPU memory estimation problem is framed.

License & Citation

© 2025 Ehsan Yousefzadeh-Asl-Miandoab. Affiliated with the RAD, IT University of Copenhagen. All rights reserved.

This repository is released for non-commercial academic research purposes only under the following terms:

  • 📦 Code and Notebooks: Custom research-only license. You may use, modify, and share for academic research, but commercial use is prohibited.
  • 🧠 Trained Models: Provided for academic evaluation only. Do not use in commercial products or services without explicit permission.
  • 📊 Dataset: Licensed under CC BY-NC 4.0.
  • 📈 Figures and Visualizations: Also under CC BY-NC 4.0.

📚 Citation

If you use this repository (code, models, data, or ideas), you must cite the following:

GitHub Repository
Ehsan Yousefzadeh-Asl-Miandoab. GPUMemNet: Estimating GPU Memory Requirements for Deep Learning Training Tasks. GitHub Repository: https://github.com/ehsanyousefzadehasl/gpumemnet

@misc{yousefzadeh2025gpumemnet,
  author       = {Ehsan Yousefzadeh-Asl-Miandoab},
  title        = {GPUMemNet: Estimating GPU Memory Requirements for Deep Learning Training Tasks},
  year         = {2025},
  howpublished = {\url{https://github.com/ehsanyousefzadehasl/gpumemnet}},
}

Academic Paper

@article{yousefzadeh2025carma,
  title={CARMA: Collocation-Aware Resource Manager with GPU Memory Estimator},
  author={Yousefzadeh-Asl-Miandoab, Ehsan and Karimzadeh, Reza and Ibragimov, Bulat and Ciorba, Florina M and Tozun, Pinar},
  journal={arXiv preprint arXiv:2508.19073},
  year={2025}
}
