
Enhancing CNN Training on CIFAR-10 Through MPI Parallelization

This repository contains the source code for training a Convolutional Neural Network (CNN) on CIFAR-10 using different parallelization strategies. Project_Report synthesizes the experiments conducted during this project. Below is a brief overview of the key components:

Models

This file contains the implementation of the CNN architecture used in the training process.

Training Scripts

This script implements the training of the model without any parallelization. It serves as a baseline for performance comparison with parallelized approaches.

In this script, training is performed with the model merely replicated across processes, without data parallelism. It is designed to showcase the impact of replicating the model across multiple processes.
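In MPI terms, model replication amounts to a broadcast: one root rank initializes the weights and every other rank receives an identical copy. The sketch below simulates that broadcast semantics in plain NumPy (without mpi4py, so the function names here are illustrative, not the repository's code):

```python
import numpy as np

def broadcast(root_weights, n_ranks):
    """Simulate MPI_Bcast: every rank ends up with the root rank's weights."""
    return [root_weights.copy() for _ in range(n_ranks)]

rng = np.random.default_rng(0)
root_weights = rng.standard_normal(4)          # parameters initialized on rank 0
replicas = broadcast(root_weights, n_ranks=3)  # one copy per process

# After the broadcast, all replicas hold identical parameters,
# so each process can train on the same model independently.
assert all(np.array_equal(root_weights, w) for w in replicas)
```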

The core script that implements the data-parallelism approach. It includes time-measurement functionality and a fault-tolerance simulation. This approach distributes the computational workload across multiple processes, aiming to improve training efficiency.
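The identity behind data parallelism is that the full-batch gradient equals the average of the per-shard gradients, so summing gradients with an allreduce and dividing by the number of processes recovers the serial update. A small NumPy check of that identity for a linear least-squares loss (an illustrative sketch, not the repository's training code):

```python
import numpy as np

rng = np.random.default_rng(1)
X, y = rng.standard_normal((12, 3)), rng.standard_normal(12)
w = rng.standard_normal(3)

def grad(Xb, yb, w):
    """Gradient of the mean-squared-error loss 0.5 * mean((Xb @ w - yb)**2)."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

# Full-batch gradient, as computed by a single process.
g_serial = grad(X, y, w)

# Split the batch across 4 "ranks", then average the shard gradients,
# mimicking an allreduce (sum) followed by division by the process count.
shards = np.array_split(np.arange(12), 4)
g_parallel = np.mean([grad(X[i], y[i], w) for i in shards], axis=0)

# With equally sized shards the two gradients coincide.
assert np.allclose(g_serial, g_parallel)
```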

Usage

mpiexec -n {number of processes} python data_parallelism_train.py --nb-proc {number of processes}
