
Enhancing CNN Training on CIFAR-10 Through MPI Parallelization

This repository contains the source code for training a Convolutional Neural Network (CNN) on CIFAR-10 using different parallelization strategies. Project_Report synthesizes the experiments conducted during this project. Below is a brief overview of the key components:

Models

This file contains the implementation of the CNN architecture used in the training process.

Training Scripts

This script implements the training of the model without any parallelization. It serves as a baseline for performance comparison with parallelized approaches.

In this script, training is performed with the model merely replicated across processes, without data parallelism. It is designed to showcase the impact of replicating the model across multiple processes.
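In MPI terms, model replication amounts to a broadcast: one root rank initializes the weights and every other rank receives an identical copy. The sketch below simulates that broadcast semantics in plain NumPy (without mpi4py, so the function names here are illustrative, not the repository's code):

```python
import numpy as np

def broadcast(root_weights, n_ranks):
    """Simulate MPI_Bcast: every rank ends up with the root rank's weights."""
    return [root_weights.copy() for _ in range(n_ranks)]

rng = np.random.default_rng(0)
root_weights = rng.standard_normal(4)          # parameters initialized on rank 0
replicas = broadcast(root_weights, n_ranks=3)  # one copy per process

# After the broadcast, all replicas hold identical parameters,
# so each process can train on the same model independently.
assert all(np.array_equal(root_weights, w) for w in replicas)
```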

The core script that implements the data-parallelism approach. It includes time-measurement functionality and a fault-tolerance simulation. This approach distributes the computational workload across multiple processes, aiming to improve training efficiency.
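The identity behind data parallelism is that the full-batch gradient equals the average of the per-shard gradients, so summing gradients with an allreduce and dividing by the number of processes recovers the serial update. A small NumPy check of that identity for a linear least-squares loss (an illustrative sketch, not the repository's training code):

```python
import numpy as np

rng = np.random.default_rng(1)
X, y = rng.standard_normal((12, 3)), rng.standard_normal(12)
w = rng.standard_normal(3)

def grad(Xb, yb, w):
    """Gradient of the mean-squared-error loss 0.5 * mean((Xb @ w - yb)**2)."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

# Full-batch gradient, as computed by a single process.
g_serial = grad(X, y, w)

# Split the batch across 4 "ranks", then average the shard gradients,
# mimicking an allreduce (sum) followed by division by the process count.
shards = np.array_split(np.arange(12), 4)
g_parallel = np.mean([grad(X[i], y[i], w) for i in shards], axis=0)

# With equally sized shards the two gradients coincide.
assert np.allclose(g_serial, g_parallel)
```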

Usage

mpiexec -n {number of processes} python data_parallelism_train.py --nb-proc {number of processes}
