CPU-side Simulator for MSCCL XML Files

This repo contains code that uses CPU threads to simulate a collective communication algorithm written as an MSCCL XML file.

To build this repo, run the following commands:

mkdir -p build && cd build
cmake .. && make

Currently, we provide the following verifiers:

allgather-verifier, which checks whether an algorithm written for out-of-place AllGather is correct.
alltoall-verifier, which checks whether an algorithm written for out-of-place AllToAll with a uniform buffer partition is correct.
alltoallv-verifier, which checks whether an algorithm written for out-of-place AllToAll with a variable buffer partition is correct.
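To make "correct output" concrete for the first two verifiers, the sketch below checks the expected data layout of out-of-place AllGather and uniform AllToAll. It is a hypothetical illustration, not the verifiers' actual code: it assumes every chunk carries an integer tag identifying its source rank and its position in that rank's input buffer, and the names chunkTag, checkAllGather, and checkAllToAll are invented. Here W is the world size and C is the number of chunks per rank (AllGather) or per destination block (AllToAll).

```cpp
#include <cassert>
#include <vector>

// Hypothetical chunk tag: encodes which rank produced a chunk and where it sat
// in that rank's input buffer. The real verifiers may encode data differently.
int chunkTag(int srcRank, int chunkIdx, int inputChunksPerRank) {
    return srcRank * inputChunksPerRank + chunkIdx;
}

// Out-of-place AllGather: each rank's input holds C chunks, and every rank ends
// with all W inputs concatenated in rank order, so output[r * C + c] must be
// chunk c of rank r.
bool checkAllGather(const std::vector<std::vector<int>>& outputs, int W, int C) {
    for (int dst = 0; dst < W; ++dst) {
        if ((int)outputs[dst].size() != W * C) return false;
        for (int r = 0; r < W; ++r)
            for (int c = 0; c < C; ++c)
                if (outputs[dst][r * C + c] != chunkTag(r, c, C)) return false;
    }
    return true;
}

// Out-of-place AllToAll, uniform partition: each rank's input holds W blocks of
// C chunks. Block dst of rank src is sent to rank dst and lands in block src of
// its output (a block transpose across ranks).
bool checkAllToAll(const std::vector<std::vector<int>>& outputs, int W, int C) {
    for (int dst = 0; dst < W; ++dst) {
        if ((int)outputs[dst].size() != W * C) return false;
        for (int src = 0; src < W; ++src)
            for (int c = 0; c < C; ++c)
                if (outputs[dst][src * C + c] != chunkTag(src, dst * C + c, W * C))
                    return false;
    }
    return true;
}
```

With two ranks and one chunk per block, the AllToAll inputs are tagged {0, 1} and {2, 3}, so the correct outputs are {0, 2} on rank 0 and {1, 3} on rank 1.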

To run a verification, use ./<verifier> <xml> <run_iters>. The verifier executes the algorithm run_iters times and checks whether the output buffer is correct.

The alltoallv-verifier takes an additional CSV file: ./alltoallv-verifier <xml> <run_iters> <csv>. This file should contain $W\times W$ integer values, where $W$ is the world size (i.e., ngpus in the XML file). The value in the $i$-th row and $j$-th column is the number of chunks that rank $i$ sends to rank $j$.
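Under this layout, row $i$ sums to the number of chunks rank $i$ sends, and column $j$ sums to the number of chunks rank $j$ receives. The following sketch parses such a file and computes both totals. The function names (parseSendCounts, sendTotal, recvTotal) are hypothetical, and the sketch assumes comma-separated cells with one matrix row per line; the verifier's own parser may accept other layouts.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Parse the W x W send-count CSV: cell (i, j) = chunks sent from rank i to rank j.
// Hypothetical helper; assumes comma-separated values, one row per line.
Matrix parseSendCounts(const std::string& csv, int W) {
    Matrix m;
    std::istringstream lines(csv);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.empty() || line == "\r") continue;  // skip blank lines
        std::vector<int> row;
        std::istringstream cells(line);
        std::string cell;
        while (std::getline(cells, cell, ','))
            row.push_back(std::stoi(cell));  // stoi skips leading whitespace
        if ((int)row.size() != W) throw std::runtime_error("row does not have W values");
        m.push_back(row);
    }
    if ((int)m.size() != W) throw std::runtime_error("CSV does not have W rows");
    return m;
}

// Total chunks rank i sends: the sum of row i.
int sendTotal(const Matrix& m, int i) {
    int s = 0;
    for (int v : m[i]) s += v;
    return s;
}

// Total chunks rank j receives: the sum of column j.
int recvTotal(const Matrix& m, int j) {
    int s = 0;
    for (const auto& row : m) s += row[j];
    return s;
}
```

For example, with the two-rank CSV whose rows are 1,2 and 3,4, rank 0 sends 1 + 2 = 3 chunks and rank 1 receives 2 + 4 = 6 chunks.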

Key Idea of Simulation

We simulate each GPU threadblock with a CPU thread, because the instructions within a threadblock execute sequentially. Each connection between neighbouring peers in a channel is simulated with a FIFO queue (called Mailbox in the source code).
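A Mailbox-style FIFO can be sketched with a mutex and a condition variable: send does not wait for the receiver, and recv blocks until a value is available, so the receiver sees values in the order the sender pushed them. The interface below (Mailbox<T>::send/recv and the simulateSendRecv driver) is an assumption for illustration; the simulator's Mailbox may differ in payload type and synchronization.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal unbounded blocking FIFO in the spirit of Mailbox (hypothetical API).
template <typename T>
class Mailbox {
public:
    void send(T value) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            q_.push(std::move(value));
        }
        cv_.notify_one();  // wake a receiver blocked in recv()
    }

    T recv() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !q_.empty(); });  // block until data arrives
        T value = std::move(q_.front());
        q_.pop();
        return value;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<T> q_;
};

// Two "threadblocks" on neighbouring ranks share one mailbox: the sender pushes
// chunk ids in order, and the receiver pops them in the same (FIFO) order.
std::vector<int> simulateSendRecv(int nChunks) {
    Mailbox<int> box;
    std::vector<int> received;  // written only by the receiver thread
    std::thread sender([&] {
        for (int i = 0; i < nChunks; ++i) box.send(i);
    });
    std::thread receiver([&] {
        for (int i = 0; i < nChunks; ++i) received.push_back(box.recv());
    });
    sender.join();
    receiver.join();
    return received;
}
```

Build with thread support enabled (for example, -pthread on GCC or Clang).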

The classes are organized as follows. A CommGroup holds all of its GpuRanks. A GpuRank holds all of its ThreadBlocks, as well as its input, output, and scratch buffers.
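The ownership hierarchy can be pictured as the skeleton below. Everything here is hypothetical: the field names, the constructors, and the simplification that every rank has the same number of threadblocks and equally sized buffers, whereas the XML file describes each GPU separately.

```cpp
#include <cassert>
#include <vector>

// Hypothetical skeleton of the ownership described above; not the real API.
struct ThreadBlock {
    int id;
    // The real class would also hold this threadblock's steps from the XML.
};

struct GpuRank {
    int rank;
    std::vector<ThreadBlock> threadBlocks;    // all threadblocks of this rank
    std::vector<int> input, output, scratch;  // per-rank buffers, one int per chunk here

    GpuRank(int r, int nThreadBlocks, int inChunks, int outChunks, int scratchChunks)
        : rank(r), input(inChunks), output(outChunks), scratch(scratchChunks) {
        for (int t = 0; t < nThreadBlocks; ++t) threadBlocks.push_back(ThreadBlock{t});
    }
};

struct CommGroup {
    std::vector<GpuRank> ranks;  // one entry per GPU (ngpus in the XML)

    CommGroup(int nRanks, int nThreadBlocks, int inChunks, int outChunks, int scratchChunks) {
        for (int r = 0; r < nRanks; ++r)
            ranks.emplace_back(r, nThreadBlocks, inChunks, outChunks, scratchChunks);
    }
};
```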

There is NO lock protecting these buffers, even though multiple threads may read or write them concurrently. Avoiding data hazards is therefore up to the XML file, which must specify correct dependencies between steps.
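Because the buffers are unprotected, correctness depends on each step waiting for the steps it depends on. One way to express that wait, sketched here as an assumption rather than the simulator's actual mechanism, is a per-threadblock atomic counter of completed steps: the producer increments it with release ordering after writing, and the consumer spins with acquire ordering until the dependency is satisfied. StepCounter and runDependentCopy are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical per-threadblock progress counter. A step that depends on step
// depStep of this threadblock waits until more than depStep steps are done.
struct StepCounter {
    std::atomic<int> done{0};

    void finishStep() { done.fetch_add(1, std::memory_order_release); }

    void waitFor(int depStep) const {
        while (done.load(std::memory_order_acquire) <= depStep)
            std::this_thread::yield();  // spin, as a GPU threadblock would
    }
};

// Threadblock 0 writes scratch[0] in its step 0; threadblock 1 copies it to
// output[0] in its own step 0, which depends on (threadblock 0, step 0).
int runDependentCopy(int value) {
    std::vector<int> scratch(1, 0), output(1, 0);  // shared, unlocked buffers
    StepCounter tb0;
    std::thread producer([&] {
        scratch[0] = value;  // step 0 of threadblock 0
        tb0.finishStep();    // publish completion (release)
    });
    std::thread consumer([&] {
        tb0.waitFor(0);          // honour the declared dependency (acquire)
        output[0] = scratch[0];  // ordered after the producer's write
    });
    producer.join();
    consumer.join();
    return output[0];
}
```

Removing the waitFor call turns the copy into a data race, which is the kind of hazard a missing dependency in the XML would cause.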

In each run, all ThreadBlocks in all GpuRanks execute in parallel. The channels are built only once, before the first run, just as channels are in MSCCL and NCCL.
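The run structure can be sketched as an outer loop over run_iters: channels are created once up front, and each run starts one CPU thread per threadblock on every rank and joins all of them before the next run. Channel, its message counter, and simulate are hypothetical stand-ins; the counter only demonstrates that channel state persists across runs.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical channel: built once and reused by every run.
struct Channel {
    std::atomic<int> messages{0};  // accumulates across runs
};

// Returns the number of messages each rank's channel carried over all runs.
std::vector<int> simulate(int nRanks, int tbPerRank, int runIters) {
    std::vector<Channel> channels(nRanks);  // built once, before the first run
    for (int iter = 0; iter < runIters; ++iter) {
        std::vector<std::thread> threads;
        for (int r = 0; r < nRanks; ++r)
            for (int t = 0; t < tbPerRank; ++t)
                threads.emplace_back([&channels, r] {
                    // Stand-in for executing a threadblock's steps: each
                    // threadblock of rank r sends one message on channel r.
                    channels[r].messages.fetch_add(1);
                });
        for (auto& th : threads) th.join();  // run ends when every threadblock finishes
    }
    std::vector<int> counts;
    for (const auto& ch : channels) counts.push_back(ch.messages.load());
    return counts;
}
```

In this sketch, joining every thread at the end of a run acts as a barrier, so no threadblock starts run k + 1 while another is still in run k; with 2 ranks, 3 threadblocks per rank, and 4 runs, each channel carries 12 messages.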
