parlab-tuwien/lib-ncclbench

lib-ncclbench

This is the lib-ncclbench project, a library for benchmarking NCCL operations. It is inspired by NCCL Tests but differs from it in several ways:

  • NCCL Tests calls MPI_Barrier inside the timed region, which distorts the results.
  • ncclbench reports the timing of every individual call instead of only the average.
  • ncclbench supports benchmarking both by number of iterations and by total time.
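The measurement scheme described in the list above can be sketched as follows. This is a minimal, hypothetical Python illustration, not the library's implementation: `run_benchmark` and `op` are illustrative names, `op` stands in for one NCCL call followed by a stream synchronization, and starting the time budget after the warmups is an assumption. Each call is timed on its own, with nothing else between the two clock reads, and every sample is kept.

```python
import time
from typing import Callable, List


def run_benchmark(op: Callable[[], None], warmups: int,
                  max_iterations: int, max_time_s: float) -> List[float]:
    """Hypothetical sketch: time each call of `op` individually and stop after
    `max_iterations` timed calls or once `max_time_s` seconds have elapsed,
    whichever comes first. Returns one timing (in microseconds) per call."""
    # Warmup calls are executed but not recorded.
    for _ in range(warmups):
        op()

    timings_us: List[float] = []
    # Assumption: the time budget is measured from the end of the warmups.
    start = time.perf_counter()
    while len(timings_us) < max_iterations:
        if time.perf_counter() - start >= max_time_s:
            break  # time budget exhausted before the iteration limit
        t0 = time.perf_counter()
        op()  # in a real benchmark: launch the collective, then sync the stream
        t1 = time.perf_counter()
        timings_us.append((t1 - t0) * 1e6)  # keep every sample, not just the mean
    return timings_us


if __name__ == "__main__":
    samples = run_benchmark(lambda: sum(range(10_000)), warmups=10,
                            max_iterations=100, max_time_s=2.0)
    print(f"{len(samples)} calls, min {min(samples):.2f} us, max {max(samples):.2f} us")
```

Keeping the raw per-call samples, rather than a running average, leaves the choice of statistic (median, percentiles, outlier filtering) to later analysis.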

Running the benchmarks

We provide an example benchmark in the example directory. You can build it (using the -DBUILD_EXAMPLES=ON option) and run it to see how the library works.

A typical run of the ncclbench example looks like this:

$ mpirun -n 4 ./example/ncclbench --operation ncclAllReduce --sizes 1024 --data-type float --blocking --csv --warmups 10 --iterations 100 --time 2
Operation,Blocking,Data_Type,Msg_Size_B,#Elements,Iterations,Stream_Sync_us,Time_us,AlgBW_GBps,BusBW_GBps
ncclAllReduce,Yes,float,1024,256,1,1.57722,25.428,0.0375049,0.0562573
ncclAllReduce,Yes,float,1024,256,1,1.57722,21.042,0.0453224,0.0679836
ncclAllReduce,Yes,float,1024,256,1,1.57722,23.801,0.0400687,0.060103
ncclAllReduce,Yes,float,1024,256,1,1.57722,21.683,0.0439826,0.0659739
ncclAllReduce,Yes,float,1024,256,1,1.57722,20.626,0.0462365,0.0693548
ncclAllReduce,Yes,float,1024,256,1,1.57722,21.074,0.0452536,0.0678804
ncclAllReduce,Yes,float,1024,256,1,1.57722,20.988,0.045439,0.0681585
...

This command runs the ncclAllReduce operation across 4 MPI processes on 1024 bytes of data (256 elements of type float) in blocking mode, with 10 warmup iterations followed by 100 timed iterations or a total time of 2 seconds, whichever comes first. The results are printed in CSV format, one row per timed call.
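The AlgBW_GBps and BusBW_GBps columns in the sample output can be re-derived from Msg_Size_B and Time_us. The sketch below assumes, based only on the sample rows and not on the library source, that algorithm bandwidth is message size divided by time with "GB" meaning 2^30 bytes, and that bus bandwidth for all-reduce applies the 2(n-1)/n factor documented by NCCL Tests, with n = 4 ranks here. The helper names are illustrative.

```python
import csv
import io

# A few rows copied from the sample output shown above.
SAMPLE = """Operation,Blocking,Data_Type,Msg_Size_B,#Elements,Iterations,Stream_Sync_us,Time_us,AlgBW_GBps,BusBW_GBps
ncclAllReduce,Yes,float,1024,256,1,1.57722,25.428,0.0375049,0.0562573
ncclAllReduce,Yes,float,1024,256,1,1.57722,21.042,0.0453224,0.0679836
ncclAllReduce,Yes,float,1024,256,1,1.57722,20.626,0.0462365,0.0693548
"""

GIB = 2 ** 30  # assumption: "GB" in the column names means 2^30 bytes


def alg_bw(msg_size_b: float, time_us: float) -> float:
    """Algorithm bandwidth: bytes per second, in units of 2^30 bytes."""
    return msg_size_b / (time_us * 1e-6) / GIB


def allreduce_bus_bw(alg: float, n_ranks: int) -> float:
    """Bus bandwidth for all-reduce, using the NCCL Tests factor 2(n-1)/n."""
    return alg * 2 * (n_ranks - 1) / n_ranks


def check_rows(text: str, n_ranks: int):
    """Return (computed AlgBW, reported AlgBW, computed BusBW, reported BusBW) per row."""
    results = []
    for row in csv.DictReader(io.StringIO(text)):
        alg = alg_bw(float(row["Msg_Size_B"]), float(row["Time_us"]))
        bus = allreduce_bus_bw(alg, n_ranks)
        results.append((alg, float(row["AlgBW_GBps"]), bus, float(row["BusBW_GBps"])))
    return results


if __name__ == "__main__":
    for alg, alg_rep, bus, bus_rep in check_rows(SAMPLE, n_ranks=4):
        print(f"AlgBW {alg:.7f} (reported {alg_rep}), BusBW {bus:.7f} (reported {bus_rep})")
```

For these rows the recomputed values agree with the reported ones to within rounding, which is how the 2^30 divisor was inferred.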

Building and installing

See the BUILDING document.

Contributing

See the CONTRIBUTING document.

Licensing

See the LICENSE document.
