Look at how distributed-memory machine learning (ML) benchmarks perform across different systems. This could build on the work Dell has described in the following blog posts:
http://en.community.dell.com/techcenter/high-performance-computing/b/general_hpc/archive/2018/03/05/deep-learning-performance-with-intel-caffe-training-cpu-model-choice-and-scalability
http://en.community.dell.com/techcenter/high-performance-computing/b/general_hpc/archive/2017/11/22/scaling-deep-learning-on-multiple-v100-nodes
http://en.community.dell.com/techcenter/high-performance-computing/b/general_hpc/archive/2017/09/27/deep-learning-on-v100
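One common way to compare multi-node runs across systems is to turn raw training throughput (for example images/sec) into speedup and parallel efficiency relative to the smallest run. The sketch below assumes throughput has already been measured per node count. The function name `scaling_report` and the numbers in the usage example are hypothetical and for illustration only. They are not results from the Dell posts or from any real system.

```python
from typing import Dict, Tuple


def scaling_report(throughput: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map node count -> (speedup, efficiency), both relative to the smallest node count.

    Hypothetical helper: `throughput` maps number of nodes to measured
    training throughput (e.g. images/sec) for the same benchmark.
    """
    base_nodes = min(throughput)
    base_rate = throughput[base_nodes]
    report = {}
    for nodes in sorted(throughput):
        speedup = throughput[nodes] / base_rate  # observed gain over the baseline run
        ideal = nodes / base_nodes               # perfect linear scaling
        report[nodes] = (speedup, speedup / ideal)
    return report


if __name__ == "__main__":
    # Hypothetical images/sec figures, for illustration only (not measured data).
    measured = {1: 100.0, 2: 190.0, 4: 360.0}
    for n, (s, e) in scaling_report(measured).items():
        print(f"{n} node(s): speedup {s:.2f}x, efficiency {e:.0%}")
```

Normalising to the smallest run makes results from systems with different per-node performance comparable on the same efficiency scale, though the absolute throughput still needs to be reported alongside it.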