A review of the large-scale optimizations applied to the well-known ResNet-50 training workload on DGX-based clusters. The optimization concepts, which combine profiling and modeling of the workload's execution at scale, can be applied to other deep neural networks running on GPU-based clusters.

About us
HPCKP (High-Performance Computing Knowledge Portal) is an Open Knowledge project focused on technology transfer and knowledge sharing in the HPC, AI and Quantum Science fields.