Talks > 05/06/2025 Sowmya Shree

Distributed Deep Learning Training with Enroot on PARAM Rudra

The poster presents an approach to scalable, secure, and efficient containerization for high-performance computing (HPC) workloads, focusing on distributed deep learning training on the PARAM Rudra supercomputing facility.

The methodology leverages Enroot, a lightweight containerization tool, integrated with SLURM via the Pyxis plugin, to optimize resource utilization and enable seamless deployment across nodes.
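As a rough illustration of what the Enroot, Pyxis, and SLURM integration looks like from the user's side, the sketch below assembles an `srun` command line using the Pyxis `--container-image` and `--container-mounts` options. It is not taken from the poster: the helper `build_srun_command`, the image reference, the mount paths, the node and GPU counts, and the `train.py` script are all hypothetical placeholders, and the actual PARAM Rudra configuration may differ.

```python
import shlex


def build_srun_command(image, nodes, gpus_per_node, mounts, train_cmd):
    """Assemble an srun argument list that uses Pyxis container options.

    Hypothetical helper for illustration only; every value passed in
    is an assumption, not a setting reported for PARAM Rudra.
    """
    args = [
        "srun",
        f"--nodes={nodes}",
        # One task per GPU is a common layout for data-parallel training.
        f"--ntasks-per-node={gpus_per_node}",
        f"--gpus-per-node={gpus_per_node}",
        # Pyxis option: image that Enroot imports and starts on each node.
        f"--container-image={image}",
    ]
    if mounts:
        # Pyxis option: comma-separated SRC:DST bind mounts.
        args.append("--container-mounts=" + ",".join(mounts))
    args.extend(train_cmd)
    return args


cmd = build_srun_command(
    image="nvcr.io#nvidia/pytorch:24.01-py3",    # placeholder image reference
    nodes=2,                                     # placeholder node count
    gpus_per_node=2,                             # placeholder GPU count
    mounts=["/scratch/user/data:/data"],         # placeholder paths
    train_cmd=["python", "train.py", "--epochs", "10"],  # placeholder script
)

# Print a shell-safe command line that could be pasted into a job script.
print(shlex.join(cmd))
```

Building the command as a list and quoting it with `shlex.join` keeps paths containing spaces or shell metacharacters intact when the line is pasted into a batch script.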

The solution emphasizes the advantages of containerization, including portability, ease of use, and reproducibility, while addressing the challenges of traditional methods.

The work showcases the PARAM Rudra system’s capabilities, offering insights into the potential of containerized distributed training for advancing AI research and scientific applications.

