Adoption of cloud technologies by high-performance computing (HPC) is accelerating, and HPC users want their applications to perform well everywhere. While container orchestration frameworks provide advantages such as resiliency, elasticity, and declarative management, they are not designed to deliver application performance to the same degree as HPC workload managers and schedulers. In this talk, we present our effort to promote cloud and HPC convergence, an effort that spans computing platforms and institutions. Specifically, we target Kubernetes platforms with the introduction of the Flux Operator and the Fluence plugin scheduler, both based on Flux, the open-source HPC resource manager and job scheduler. We also describe the challenges we faced when shifting from traditional HPC to Kubernetes, and what it takes to run workflows natively on such platforms.
About us
HPCKP (High-Performance Computing Knowledge Portal) is an Open Knowledge project focused on technology transfer and knowledge sharing in the HPC, AI and Quantum Science fields.