Talks > 15-16/06/2017 Jordi Blasco

Modernising the HPC cluster provisioning

HPC has quickly adopted agile strategies and DevOps technologies in cluster provisioning. While the standard deployment systems are DevOps friendly, image-based provisioning systems are usually much faster. The Torrent protocol has been key to accelerating OS image propagation and reducing the time to production. Configuration managers such as Puppet, Ansible or CFEngine can run on top of both kinds of system to keep the cluster consistent.

OS provisioning based on local disks is often unreliable and is usually more expensive for a reasonably sized cluster. Read-only NFSROOT provisioning allows one to interact with the image, but the NFS server becomes a critical single point of failure (SPOF). Stateless solutions are more reliable but less flexible, and can have a high memory footprint.

Experience has shown that none of these strategies covers all needs on its own, and none is flexible enough to accommodate changes online without breaking the valuable DevOps approach.

In this talk I’m going to introduce a new technology, developed by HPCNow! as part of the sNow! cluster manager, which provides the flexibility of read-only NFSROOT image provisioning, the scalability of diskless provisioning, the reliability of HPC cluster file systems, and the ability to incorporate DevOps and continuous integration strategies.


