HPC has quickly adopted agile strategies and DevOps technologies in cluster provisioning. While the standard deployment systems are DevOps-friendly, image-based provisioning systems are usually much faster. The BitTorrent protocol has been key to accelerating OS image propagation and reducing the time to production. Configuration managers such as Puppet, Ansible, or CFEngine can run on top of either kind of system to keep the cluster consistent.
OS provisioning based on local disks is often unreliable and is usually more expensive for a reasonably sized cluster. Read-only NFSROOT provisioning allows one to interact with the image, but the NFS server becomes a critical single point of failure (SPOF). Stateless solutions are more reliable but less flexible, with potentially high memory footprints.
Experience has shown that none of these strategies covers all needs, and none is flexible enough to accommodate online changes without breaking the valuable DevOps approach.
In this talk I will introduce a new technology developed by HPCNow! as part of the sNow! cluster manager. It provides the flexibility of read-only NFSROOT image provisioning, the scalability of diskless provisioning, the reliability of HPC cluster file systems, and the ability to incorporate DevOps and continuous integration strategies.