The configuration of computing nodes in HPC clusters is normally managed in two steps: first, a master image is deployed to the nodes; second, a “post-configuration” stage modifies the installed system to bring it in line with changes made relative to this base image, such as modified SLURM configuration files, new filesystems to mount, updated packages, or new monitoring tools to install.
One way to handle this post-configuration stage, and the further changes that occur over the lifetime of a computing node, is to use a Configuration Management System (CMS). CMSs such as CFEngine, Puppet, Chef, and Salt are specifically designed to manage system configuration changes and to maintain consistency in complex systems: they let us define the state of the nodes’ services, configuration files, installed packages, mount points, security policies, and much more.
But this comes at a price: a steep learning curve and the setup of the CMS itself. Here we present Ansible, a very easy-to-use CMS whose clientless push model (zero initial setup on the nodes) and simple, human-readable YAML configuration syntax fit the mindset of HPC cluster administrators perfectly.
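
To give a feel for that YAML syntax, the following is a minimal sketch of a playbook covering the kinds of post-configuration changes mentioned above: a package, a SLURM configuration file, a mount point, and a service state. It is an illustration only, not a configuration taken from a real cluster. The inventory group `compute`, the package `collectd`, the NFS export `nfs.example.org:/export/scratch`, and the file paths are all assumptions chosen for the example, and the `ansible.posix.mount` module assumes the `ansible.posix` collection is installed on the control machine.

```yaml
---
# Hypothetical post-configuration playbook for compute nodes.
# Group name, package name, paths and NFS server are illustrative assumptions.
- name: Post-configure compute nodes
  hosts: compute                # assumed inventory group
  become: true                  # run tasks with elevated privileges

  tasks:
    - name: Install a monitoring tool
      ansible.builtin.package:
        name: collectd          # example package, not prescribed by the text
        state: present

    - name: Deploy the SLURM configuration file
      ansible.builtin.copy:
        src: files/slurm.conf   # assumed path on the control machine
        dest: /etc/slurm/slurm.conf
        owner: root
        group: root
        mode: "0644"
      notify: Restart slurmd    # only fires if the file actually changed

    - name: Mount a shared filesystem
      ansible.posix.mount:
        path: /scratch
        src: nfs.example.org:/export/scratch   # placeholder NFS export
        fstype: nfs
        opts: defaults
        state: mounted          # adds the fstab entry and mounts it

    - name: Ensure slurmd is running and enabled at boot
      ansible.builtin.service:
        name: slurmd
        state: started
        enabled: true

  handlers:
    - name: Restart slurmd
      ansible.builtin.service:
        name: slurmd
        state: restarted
```

Assuming the playbook is saved as `compute.yml` next to an inventory file named `inventory`, it could be pushed to the nodes from the control machine with `ansible-playbook -i inventory compute.yml`. Each task describes a desired state rather than a command to run, so applying the playbook a second time should leave an already configured node unchanged.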