The main goal of this presentation is to show how we built sNow!, a Linux distribution capable of deploying a Hadoop cluster in the most comfortable way a sysadmin can imagine (the myth of the lazy sysadmin is already well known, and although it’s only a myth, we are proud to contribute to strengthening it). To achieve this, the installation of the whole system should be as unattended as possible and must provide all the tools the sysadmin may need along the way.
We have chosen the Cloudera package, since it includes Apache Hadoop as well as an extensive collection of complementary tools, including a powerful management dashboard. Cloudera must be installed on an existing operating system, so the setup has to be done in two steps: first, the installation and configuration of the cluster, and second, the deployment of the Cloudera application.
To join these two steps into a unified process, we have developed sNow!, a Linux distribution that integrates both the initial cluster installation and the deployment of the Hadoop ecosystem. As a result, this tool simplifies and automates all the tasks involved in correctly setting up the systems that make up a Hadoop cluster.
We use Debian as the base operating system, since it is one of the most stable distributions available. The final product is an installation ISO, which installs and properly configures the management node. Once the management node is up and running, the operating system on the compute nodes is installed automatically, simply by booting them. Finally, the configuration and deployment of the Hadoop cluster is done through a web interface from the management node.
In this presentation, we will show a live demo of the system running in a virtualized environment.