Talks > 03-04/02/2015 Alfred Gil

Deploying a Hadoop cluster with sNow! in less than 15 minutes

The main goal of this presentation is to show how we built sNow!, a Linux distribution capable of deploying a Hadoop cluster in the most comfortable way a sysadmin can imagine (the myth of the lazy sysadmin is already well known, and although it is only a myth, we are proud to help strengthen it). To achieve this, the installation of the whole system should be as unattended as possible and must provide all the tools the sysadmin may need along the way.

We chose the Cloudera package because it bundles Apache Hadoop with an extensive collection of complementary tools, including a powerful management dashboard. Cloudera must be installed on an existing operating system, so the setup takes two steps: first, the installation and configuration of the cluster, and second, the deployment of the Cloudera application.

To merge these two steps into a single process, we developed sNow!, a Linux distribution that integrates both the initial cluster installation and the deployment of the Hadoop ecosystem. As a result, this tool simplifies and automates all the tasks involved in correctly setting up the systems that make up a Hadoop cluster.

We use Debian as the base operating system, since it is one of the most stable distributions available. The final product is an installation ISO that installs and properly configures the management node. Once the management node is up and running, the operating system on the compute nodes is installed automatically, simply by booting them. Finally, the Hadoop cluster is configured and deployed through a web interface on the management node.

In this presentation, we will show a live demo of the system running in a virtualized environment.
