GPU scheduling integration on Grid Engine

Jose Alcantara (MMB - IRB)

Security event correlation in HPC environments

Alexandre Vaqué (CESCA)

CESCA (Centro de Servicios Científicos y Académicos de Cataluña) presents the event correlation tools used in its high-performance computing environments. First, we will see how OSSEC, a HIDS (host-based intrusion detection system), helps us detect anomalies that may indicate intrusions or misuse of resources. OSSEC is complemented by the operational engineering tool Splunk. Splunk is a unique agentless tool that, through its powerful search engine, indexes and analyzes any kind of information in real time. It also has many plugins for integrating with a multitude of devices; in our case we will use the ‘Splunk for OSSEC’ plugin.

Splunk for OSSEC is an open-source project whose initial goal is to provide a platform for preserving and cataloguing events. It lets us generate reports from the information already modelled by OSSEC itself, where the classification of events makes that information much easier, quicker and more convenient to exploit. Splunk for OSSEC also adds new functionality.
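As a rough illustration of what correlating security events means, the sketch below raises an alert when one source produces several failed logins within a short time window. It is a toy rule written for this page and does not reflect how OSSEC or Splunk work internally; the function name `correlate`, the event format `(timestamp, source)` and the default threshold and window are all assumptions.

```python
from collections import defaultdict, deque

def correlate(events, threshold=3, window=60):
    """Return (source, first_ts, last_ts) for each burst of failed logins.

    events: iterable of (timestamp_seconds, source) pairs (hypothetical format).
    A burst is `threshold` events from one source within `window` seconds.
    """
    recent = defaultdict(deque)  # per-source timestamps still inside the window
    alerts = []
    for ts, source in sorted(events):
        q = recent[source]
        q.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while q and ts - q[0] > window:
            q.popleft()
        if len(q) == threshold:
            alerts.append((source, q[0], ts))
            q.clear()  # each alert consumes the events that triggered it
    return alerts

# Hypothetical failed-login events, deliberately out of order.
EVENTS = [
    (0, "10.0.0.5"), (10, "10.0.0.5"), (500, "10.0.0.9"),
    (20, "10.0.0.5"), (900, "10.0.0.9"),
]
ALERTS = correlate(EVENTS)
print(ALERTS)
```

Keeping one queue per source means isolated failures from different hosts never add up to an alert; only repeated activity from the same origin does.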

Environment Modules

Carles Acosta (UAB) & Dani Masó (IQC - UdG)

Environment Modules is a Linux package that provides a modular utility for managing users' environments dynamically. It uses scripts written in Tcl (modulefiles) and lets us change the environment, check dependencies and versions, and so on, with simple shell commands. In this talk we give a simple, general overview of Modules: how to install it, the most commonly used commands, how to write modulefiles, and how to keep everything organized.
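To make the idea of changing the environment dynamically concrete, here is a minimal Python sketch of the kind of edits a typical modulefile performs when a module is loaded and unloaded. Real modulefiles are written in Tcl and Modules does considerably more; the helper names (`prepend_path`, `remove_path`, `load_module`, `unload_module`) and the install prefix are hypothetical.

```python
import os

def prepend_path(env, var, directory):
    """Return a copy of env with directory placed first in path-like variable var."""
    updated = dict(env)
    parts = [p for p in updated.get(var, "").split(os.pathsep) if p and p != directory]
    updated[var] = os.pathsep.join([directory] + parts)
    return updated

def remove_path(env, var, directory):
    """Return a copy of env with directory removed from path-like variable var."""
    updated = dict(env)
    parts = [p for p in updated.get(var, "").split(os.pathsep) if p and p != directory]
    if parts:
        updated[var] = os.pathsep.join(parts)
    else:
        updated.pop(var, None)  # variable existed only because of the module
    return updated

def load_module(env, prefix):
    """Expose the software installed under prefix (assumed bin/ and lib/ layout)."""
    env = prepend_path(env, "PATH", os.path.join(prefix, "bin"))
    return prepend_path(env, "LD_LIBRARY_PATH", os.path.join(prefix, "lib"))

def unload_module(env, prefix):
    """Undo load_module for the same prefix."""
    env = remove_path(env, "PATH", os.path.join(prefix, "bin"))
    return remove_path(env, "LD_LIBRARY_PATH", os.path.join(prefix, "lib"))

# Hypothetical starting environment and install prefix.
BASE = {"PATH": os.pathsep.join(["/usr/bin", "/bin"])}
PREFIX = "/opt/apps/example/1.0"
LOADED = load_module(BASE, PREFIX)
RESTORED = unload_module(LOADED, PREFIX)
print(LOADED["PATH"])
```

Because unloading exactly reverses loading, users can switch between versions of the same software without leaving stale entries in their environment.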

Grid Engine Training

Jordi Blasco (XRQTC)

Grid Engine is an open source batch-queuing system used on computer farms and high-performance computing (HPC) clusters. It is responsible for accepting, scheduling, dispatching, and managing the remote and distributed execution of large numbers of standalone, parallel or interactive user jobs. The software was developed and supported by Sun Microsystems (known as SGE).

In late 2010, after the purchase of Sun by Oracle, the Grid Engine 6.2 update 6 source code was not included with the binaries, and changes were not put back to the project’s source repository. In response to this, the Grid Engine community started several projects to continue to develop and maintain a free implementation of Grid Engine. At this time, we can find some forks of SGE:

Oracle Grid Engine
Son of Grid Engine
Open Grid Scheduler
Univa Grid Engine

In the hands-on session you can choose to work with either Open Grid Scheduler or Son of Grid Engine.

Jumbomem

Jordi Blasco (XRQTC)

PXE menu: booting all your tools from the network

Pablo Escobar (CIPF)

SLURM at BSC & SLURM Simulator

Alejandro Lucero (BSC)

SLURM (Simple Linux Utility for Resource Management) is a resource management and scheduling utility for HPC systems. We will explain how we use it on our different clusters and share our experiences. We will also present the SLURM Simulator, an extension to SLURM developed by Alejandro Lucero that lets the user replay a job trace under different scheduling configurations in order to find the best one.
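The idea of replaying one job trace under several scheduling configurations can be sketched with a toy model. The code below is not the SLURM Simulator and uses no SLURM API: it assumes a single resource that runs one job at a time, a made-up three-job trace, and two illustrative policies (first come first served and shortest job first), then compares the mean waiting time each policy produces.

```python
from collections import namedtuple

# Hypothetical trace record: submission time and runtime in seconds.
Job = namedtuple("Job", "name submit runtime")

def replay(trace, priority):
    """Replay trace on one serially used resource; return the mean wait time.

    priority: function mapping a queued Job to a sort key (lowest runs next).
    """
    pending = sorted(trace, key=lambda j: j.submit)
    queue, waits, clock = [], [], 0
    while pending or queue:
        # Move every job submitted by now into the queue.
        while pending and pending[0].submit <= clock:
            queue.append(pending.pop(0))
        if not queue:
            clock = pending[0].submit  # idle until the next submission
            continue
        job = min(queue, key=priority)
        queue.remove(job)
        waits.append(clock - job.submit)
        clock += job.runtime
    return sum(waits) / len(waits) if waits else 0.0

TRACE = [Job("A", 0, 10), Job("B", 1, 2), Job("C", 2, 1)]
POLICIES = {
    "fifo": lambda j: j.submit,   # first come, first served
    "sjf": lambda j: j.runtime,   # shortest job first
}
RESULTS = {name: replay(TRACE, key) for name, key in POLICIES.items()}
BEST = min(RESULTS, key=RESULTS.get)
print(RESULTS, BEST)
```

The trace stays fixed while only the policy changes, so any difference in the metric is attributable to the scheduling configuration. A real study would use recorded workloads and site-relevant metrics rather than mean wait alone.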

Monitoring at BSC

Carles Fenoy (BSC)

At BSC we use a modified version of Ganglia for cluster monitoring. Ganglia did not scale to thousands of nodes, so BSC developed its own implementation of gmetad, called ggcollector, which splits the collection layer from the presentation layer. It is a fast, memory-efficient implementation that focuses on collecting information and delivering it to clients. We will show the implementation and its operation, as well as some tools developed to provide near-real-time and historical monitoring information for all the systems.

Novell Sentinel

Jacinto Grijalba (Novell)

The solution presented by Novell, called Sentinel, takes a modular approach built on two core components that support compliance operations, founded on log management and correlation.

Novell Sentinel covers the key aspects of security and event management, such as log collection, storage and analysis, real-time monitoring, advanced analytics, automated responses, audit reports, etc. The components presented are:

Novell Sentinel Log Manager
Novell Sentinel

These two modules serve two kinds of function: basic and advanced. Novell Sentinel Log Manager is designed as a basic log management system for high-performance mass storage of events with 90% compression, whereas Novell Sentinel is an advanced system for in-depth analysis, real-time event correlation and process automation based on the correlated events.

Site integral management with Puppet

Arnau Bria (PIC)

After the first LHC phase, in which experiments and sites focused their efforts on delivering the demanding WLCG metrics, sites now face a new challenge: consolidating and automating their computing services to reach steady-state operation. Starting from the ground up, the installation and post-installation mechanics are among the most important points for computing centers seeking to avoid initial installation and configuration problems. At Port d’Informació Científica the adopted solution is to steer all post-install and dynamic post-configuration using Puppet.

Puppet provides a master entity where profiles are easily defined and then propagated around the cluster, fulfilling the needs of post-install configuration after the raw OS installation and ensuring that each profile and its defined services persist once a node has been completely installed.

S-GAE: Sungrid Graphical Accounting Engine

Fernando Galindo & Gabriel Verdejo (RDlab, LSI - UPC)

One of the most important, but sometimes underestimated or forgotten, tasks when managing a high-performance queue environment is monitoring the cluster users' activity. At the RDlab we faced the need to manage large amounts of accounting information from our Sun Grid Engine and display it as user-friendly, readable statistics.

After realising that none of the available solutions met our expectations, we decided to develop a new application. The Sun Grid Graphical Accounting Engine (S-GAE for short) web application offers a new graphical way of managing accounting information. Once the user selects a few filters in forms, the system shows graphs of user, queue or full cluster usage at a glance.

Developed and released using free technologies such as PHP and MySQL, S-GAE displays full statistics as eye-catching charts in a lightweight, user-friendly web page.
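The kind of aggregation behind per-user and per-queue usage graphs can be sketched as follows. This is not S-GAE code (S-GAE is built with PHP and MySQL); it is a minimal Python illustration that assumes accounting records have already been parsed into dictionaries with hypothetical field names (`owner`, `queue`, `wallclock`, `slots`) and uses made-up sample data.

```python
from collections import defaultdict

# Hypothetical, already-parsed accounting records: one dict per finished job.
RECORDS = [
    {"owner": "alice", "queue": "short.q", "wallclock": 120.0, "slots": 1},
    {"owner": "bob", "queue": "long.q", "wallclock": 3600.0, "slots": 4},
    {"owner": "alice", "queue": "long.q", "wallclock": 1800.0, "slots": 2},
]

def usage_by(records, field, queue=None):
    """Sum slot-seconds per value of `field`, optionally filtered to one queue."""
    totals = defaultdict(float)
    for rec in records:
        if queue is not None and rec["queue"] != queue:
            continue  # filter chosen by the user, as in a web form
        totals[rec[field]] += rec["wallclock"] * rec["slots"]
    return dict(totals)

BY_USER = usage_by(RECORDS, "owner")
BY_QUEUE = usage_by(RECORDS, "queue")
LONG_Q_BY_USER = usage_by(RECORDS, "owner", queue="long.q")
print(BY_USER, BY_QUEUE, LONG_Q_BY_USER)
```

Each resulting dictionary maps directly onto one chart: the keys become the bars or slices and the values their sizes.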

Introduction to GIT

Jordi Blasco (XRQTC) & Pablo Escobar (CIPF)

Git is a distributed version control system that handles very large projects efficiently and requires only minimal knowledge to use. Git lets you clone a full repository with complete history and full revision tracking, independent of network access or a central server. Branching and merging are extremely easy, which is why it is becoming the new standard version control system.
