Slurm Training '15

We were pleased to host Slurm Training'15, organized by HPCNow! and IQTC-UB. The course aimed to share expertise and strategies for the most popular batch queue system in High Performance Computing. It was held at the University of Barcelona on 5 and 6 February 2015, in conjunction with HPC Knowledge Meeting'15 and BeeGFS Training'15, also organized by HPCNow!, IQTC-UB and ThinkParQ.


Abstract:

The High Performance Computing Centre of the University of Strasbourg (http://hpc.unistra.fr) operates heterogeneous computing nodes grouped in a single cluster of 325 nodes (4500 cores).

Nearly half of the cluster (145 nodes) was funded by the University. The remaining computing nodes (currently 180) were bought by research labs, and more and more labs are now buying nodes.

These resources currently span five generations of processors. Each processor generation corresponds to a Slurm partition.

We offer access to computing resources through Slurm, with three different qualities of service (QOS):

  • For all users: best-effort access to all resources, whenever they are available;
  • For research groups that contributed to the cluster: priority based on the number of purchased nodes;
  • For researchers whose projects have been selected: priority access to the machines belonging to the computing center.

A single user may fall under all three cases at the same time.

These three QOS work very well together: 40 million compute hours are delivered each year, and computing power has increased steadily.
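To make "priority based on the number of purchased nodes" concrete, here is a minimal Python sketch of a contribution-proportional share. The lab names and node counts are hypothetical, chosen only so that they sum to the 180 lab-funded nodes mentioned above. This does not reproduce the actual Strasbourg configuration or Slurm's own priority computation; it only illustrates the proportionality idea.

```python
# Hypothetical labs and node counts (illustration only); they sum to the
# 180 lab-funded nodes mentioned in the abstract.
PURCHASED_NODES = {"lab_a": 90, "lab_b": 60, "lab_c": 30}


def contribution_shares(purchased):
    """Return each group's fraction of all purchased nodes."""
    total = sum(purchased.values())
    if total == 0:
        raise ValueError("no purchased nodes to share out")
    return {group: nodes / total for group, nodes in purchased.items()}


def rank_by_contribution(purchased):
    """Order groups from the largest to the smallest contribution."""
    shares = contribution_shares(purchased)
    return sorted(shares, key=shares.get, reverse=True)


shares = contribution_shares(PURCHASED_NODES)
ranking = rank_by_contribution(PURCHASED_NODES)
```

In a real deployment, such proportions would more likely be expressed through Slurm's own accounting settings than computed by hand; how that is done at this site is exactly what the talk covers.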

We use very advanced Slurm functionality to deliver a high-level service to our users.

This case study will detail the mechanisms implemented in Slurm to address all the constraints of access rights, the return of compute hours in proportion to each group's contribution, and access to hour quotas.

Specifically, we will address the following points:

  • user management: accounts, dealing with users, the Slurm database, system configuration, daemons;
  • useful commands: sacct, sreport, sacctmgr, sshare, sstat;
  • accounting concepts: which data is accounted, consolidated data, querying the database (including with SQL);
  • preemption: account setup for preemption, preemption methods, why and when to use preemption;
  • quota management and fairshare: why additional tools are needed (some sample tools will be presented).
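As an illustration of the kind of additional quota tool mentioned in the last point, the following Python sketch sums CPU hours per account from pipe-delimited accounting output and flags accounts that exceed a quota. It is not one of the tools presented in the talk. The sample data, the account names and the quota values are hypothetical, and the input layout (`Account|CPUTimeRAW`, one job per line) assumes output such as that of `sacct --allusers --allocations --noheader --parsable2 --format=Account,CPUTimeRAW`.

```python
from collections import defaultdict


def cpu_hours_by_account(sacct_text):
    """Sum CPU hours per account from 'Account|CPUTimeRAW' lines.

    CPUTimeRAW is assumed to be expressed in seconds.
    """
    totals = defaultdict(float)
    for line in sacct_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        account, cpu_seconds = line.split("|")
        totals[account] += int(cpu_seconds) / 3600.0
    return dict(totals)


def over_quota(totals, quotas):
    """Return the accounts whose usage exceeds their quota (in hours).

    Accounts without a quota entry are treated as unlimited.
    """
    return sorted(
        account
        for account, used in totals.items()
        if used > quotas.get(account, float("inf"))
    )


# Hypothetical sample data, not real accounting records.
SAMPLE = """lab_a|7200
lab_b|3600
lab_a|1800
"""
QUOTAS = {"lab_a": 2.0, "lab_b": 5.0}  # hypothetical quotas, in CPU hours

usage = cpu_hours_by_account(SAMPLE)
flagged = over_quota(usage, QUOTAS)
```

Parsing text passed in as a string keeps the sketch runnable without a Slurm installation; a real tool would read the command output or query the accounting database directly.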

Title: A case study for computing centers: Accounts, Priorities and Quotas

Speaker: Michel Ringenbach (High Performance Computing Center of the University of Strasbourg, France)

Date: 6th February 2015

Location: Faculty of Chemistry (UB), C/ Martí i Franqués 1, 08028 Barcelona, Spain.

Slides: PENDING

Schedule Slurm Training'15

5 February 2015
Beginner to Intermediate
9:00h Registration and Coffee Meet Up
9:30h Jordi Blasco (HPCNow!)
10:00h Jordi Blasco (HPCNow!)
11:15h Jordi Blasco (HPCNow!)
12:00h Lunch Break
13:30h Jordi Blasco (HPCNow!)
14:00h QoS, Jordi Blasco (HPCNow!)
15:00h Coffee Break
15:30h Jordi Blasco (HPCNow!)
16:30h Jordi Blasco (HPCNow!)

6 February 2015
Intermediate to Advanced
9:00h Coffee Meet Up
9:30h Jordi Blasco (HPCNow!)
10:30h Jordi Blasco (HPCNow!)
11:00h Carles Fenoy (BSC)
11:30h Jordi Blasco (HPCNow!)
12:00h Lunch
13:30h Sergio Iserte (UJI)
14:30h Michel Ringenbach (HPC Center @ University of Strasbourg)
15:30h Alejandro Sanchez Graells (BSC)
16:30h Wrap-Up