DATE & TIME

19/06/2020

09:00 - 09:45

SPEAKER

Benjamin Depardon

CTO
UCit

DESCRIPTION

While HPC clusters are designed to run at “peak capacity”, administrators often face Congestion and Contention issues, with jobs piling up in queues. In either case, whether the cluster is under-utilized (Congestion) or running at full capacity (Contention), delivering good QoS to end users is the administrators’ priority.

UCit’s framework provides a set of customizable tools, such as Analyze-IT and Predict-IT, that help identify the optimum strategies (either on-premises or in the Cloud) for matching capacity to demand in these situations.

Fed by HPC clusters’ logs (accounting, applications…), the framework offers capabilities to explore the behavior of users and jobs on the cluster and to detect problematic events, with the aim of recommending corrective actions. It also allows training specific ML predictors that provide tailor-made recommendations on jobs’ parameters and feedback to users prior to job submission.

This talk will present the framework’s current capabilities and illustrate, based on real use cases, how to identify problematic behaviors and possible solutions.

Location

  • Parc Tecnològic
  • Marie Curie, 8 08042 Barcelona
  • +34 931640488
  • hpckp@hpcnow.com
