Talks > 21-22/06/2018 Benjamin Depardon

Using machine learning to predict and analyze jobs’ behavior

Cluster logs contain historical data that relates job submission parameters to the job execution time, final state, consumed memory… We apply machine-learning techniques to unveil information hidden in the logs and predict jobs’ behavior prior to submission, to reduce waste of resources and improve the efficiency of the cluster.

In this talk we’ll present two tools that help users understand and predict the behavior of jobs on clusters:

1. Predict-IT: Predict jobs’ behavior before they run, to help ensure that submitted jobs complete successfully; this increases cluster production and profitability

2. Analyze-IT: Understand cluster behavior in order to find ways to improve its efficiency
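The general idea of learning from historical logs to predict a job's final state before submission can be sketched as follows. This is only an illustration, not the method used by Predict-IT or Analyze-IT, which the abstract does not describe: it assumes a simple k-nearest-neighbours vote, and the record layout, the state names, the sample values and the function `predict_final_state` are all hypothetical.

```python
from collections import Counter
import math

# Hypothetical log records, invented for illustration only:
# (requested_cores, requested_mem_gb, requested_walltime_h, final_state)
HISTORY = [
    (4, 8, 1.0, "COMPLETED"),
    (4, 16, 2.0, "COMPLETED"),
    (8, 16, 2.0, "COMPLETED"),
    (16, 4, 0.5, "OUT_OF_MEMORY"),
    (32, 8, 0.5, "OUT_OF_MEMORY"),
    (32, 4, 1.0, "OUT_OF_MEMORY"),
    (64, 128, 0.25, "TIMEOUT"),
    (64, 256, 0.5, "TIMEOUT"),
]


def _scale(features, lows, highs):
    """Min-max scale each feature to [0, 1] so no single unit dominates."""
    return tuple(
        (f - lo) / (hi - lo) if hi > lo else 0.0
        for f, lo, hi in zip(features, lows, highs)
    )


def predict_final_state(job, history=HISTORY, k=3):
    """Predict the final state of `job` by majority vote among the k most
    similar past submissions (assumed technique: k-nearest neighbours)."""
    feats = [rec[:3] for rec in history]
    lows = [min(col) for col in zip(*feats)]
    highs = [max(col) for col in zip(*feats)]
    target = _scale(job, lows, highs)
    # Rank past jobs by Euclidean distance in the scaled parameter space.
    ranked = sorted(
        history,
        key=lambda rec: math.dist(_scale(rec[:3], lows, highs), target),
    )
    votes = Counter(rec[3] for rec in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # A new submission: 4 cores, 12 GB of memory, 1.5 hours of walltime.
    print(predict_final_state((4, 12, 1.5)))
```

A prediction like this could be checked at submission time to warn the user before resources are consumed. A real deployment would train on far larger logs and would need to validate the model against held-out jobs.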



