Cluster logs contain historical data that relates job submission parameters to the job execution time, final state, consumed memory… We apply machine-learning techniques to unveil information hidden in the logs and predict jobs’ behavior prior to submission, to reduce waste of resources and improve the efficiency of the cluster.
In this talk we’ll present two tools for understanding and predicting the behavior of jobs on clusters:
1. Predict-IT: Predict jobs’ behavior before submission to help ensure that submitted jobs complete correctly, which increases cluster production and profitability
2. Analyze-IT: Understand cluster behavior in order to find ways to improve its efficiency
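
As a rough illustration of the idea of learning from logs, and not the method used by either tool, the sketch below predicts a job's final state from historical records by majority vote among past jobs with the same submission parameters. The field names (`user`, `queue`, `final_state`), the state labels, and the sample records are all hypothetical; a real model would use many more parameters and a proper learning algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical log records: field names and values are illustrative only.
HISTORY = [
    {"user": "alice", "queue": "short", "final_state": "COMPLETED"},
    {"user": "alice", "queue": "short", "final_state": "COMPLETED"},
    {"user": "alice", "queue": "long", "final_state": "TIMEOUT"},
    {"user": "bob", "queue": "long", "final_state": "FAILED"},
    {"user": "bob", "queue": "long", "final_state": "FAILED"},
    {"user": "bob", "queue": "long", "final_state": "COMPLETED"},
]


def fit(history, keys=("user", "queue")):
    """Count final states per combination of submission parameters."""
    per_bucket = defaultdict(Counter)
    overall = Counter()
    for record in history:
        bucket = tuple(record[k] for k in keys)
        per_bucket[bucket][record["final_state"]] += 1
        overall[record["final_state"]] += 1
    return {"keys": keys, "per_bucket": dict(per_bucket), "overall": overall}


def predict(model, job):
    """Return the most frequent past final state for similar jobs.

    Falls back to the overall most frequent state when no past job
    shares the same submission parameters.
    """
    bucket = tuple(job[k] for k in model["keys"])
    counts = model["per_bucket"].get(bucket, model["overall"])
    return counts.most_common(1)[0][0]


model = fit(HISTORY)
print(predict(model, {"user": "bob", "queue": "long"}))     # FAILED
print(predict(model, {"user": "carol", "queue": "short"}))  # COMPLETED (fallback)
```

A prediction such as FAILED or TIMEOUT made before submission is the kind of signal that could be used to warn the user or adjust the request, which is where the saving in wasted resources would come from.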