Understanding the performance of a parallel application can be difficult and time-consuming. The Paraver tool provides insight into an application's performance and helps identify its bottlenecks. However, once the functions that perform worse than expected have been detected, working out which parts of the code cause the performance degradation can take a long time, especially in routines that are thousands of lines long. In this work we present a methodology based on applying clustering and folding tools to the Nucleus for European Modelling of the Ocean (NEMO) model, which is known for its computational problems. This is the first study of NEMO using this approach.
These tools are provided by the Computer Sciences department of the Barcelona Supercomputing Center and are used to identify the parts of the code that should be improved to increase application performance. We first apply the clustering tool to group computation phases between MPI calls that have similar properties into clusters. We then use the folding tool to understand the internals of each cluster and, using the Paraver tool, correlate them with lines of code. Finally, combining several hardware counters reveals the reason for the low performance.
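The grouping step can be illustrated with a minimal sketch. It assumes a density-based algorithm (DBSCAN) applied to two per-burst features, duration and instructions per cycle (IPC). The burst values, the feature choice, and the `eps` and `min_pts` settings are hypothetical and chosen only for illustration; this is not the implementation of the clustering tool itself.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical computation bursts between MPI calls: (duration in ms, IPC).
bursts = [
    (10.0, 1.80), (10.5, 1.70), (9.8, 1.90),   # short bursts with high IPC
    (50.0, 0.40), (52.0, 0.50), (49.0, 0.45),  # long bursts with low IPC
    (30.0, 1.00),                              # isolated burst
]


def normalise(points):
    """Min-max scale every feature to [0, 1] so no feature dominates."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [
        tuple((v - lo) / span for v, lo, span in zip(p, lows, spans))
        for p in points
    ]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    noise = -1
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = noise  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
        cluster += 1
    return labels


labels = dbscan(normalise(bursts), eps=0.2, min_pts=3)
for burst, label in zip(bursts, labels):
    print(burst, "->", "noise" if label == -1 else f"cluster {label}")
```

In this sketch, bursts with similar duration and IPC end up in the same cluster, and a burst that resembles no other is marked as noise. Each resulting cluster is the kind of unit whose internals would then be examined in more detail.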