Talks > 15-16/10/2012 Joan Protasio

Analysis and improvement of MapReduce data distribution in read mapping applications

The MapReduce paradigm has proven to be a simple and feasible way of filtering and analyzing large data sets on cloud and cluster systems. Algorithms designed for the paradigm must implement regular data distribution patterns to ensure appropriate use of resources. Good scalability and performance in MapReduce applications depend greatly on designing regular patterns for generating and consuming intermediate data in the map and reduce phases. We describe the data distribution patterns found in current MapReduce read mapping bioinformatics applications and present some data decomposition principles that greatly improve their scalability and performance.
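
To make the point about regular intermediate data distribution concrete, here is a minimal sketch. It is not taken from the talk: the read positions, bin sizes, modulo partitioner and the helper names `reducer_loads` and `imbalance` are all assumptions made for illustration. It shows how keying map output by a coarse genomic region can send most records to one reducer when reads pile up in a hot region, and how a finer key decomposition spreads the same records more evenly.

```python
def reducer_loads(keys, num_reducers):
    """Count how many intermediate records each reducer receives.

    Uses a plain modulo partitioner (an assumption for this sketch).
    """
    loads = [0] * num_reducers
    for key in keys:
        loads[key % num_reducers] += 1
    return loads


def imbalance(loads):
    """Busiest reducer's load divided by the mean load (1.0 is perfectly even)."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical mapping positions for 1000 reads:
# 900 reads pile up in one hot region, 100 reads are spread elsewhere.
hot_reads = list(range(900))
cold_reads = [1000 * b + 7 for b in range(1, 101)]
positions = hot_reads + cold_reads

NUM_REDUCERS = 4

# Coarse decomposition: one key per 1000-base region.
coarse_loads = reducer_loads([p // 1000 for p in positions], NUM_REDUCERS)

# Finer decomposition: one key per 10-base bin.
fine_loads = reducer_loads([p // 10 for p in positions], NUM_REDUCERS)

print("coarse:", coarse_loads, "imbalance:", imbalance(coarse_loads))
print("fine:  ", fine_loads, "imbalance:", imbalance(fine_loads))
```

In this toy setup the coarse keys leave one reducer with most of the records, while the finer keys bring the busiest reducer much closer to the mean. A real read mapping pipeline would also have to handle reads that span bin boundaries, which this sketch ignores.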
