Many scientific fields have become highly data-driven with the development of computer science. For instance, astronomy, meteorology, social computing, and bioinformatics rely heavily on data-intensive scientific discovery, because large volumes of data of various types are generated in these fields. How to extract knowledge from the data produced by large-scale scientific simulation is itself a data-intensive problem. A common feature of these disciplines is that they generate data sets so large that automated analysis is required, and this analysis is a demanding data-intensive stage in many scientific methods. There is commensurate growth in expectations about what can be achieved with this wealth of data and computational power.
Meeting these expectations with the available expertise requires new frameworks that make it easier to reliably formalise data-driven methods that exploit high-end architectures to meet the needs of science, industry and society. In this work we present a new data-driven framework, called Asterism, which aims to simplify the development of data-intensive applications that run across multiple heterogeneous resources, without users having to: re-formulate their methods for different enactment systems; manage the data distribution across systems; parallelize their methods; co-place and schedule their methods with computing resources; or store and transfer large or small volumes of data.