Reproducibility has become one of the most pressing issues in biology and many other computation-based research fields. This problem has been fuelled by the combined reliance on increasingly complex data analysis methods and the exponential growth of big data. When considering the installation, deployment, and maintenance of computational data-analysis pipelines, an even more challenging picture emerges due to the lack of community standards. Moreover, the effect of limited standards on reproducibility is amplified by the very diverse range of computational platforms and configurations on which these applications are expected to run (workstations, clusters, HPC systems, clouds, etc.).
Software containers are gaining acceptance as a solution to the problem of reproducibility of computational workflows. However, the orchestration of large containerised workloads at scale, and in a portable manner across different platforms and runtimes, poses new challenges.
This presentation will give an introduction to Nextflow, a pipeline orchestration tool that has been designed to address exactly these issues. Nextflow is a workflow framework that provides a domain-specific language (DSL) meant to simplify the implementation and deployment of complex, large-scale containerised workloads in a portable and replicable manner. It enables the seamless parallelisation and deployment of any existing application with minimal development and maintenance overhead, irrespective of the original programming language.
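As a minimal sketch of what such a pipeline looks like (the process name, container image, and input values below are illustrative assumptions, not taken from the presentation), a Nextflow DSL2 script declares processes with an associated container and composes them through channels:

    // Minimal illustrative Nextflow (DSL2) pipeline sketch
    nextflow.enable.dsl = 2

    process SAY_HELLO {
        container 'ubuntu:20.04'   // hypothetical container image

        input:
        val name

        output:
        stdout

        script:
        """
        echo "Hello, ${name}!"
        """
    }

    workflow {
        // Each value emitted by the channel is run as an
        // independent, parallel task inside the container
        Channel.of('alpha', 'beta', 'gamma') | SAY_HELLO | view
    }

Such a script can be launched, for example, with "nextflow run main.nf -with-docker", and the same pipeline definition can be deployed unchanged on a workstation, an HPC scheduler, or a cloud backend by switching the executor configuration.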