The massive use of simulation techniques in chemical research generates huge amounts of information, which is increasingly recognized as the Big Data problem [1]. The main obstacle to managing such large volumes of information is storing them in a way that facilitates data mining, as a strategy to optimize the processes that enable scientists to face the challenges of a new sustainable society based on knowledge and the rational use of existing resources.
The present project aims to create a platform of cloud services to manage computational chemistry data. Like other related projects [2], our platform relies on well-defined standards [3] and implements tools for data processing, hierarchical storage and data retrieval to facilitate data mining of the Big Data of theoretical and computational chemistry. Its main goal is to create new methodological strategies that promote optimal reuse of results and accumulated knowledge and enhance researchers' daily productivity.
This proposal [4] automates the extraction of relevant data and transforms numerical data into labelled data in a database. The platform provides tools for researchers to validate, enrich, publish and share information, and cloud tools to access and visualize data. Other tools allow, for instance, the creation of reaction energy profile plots by combining data from a set of molecular entities, or the automatic creation of Supporting Information files. The final goal is to build a new reference tool for computational chemistry research, bibliography management and services to third parties. Potential users include computational chemistry research groups worldwide, university libraries and related services, and high-performance supercomputing centers.
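As an illustration of the two ideas mentioned above (turning raw numerical output into labelled data, and combining several molecular entities into a reaction energy profile), the following is a minimal Python sketch. It is not the platform's actual implementation: the `ENERGY_HARTREE = ...` output line, the `LabelledResult` record and the function names are all hypothetical and chosen only for this example.

```python
import re
from dataclasses import dataclass

# Standard conversion factor from Hartree to kcal/mol.
HARTREE_TO_KCAL_MOL = 627.5095

# ASSUMPTION: a made-up output line such as "ENERGY_HARTREE = -100.000000".
# Real quantum chemistry programs each use their own output format.
ENERGY_PATTERN = re.compile(
    r"^\s*ENERGY_HARTREE\s*=\s*(-?\d+\.\d+)\s*$", re.MULTILINE
)


@dataclass
class LabelledResult:
    """Hypothetical labelled record for one molecular entity."""
    name: str
    energy_hartree: float


def extract_result(name: str, output_text: str) -> LabelledResult:
    """Turn raw output text into a labelled record.

    If several energies are printed (for example one per optimization
    step), the last one is kept.
    """
    matches = ENERGY_PATTERN.findall(output_text)
    if not matches:
        raise ValueError(f"no energy found for {name!r}")
    return LabelledResult(name=name, energy_hartree=float(matches[-1]))


def energy_profile(results, reference):
    """Return (name, relative energy in kcal/mol) pairs.

    Energies are taken relative to the entity called `reference`,
    and the input order of `results` is preserved.
    """
    ref = next(r for r in results if r.name == reference)
    return [
        (r.name, (r.energy_hartree - ref.energy_hartree) * HARTREE_TO_KCAL_MOL)
        for r in results
    ]


# Usage with invented numbers, purely for illustration.
outputs = {
    "reactant": "ENERGY_HARTREE = -100.000000\n",
    "transition state": "ENERGY_HARTREE = -99.500000\nENERGY_HARTREE = -99.980000\n",
    "product": "ENERGY_HARTREE = -100.010000\n",
}
results = [extract_result(name, text) for name, text in outputs.items()]
profile = energy_profile(results, reference="reactant")
for name, relative_energy in profile:
    print(f"{name:>16s} {relative_energy:8.2f} kcal/mol")
```

Once the energies are stored as labelled records rather than as raw text, the profile is a simple query over a set of entities, which is the kind of reuse that storing labelled data is meant to enable.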
[1] Lynch, C. Big data: How do your data grow? Nature, 2008, 455, 28.
[2] Chen, M.; Stott, A. C.; Li, S.; Dixon, D. A. Construction of a robust, large-scale, collaborative database for raw data in computational chemistry: The Collaborative Chemistry Database Tool (CCDBT). J. Mol. Graph. Model., 2012, 34, 67-75.
[3] Adams, S.; de Castro, P.; Echenique, P.; Estrada, J.; Hanwell, M. D.; Murray-Rust, P.; Sherwood, P.; Thomas, J.; Townsend, J. The Quixote project: Collaborative and Open Quantum Chemistry data management in the Internet age. J. Cheminformatics, 2011, 3, 38.
[4] Álvarez-Moreno, M.; de Graaf; López, N.; Maseras, F.; Poblet, J. M.; Bo, C. J. Chem. Inf. Model., DOI: 10.1021/ci500593j.
Title: Taming the Big Data in Computational Chemistry
Speaker: Carles Bo (ICIQ)
Date: 4th February 2015
Location: Faculty of Chemistry (UB), C/ Martí i Franqués 1, 08028 Barcelona, Spain.