Data model, dictionaries, and desiderata for biomolecular simulation data indexing and sharing

Cited by: 12
Authors
Thibault, Julien C. [1]
Roe, Daniel R. [2]
Facelli, Julio C. [1]
Cheatham, Thomas E., III [2]
Affiliations
[1] Univ Utah, Dept Biomed Informat, Salt Lake City, UT 84112 USA
[2] Univ Utah, Dept Med Chem, Salt Lake City, UT 84112 USA
Funding
US National Science Foundation;
Keywords
Biomolecular simulations; Molecular dynamics; Computational chemistry; Data model; Repository; XML; UML; CHEMICAL MARKUP; DYNAMICS; LANGUAGE; IMPLEMENTATION; SEMANTICS; SOFTWARE; DESIGN; NWCHEM; SYSTEM; WEB;
DOI
10.1186/1758-2946-6-4
Chinese Library Classification (CLC) number
O6 [Chemistry];
Subject classification code
0703;
Abstract
Background: Few environments have been developed or deployed to widely share biomolecular simulation data or to enable collaborative networks to facilitate data exploration and reuse. As the amount and complexity of data generated by these simulations are increasing dramatically and the methods are being more widely applied, the need for new tools to manage and share this data has become obvious. In this paper we present the results of a process aimed at assessing the needs of the community for data representation standards to guide the implementation of future repositories for biomolecular simulations.

Results: We introduce a list of common data elements, inspired by previous work and updated according to feedback from the community collected through a survey and personal interviews. These data elements integrate the concepts for multiple types of computational methods, including quantum chemistry and molecular dynamics. The identified core data elements were organized into a logical model to guide the design of new databases and application programming interfaces. Finally, a set of dictionaries was implemented to be used via SQL queries or locally via a Java API built upon the Apache Lucene text-search engine.

Conclusions: The model and its associated dictionaries provide a simple yet rich representation of the concepts related to biomolecular simulations, which should guide future developments of repositories and more complex terminologies and ontologies. The model remains extensible through the decomposition of virtual experiments into tasks and parameter sets, and via the use of extended attributes. The benefits of a common logical model for biomolecular simulations were illustrated through various use cases, including data storage, indexing, and presentation. All the models and dictionaries introduced in this paper are available for download at http://ibiomes.chpc.utah.edu/mediawiki/index.php/Downloads.
Pages: 23