Methods for Specifying Scientific Data Standards and Modeling Relationships with Applications to Neuroscience

Cited by: 9
Authors
Ruebel, Oliver [1]
Dougherty, Max [2]
Prabhat [3]
Denes, Peter [4]
Conant, David [5]
Chang, Edward F. [5]
Bouchard, Kristofer [2]
Affiliations
[1] Lawrence Berkeley Natl Lab, Computat Res Div, Berkeley, CA 94720 USA
[2] Lawrence Berkeley Natl Lab, Biol Syst & Engn Div, Berkeley, CA 94720 USA
[3] Lawrence Berkeley Natl Lab, Natl Energy Res Sci Comp Ctr, Berkeley, CA USA
[4] Lawrence Berkeley Natl Lab, Div Phys Sci, Berkeley, CA USA
[5] Univ Calif San Francisco, Med Ctr, Neurosci, San Francisco, CA USA
Keywords
data format specification; relationship modeling; electrophysiology; neuroscience; FORMAT
DOI
10.3389/fninf.2016.00048
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Neuroscience continues to experience tremendous growth in data: in the volume and variety of data, the velocity at which data is acquired, and, in turn, the veracity of data. These challenges are a serious impediment to sharing data, analyses, and tools within and across labs. Here, we introduce BRAINformat, a novel data standardization framework for the design and management of scientific data formats. The BRAINformat library defines application-independent design concepts and modules that together create a general framework for standardization of scientific data. We describe the formal specification of scientific data standards, which facilitates sharing and verification of data and formats. We introduce the concept of Managed Objects, which enables semantic components of data formats to be specified as self-contained units and supports modular, reusable design of data format components and file storage. We also introduce the novel concept of Relationship Attributes for modeling and using semantic relationships between data objects. Based on these concepts, we demonstrate the application of our framework to design and implement a standard format for electrophysiology data, and we show how data standardization and relationship modeling facilitate data analysis and sharing. The format uses HDF5, enabling portable, scalable, and self-describing data storage and integration with modern high-performance computing for data-driven discovery. The BRAINformat library is open source, easy to use, provides detailed user and developer documentation, and is freely available at https://bitbucket.org/oruebel/brainformat.
Pages: 16