Ontology-Enriched Specifications Enabling Findable, Accessible, Interoperable, and Reusable Marine Metagenomic Datasets in Cyberinfrastructure Systems

Cited by: 5
Authors
Blumberg, Kai L. [1]
Ponsero, Alise J. [1]
Bomhoff, Matthew [1]
Wood-Charlson, Elisha M. [2]
DeLong, Edward F. [3]
Hurwitz, Bonnie L. [1, 4]
Affiliations
[1] Univ Arizona, Dept Biosyst Engn, Tucson, AZ 85721 USA
[2] EO Lawrence Berkeley Natl Lab, Environm Genom & Syst Biol Div, Berkeley, CA USA
[3] Univ Hawaii, Daniel K Inouye Ctr Microbial Oceanog, Honolulu, HI 96822 USA
[4] Univ Arizona, BIO5 Inst, Tucson, AZ 85721 USA
Funding
US National Science Foundation
Keywords
ontology; FAIR; metagenomics; marine microbiology; cyberinfrastructure (CI); next generation sequencing-NGS; omics; PROCHLOROCOCCUS ECOTYPES; EBI METAGENOMICS; ARCHAEAL;
DOI
10.3389/fmicb.2021.765268
Chinese Library Classification number
Q93 [Microbiology]
Discipline classification codes
071005; 100705
Abstract
Marine microbial ecology requires the systematic comparison of biogeochemical and sequence data to analyze environmental influences on the distribution and variability of microbial communities. With ever-increasing quantities of metagenomic data, there is a growing need to make datasets Findable, Accessible, Interoperable, and Reusable (FAIR) across diverse ecosystems. FAIR data are essential to developing analytical frameworks that integrate microbiological, genomic, ecological, oceanographic, and computational methods. Although community standards defining the minimal metadata required to accompany sequence data exist, they have not been used consistently across projects, which precludes interoperability. Moreover, these data are not machine-actionable or discoverable by cyberinfrastructure systems. By making 'omic and physicochemical datasets FAIR to machine systems, we can enable sequence data discovery and reuse based on machine-readable descriptions of environments or physicochemical gradients. In this work, we developed a novel technical specification for dataset encapsulation for the FAIR reuse of marine metagenomic and physicochemical datasets within cyberinfrastructure systems. The specification uses Frictionless Data Packages enriched with terminology from environmental and life-science ontologies to annotate measured variables, their units, and the measurement devices used. This approach was implemented in Planet Microbe, a cyberinfrastructure platform and marine metagenomic web portal. Here, we discuss the data properties built into the specification to make global ocean datasets FAIR within the Planet Microbe portal. We additionally discuss the selection of, and contributions to, the marine-science ontologies used within the specification. Finally, we use the system to discover data with which to answer various biological questions about environments, physicochemical gradients, and microbial communities in meta-analyses. This work represents a future direction in marine metagenomic research by proposing a specification for FAIR dataset encapsulation that, if adopted within cyberinfrastructure systems, would automate the discovery, exchange, and reuse of data needed to answer broader-reaching questions than originally intended.
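The annotation idea described in the abstract can be illustrated with a minimal sketch. It is not the Planet Microbe specification itself: it only assumes that each column of a tabular resource in a Frictionless Data Package descriptor carries an ontology term for the measured variable, plus optional terms for its unit and measurement device. The `rdfType` key is a standard Table Schema field property; the keys `x-unitRdfType` and `x-deviceRdfType`, the helper function names, and every `example.org` IRI below are hypothetical placeholders chosen for illustration.

```python
import json


def annotated_field(name, field_type, term_iri, unit_iri=None, device_iri=None):
    """Build one Table Schema field annotated with ontology term IRIs.

    "rdfType" is a standard Table Schema property. The "x-" keys are
    hypothetical extension properties used only in this sketch.
    """
    field = {"name": name, "type": field_type, "rdfType": term_iri}
    if unit_iri is not None:
        field["x-unitRdfType"] = unit_iri
    if device_iri is not None:
        field["x-deviceRdfType"] = device_iri
    return field


def build_package(name, csv_path, fields):
    """Assemble a minimal Data Package descriptor with one tabular resource."""
    return {
        "name": name,
        "resources": [
            {
                "name": "samples",
                "path": csv_path,
                "format": "csv",
                "schema": {"fields": fields},
            }
        ],
    }


def find_fields(package, term_iri):
    """Return (resource name, field name) pairs annotated with term_iri."""
    matches = []
    for resource in package["resources"]:
        for field in resource["schema"]["fields"]:
            if field.get("rdfType") == term_iri:
                matches.append((resource["name"], field["name"]))
    return matches


if __name__ == "__main__":
    # Placeholder IRIs: real descriptors would use terms from published
    # environmental and life-science ontologies instead.
    TEMPERATURE = "http://example.org/ontology/water_temperature"
    CELSIUS = "http://example.org/ontology/degree_celsius"
    CTD = "http://example.org/ontology/ctd_sensor"

    package = build_package(
        "example-cruise",
        "samples.csv",
        [
            annotated_field("sample_id", "string", "http://example.org/ontology/sample_id"),
            annotated_field("temp", "number", TEMPERATURE, CELSIUS, CTD),
        ],
    )
    print(json.dumps(package, indent=2))
    print(find_fields(package, TEMPERATURE))
```

Because the variable is identified by an IRI rather than by its column header, a portal could in principle match a column named `temp` in one dataset with a column named `Temperature (C)` in another, which is the kind of machine-readable discovery the abstract describes.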
Pages: 15