Named entity linking of geospatial and host metadata in GenBank for advancing biomedical research

Cited by: 2
Authors
Tahsin, Tasnia [1]
Weissenbacher, Davy [1, 2]
Jones-Shargani, Demetrius [2]
Magee, Daniel [1, 2]
Vaiente, Matteo [1, 2]
Gonzalez, Graciela [1, 3]
Scotch, Matthew [1, 2]
Affiliations
[1] Arizona State Univ, Dept Biomed Informat, 13212 E Shea Blvd, Scottsdale, AZ 85259 USA
[2] Arizona State Univ, Biodesign Ctr Environm Hlth Engn, 781 E Terrace Mall, Tempe, AZ 85281 USA
[3] Univ Penn, Perelman Sch Med, Inst Biomed Informat, 423 Guardian Dr, Philadelphia, PA 19104 USA
Source
DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | 2017
Funding
National Institutes of Health (USA);
Keywords
DATABASES; RELIABILITY; SEQUENCES; TEXT; WEB;
DOI
10.1093/database/bax093
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
GenBank is a popular National Center for Biotechnology Information (NCBI) database for the submission and analysis of DNA sequences for biomedical research. The resource is part of the Entrez environment, which enables cross-linking of concepts and entries in other participating NCBI databases such as Taxonomy, PubMed and Protein. For example, a GenBank record of an influenza A hemagglutinin gene DNA sequence might have a link to the Taxonomy database for the organism, a link to the related article in PubMed (if published) and a link to the Protein entry for the hemagglutinin protein. Despite their importance in biomedical research areas such as population genetics, phylogeography and public health surveillance, the host and geospatial metadata of genetic sequences in GenBank are not linked to any database. Therefore, to facilitate biomedical research based on georeferenced DNA sequences and/or DNA sequences with normalized host names, we designed and developed a framework that enriches GenBank entries by linking their host metadata to the NCBI Taxonomy database and their geospatial metadata to GeoNames, a comprehensive knowledge base of geographic locations. Here, we introduce a database created by applying this framework to virus sequences in GenBank, and we evaluate our normalization algorithms on a set of manually annotated virus records. Although currently applied to viruses, our framework can be easily extended to other organisms, and we discuss the potential use of our resource for biomedical research.
Pages: 16