The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools

Cited by: 250
Authors
Wilke, Andreas [1, 2]
Harrison, Travis [1, 2]
Wilkening, Jared [1, 5]
Field, Dawn [3]
Glass, Elizabeth M. [1, 2]
Kyrpides, Nikos [4]
Mavrommatis, Konstantinos [4]
Meyer, Folker [1, 2, 5]
Affiliations
[1] Argonne Natl Lab, Div Math & Comp Sci, Argonne, IL 60439 USA
[2] Univ Chicago, Computat Inst, Chicago, IL 60637 USA
[3] Ctr Ecol & Hydrol, Wallingford, Oxon, England
[4] Dept Energy Joint Genome Inst, Walnut Creek, CA USA
[5] Inst Genom & Syst Biol, Chicago, IL 60637 USA
Keywords
IDENTIFIERS; GENOMES; BLAST; KEGG
DOI
10.1186/1471-2105-13-141
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Computing of sequence similarity results is becoming a limiting factor in metagenome analysis. Sequence similarity search results encoded in an open, exchangeable format have the potential to limit the needs for computational reanalysis of these data sets. A prerequisite for sharing of similarity results is a common reference.
Description: We introduce a mechanism for automatically maintaining a comprehensive, non-redundant protein database and for creating a quarterly release of this resource. In addition, we present tools for translating similarity searches into many annotation namespaces, e.g. KEGG or NCBI's GenBank.
Conclusions: The data and tools we present allow the creation of multiple result sets using a single computation, permitting computational results to be shared between groups for large sequence data sets.
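The idea in the abstract, one similarity computation against a non-redundant reference whose hits are then translated into several annotation namespaces, can be illustrated with a minimal sketch. It assumes that identical sequences are collapsed under an MD5 checksum of the sequence text; the function names, the record layout, and the identifiers below are hypothetical and are not taken from the M5nr tools themselves.

```python
import hashlib
from collections import defaultdict


def sequence_key(sequence: str) -> str:
    """Return the MD5 hex digest of a whitespace-stripped, upper-cased sequence."""
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def build_nr_index(records):
    """Collapse (source, identifier, sequence) records to one entry per unique sequence.

    Returns (sequences, annotations), where sequences maps key -> sequence and
    annotations maps key -> source namespace -> set of identifiers.
    """
    sequences = {}
    annotations = defaultdict(lambda: defaultdict(set))
    for source, identifier, sequence in records:
        key = sequence_key(sequence)
        sequences.setdefault(key, "".join(sequence.split()).upper())
        annotations[key][source].add(identifier)
    return sequences, annotations


def translate_hits(hits, annotations, namespace):
    """Rewrite (query, key, score) hits as (query, identifier, score) in one namespace."""
    translated = []
    for query, key, score in hits:
        for identifier in sorted(annotations.get(key, {}).get(namespace, ())):
            translated.append((query, identifier, score))
    return translated


# Hypothetical input: two sources carry the same protein under different identifiers.
records = [
    ("KEGG", "kegg:example1", "MKVLA"),
    ("GenBank", "gb:example2", "mkvla"),
    ("GenBank", "gb:example3", "MKTTT"),
]
sequences, annotations = build_nr_index(records)

# A single similarity result against the non-redundant set...
hits = [("read_1", sequence_key("MKVLA"), 52.0)]

# ...can be reported in either namespace without repeating the search.
kegg_view = translate_hits(hits, annotations, "KEGG")
genbank_view = translate_hits(hits, annotations, "GenBank")
```

Keying the index on a checksum of the sequence itself means the similarity search never has to be rerun when a new annotation source is added; only the identifier table grows.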
Pages: 5
Related papers
20 in total
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing [J].
Angiuoli, Samuel V. ;
Matalka, Malcolm ;
Gussman, Aaron ;
Galens, Kevin ;
Vangala, Mahesh ;
Riley, David R. ;
Arze, Cesar ;
White, James R. ;
White, Owen ;
Fricke, W. Florian .
BMC BIOINFORMATICS, 2011, 12
[3]  
[Anonymous], IEEE CLUSTER
[4]   A database of unique protein sequence identifiers for proteome studies [J].
Babnigg, Gyorgy ;
Giometti, Carol S. .
PROTEOMICS, 2006, 6 (16) :4514-4522
[5]  
*COMM MET CHALL FU, 2007, NEW SCI MET REV SECR
[6]   The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases [J].
Cote, Richard G. ;
Jones, Philip ;
Martens, Lennart ;
Kerrien, Samuel ;
Reisinger, Florian ;
Lin, Quan ;
Leinonen, Rasko ;
Apweiler, Rolf ;
Hermjakob, Henning .
BMC BIOINFORMATICS, 2007, 8 (1) :401
[7]   The Gene Ontology (GO) database and informatics resource [J].
Harris, MA ;
Clark, J ;
Ireland, A ;
Lomax, J ;
Ashburner, M ;
Foulger, R ;
Eilbeck, K ;
Lewis, S ;
Marshall, B ;
Mungall, C ;
Richter, J ;
Rubin, GM ;
Blake, JA ;
Bult, C ;
Dolan, M ;
Drabkin, H ;
Eppig, JT ;
Hill, DP ;
Ni, L ;
Ringwald, M ;
Balakrishnan, R ;
Cherry, JM ;
Christie, KR ;
Costanzo, MC ;
Dwight, SS ;
Engel, S ;
Fisk, DG ;
Hirschman, JE ;
Hong, EL ;
Nash, RS ;
Sethuraman, A ;
Theesfeld, CL ;
Botstein, D ;
Dolinski, K ;
Feierbach, B ;
Berardini, T ;
Mundodi, S ;
Rhee, SY ;
Apweiler, R ;
Barrell, D ;
Camon, E ;
Dimmer, E ;
Lee, V ;
Chisholm, R ;
Gaudet, P ;
Kibbe, W ;
Kishore, R ;
Schwarz, EM ;
Sternberg, P ;
Gwinn, M .
NUCLEIC ACIDS RESEARCH, 2004, 32 :D258-D261
[8]  
Kanehisa M, 2002, NOVART FDN SYMP, V247, P91
[9]   KEGG for linking genomes to life and the environment [J].
Kanehisa, Minoru ;
Araki, Michihiro ;
Goto, Susumu ;
Hattori, Masahiro ;
Hirakawa, Mika ;
Itoh, Masumi ;
Katayama, Toshiaki ;
Kawashima, Shuichi ;
Okuda, Shujiro ;
Tokimatsu, Toshiaki ;
Yamanishi, Yoshihiro .
NUCLEIC ACIDS RESEARCH, 2008, 36 :D480-D484
[10]  
Kent WJ, 2002, GENOME RES, V12, P656, DOI 10.1101/gr.229202