The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools

Cited by: 250
Authors
Wilke, Andreas [1, 2]
Harrison, Travis [1, 2]
Wilkening, Jared [1, 5]
Field, Dawn [3]
Glass, Elizabeth M. [1, 2]
Kyrpides, Nikos [4]
Mavrommatis, Konstantinos [4]
Meyer, Folker [1, 2, 5]
Affiliations
[1] Argonne Natl Lab, Div Math & Comp Sci, Argonne, IL 60439 USA
[2] Univ Chicago, Computat Inst, Chicago, IL 60637 USA
[3] Ctr Ecol & Hydrol, Wallingford, Oxon, England
[4] Dept Energy Joint Genome Inst, Walnut Creek, CA USA
[5] Inst Genom & Syst Biol, Chicago, IL 60637 USA
Keywords
IDENTIFIERS; GENOMES; BLAST; KEGG
DOI
10.1186/1471-2105-13-141
CLC number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Computing sequence similarity results is becoming a limiting factor in metagenome analysis. Similarity search results encoded in an open, exchangeable format could reduce the need to recompute these data sets. Sharing similarity results requires a common reference. Description: We introduce a mechanism for automatically maintaining a comprehensive, non-redundant protein database and for creating a quarterly release of this resource. We also present tools for translating similarity search results into many annotation namespaces, e.g. KEGG or NCBI's GenBank. Conclusions: The data and tools we present allow multiple result sets to be created from a single computation, so that groups can share computational results for large sequence data sets.
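The idea in the abstract can be pictured with a small sketch: collapse identical protein sequences from several sources under one key, keep every source's annotation for that key, and then translate a single set of similarity hits into whichever namespace is needed. This is only an illustration under stated assumptions. The abstract does not say how sequences are keyed, so the MD5 checksum used here is an assumption, and the function names, record layout, identifiers, and sequences are all hypothetical.

```python
import hashlib
from collections import defaultdict


def normalize(sequence: str) -> str:
    """Strip whitespace and upper-case a protein sequence."""
    return "".join(sequence.split()).upper()


def sequence_key(sequence: str) -> str:
    """Return a checksum key for a sequence (MD5 is an assumption here)."""
    return hashlib.md5(normalize(sequence).encode("ascii")).hexdigest()


def build_nonredundant(records):
    """Collapse (source, identifier, annotation, sequence) records by sequence.

    Returns two mappings:
      sequences:   key -> normalized sequence (stored once per key)
      annotations: key -> source -> list of (identifier, annotation)
    """
    sequences = {}
    annotations = defaultdict(lambda: defaultdict(list))
    for source, identifier, annotation, sequence in records:
        key = sequence_key(sequence)
        sequences.setdefault(key, normalize(sequence))
        annotations[key][source].append((identifier, annotation))
    return sequences, annotations


def translate_hits(hit_keys, annotations, namespace):
    """Map similarity-search hits (sequence keys) into one namespace."""
    return {
        key: list(annotations[key].get(namespace, []))
        for key in hit_keys
        if key in annotations
    }


# Hypothetical input: two sources annotate the same sequence differently.
records = [
    ("KEGG", "kegg:example_1", "example kinase", "MKTAYIAK"),
    ("GenBank", "gb:example_2", "protein kinase (example)", "mktayiak"),
    ("GenBank", "gb:example_3", "unrelated example protein", "MSSHEGGK"),
]

sequences, annotations = build_nonredundant(records)
hits = [sequence_key("MKTAYIAK")]  # stand-in for one similarity-search result

print(len(sequences))
print(translate_hits(hits, annotations, "KEGG"))
print(translate_hits(hits, annotations, "GenBank"))
```

The point of the design is that the similarity search runs once against the collapsed sequence set, and the per-namespace views are cheap lookups afterwards.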
Pages: 5