Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper

Cited by: 1963
Authors
Huerta-Cepas, Jaime [1]
Forslund, Kristoffer [1]
Coelho, Luis Pedro [1]
Szklarczyk, Damian [2,3]
Jensen, Lars Juhl [4]
von Mering, Christian [2,3]
Bork, Peer [1,5,6,7,8]
Affiliations
[1] European Mol Biol Lab, Struct & Computat Biol Unit, Heidelberg, Germany
[2] Univ Zurich, Inst Mol Life Sci, Zurich, Switzerland
[3] Swiss Inst Bioinformat, Bioinformat Syst Biol Grp, Zurich, Switzerland
[4] Univ Copenhagen, Fac Hlth & Med Sci, Novo Nordisk Fdn Ctr Prot Res, Copenhagen, Denmark
[5] Univ Heidelberg Hosp, Germany Mol Med Partnership Unit MMPU, Heidelberg, Germany
[6] European Mol Biol Lab, Heidelberg, Germany
[7] Max Delbruck Ctr Mol Med, Berlin, Germany
[8] Univ Wurzburg, Bioctr, Dept Bioinformat, Wurzburg, Germany
Funding
European Research Council
Keywords
orthology; functional annotation; genomics; comparative genomics; gene function; metagenomics
DOI
10.1093/molbev/msx148
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible (e.g., new assignments only available through database updates), less precise homology-based functional transfer is still the default for (meta-)genome annotation. We therefore developed eggNOG-mapper, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from the eggNOG database. To validate our method, we benchmarked Gene Ontology (GO) predictions against two widely used homology-based approaches: BLAST and InterProScan. Orthology filters applied to BLAST results reduced the rate of false positive assignments by 11%, and increased the ratio of experimentally validated terms recovered over all terms assigned per protein by 15%. Compared with InterProScan, eggNOG-mapper achieved similar proteome coverage and precision while predicting, on average, 41 more terms per protein and increasing the rate of experimentally validated terms recovered over total term assignments per protein by 35%. EggNOG-mapper predictions scored within the top-5 methods in the three GO categories using the CAFA2 NK-partial benchmark. Finally, we evaluated eggNOG-mapper for functional annotation of metagenomics data, yielding better performance than InterProScan. eggNOG-mapper runs approximately 15x faster than BLAST and at least 2.5x faster than InterProScan. The tool is available standalone and as an online service at http://eggnog-mapper.embl.de.
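The abstract reports a per-protein ratio of experimentally validated GO terms recovered over all terms assigned. A minimal sketch of how such a ratio could be computed is given below. It is an illustration under stated assumptions, not the authors' benchmark code: the function names, the protein identifiers, and the toy GO term sets are all hypothetical, and averaging the ratio across proteins is an assumed aggregation.

```python
from statistics import mean


def validated_ratio(predicted, validated):
    """Share of one protein's predicted GO terms that are experimentally validated.

    Returns None when no terms were predicted, so the protein can be skipped.
    """
    predicted = set(predicted)
    if not predicted:
        return None
    return len(predicted & set(validated)) / len(predicted)


def mean_validated_ratio(predictions, gold):
    """Average the per-protein ratio over proteins with at least one prediction.

    Averaging across proteins is an assumption made for this sketch.
    """
    ratios = []
    for protein, terms in predictions.items():
        ratio = validated_ratio(terms, gold.get(protein, ()))
        if ratio is not None:
            ratios.append(ratio)
    return mean(ratios) if ratios else 0.0


# Hypothetical toy data: predicted GO terms per protein.
predictions = {
    "protA": {"GO:0003677", "GO:0005634", "GO:0006355", "GO:0008270"},
    "protB": {"GO:0016301", "GO:0005524"},
    "protC": set(),  # no predictions, excluded from the average
}

# Hypothetical experimentally validated GO terms per protein.
gold = {
    "protA": {"GO:0003677", "GO:0006355"},
    "protB": {"GO:0016301", "GO:0005524", "GO:0004672"},
}

print(mean_validated_ratio(predictions, gold))
```

In this toy input, protA has 2 of 4 predicted terms validated and protB has 2 of 2, so a method that assigns fewer unsupported terms scores higher even if it recovers the same validated terms.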
Pages: 2115-2122
Page count: 8