Conceptual framework and pilot study to benchmark phylogenomic databases based on reference gene trees

Cited by: 24
Authors
Boeckmann, Brigitte [1]
Robinson-Rechavi, Marc [2]
Xenarios, Ioannis [3]
Dessimoz, Christophe [4]
Affiliations
[1] SIB Geneva, Swiss Prot Grp, Geneva, Switzerland
[2] Univ Lausanne, Dept Ecol & Evolut, CH-1015 Lausanne, Switzerland
[3] Swiss Inst Bioinformat, Vital IT Grp, Geneva, Switzerland
[4] ETH, CBRG Grp, Zurich, Switzerland
Funding
Swiss National Science Foundation
Keywords
conceptual comparison; phylogenomic databases; quality assessment; reference gene trees; MULTIPLE SEQUENCE ALIGNMENT; ORTHOLOGY; EVOLUTION; PERFORMANCE; ALGORITHMS; CONVERSION; FAMILIES; GENOMICS; VERSION; MODEL;
DOI
10.1093/bib/bbr034
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Phylogenomic databases provide orthology predictions for species with fully sequenced genomes. Although the goal seems well-defined, the content of these databases differs greatly. Seven ortholog databases (Ensembl Compara, eggNOG, HOGENOM, InParanoid, OMA, OrthoDB, Panther) were compared on the basis of reference trees. For three well-conserved protein families, we observed a generally high specificity of orthology assignments for these databases. We show that differences in the completeness of predicted gene relationships and in the phylogenetic information are, for the great majority, not due to the methods used, but to differences in the underlying database concepts. According to our metrics, none of the databases provides a fully correct and comprehensive protein classification. Our results provide a framework for meaningful and systematic comparisons of phylogenomic databases. In the future, a sustainable set of 'Gold standard' phylogenetic trees could provide a robust method for phylogenomic databases to assess their current quality status, measure changes following new database releases and diagnose improvements subsequent to an upgrade of the analysis procedure.
Pages: 423-435
Page count: 13