EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates

Times cited: 892
Authors
Vilella, Albert J. [1]
Severin, Jessica [1]
Ureta-Vidal, Abel [1]
Li, Heng [2]
Durbin, Richard [2]
Birney, Ewan [1]
Affiliations
[1] EMBL EBI, Cambridge CB10 1SD, England
[2] Wellcome Trust Sanger Inst, Cambridge CB10 1HH, England
Funding
Wellcome Trust (UK)
Keywords
MAXIMUM-LIKELIHOOD; GENOME SEQUENCE; DATABASE; EVOLUTION; INSIGHTS; ALGORITHM; ORTHOLOGS; FAMILIES; PARALOGS
DOI
10.1101/gr.073585.107
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline codes
071010; 081704
Abstract
We have developed a comprehensive, gene-orientated phylogenetic resource, EnsemblCompara GeneTrees. It is built on a computational pipeline that handles clustering, multiple alignment, and tree generation, including the handling of large gene families. We developed two novel, non-sequence-based metrics of gene tree correctness and benchmarked several tree-building methods. The TreeBeST method from TreeFam showed the best performance in our hands. We also compared this phylogenetic approach with clustering approaches for ortholog prediction, and the phylogenetic approach gave a large increase in coverage. All data are available in several formats and will be kept up to date with the Ensembl project.
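To illustrate what "duplication-aware" ortholog prediction from a gene tree means, the Python sketch below uses the standard tree-based definition. It labels each gene pair by the event at its lowest common ancestor: a speciation node gives an ortholog pair, and a duplication node gives a paralog pair.

This is not the EnsemblCompara or TreeBeST implementation. It assumes a rooted, strictly binary gene tree whose leaves carry species labels. It infers duplications with the simple species-overlap rule, under which a node is a duplication if its two subtrees share a species, rather than reconciling the gene tree against a species tree. The class Node, the functions is_duplication and classify_pairs, and the example genes H1, H2, M1 and M2 are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Rooted binary gene-tree node (hypothetical structure).

    Leaves carry a gene id and a species name; internal nodes carry children.
    """
    gene: Optional[str] = None
    species: Optional[str] = None
    children: list[Node] = field(default_factory=list)

    def leaves(self) -> list[Node]:
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    def species_set(self) -> set[str]:
        return {leaf.species for leaf in self.leaves()}


def is_duplication(node: Node) -> bool:
    """Species-overlap rule: a duplication if both subtrees share a species."""
    left, right = node.children  # assumes a strictly binary tree
    return bool(left.species_set() & right.species_set())


def classify_pairs(root: Node) -> dict[tuple[str, str], str]:
    """Label each gene pair by the event at its lowest common ancestor.

    Two leaves first fall into different subtrees at their LCA, so each pair
    is visited exactly once, at that node.
    """
    calls: dict[tuple[str, str], str] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.children:
            continue
        event = "paralog" if is_duplication(node) else "ortholog"
        left, right = node.children
        for a in left.leaves():
            for b in right.leaves():
                calls[tuple(sorted((a.gene, b.gene)))] = event
        stack.extend(node.children)
    return calls


# Example: one ancestral duplication, then a human/mouse speciation in each
# copy. Gene and species names are made up for illustration.
example = Node(children=[
    Node(children=[Node("H1", "human"), Node("M1", "mouse")]),
    Node(children=[Node("H2", "human"), Node("M2", "mouse")]),
])

if __name__ == "__main__":
    for pair, event in sorted(classify_pairs(example).items()):
        print(pair, event)
```

In the example, both subtrees of the root contain a human and a mouse gene, so the root is treated as a duplication. As a result, only H1/M1 and H2/M2 are called orthologs, and every other pair is called a paralog.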
Pages: 327-335
Page count: 9