SuperFine: Fast and Accurate Supertree Estimation

Cited by: 34
Authors
Swenson, M. Shel [1]
Suri, Rahul [1]
Linder, C. Randal [2]
Warnow, Tandy [1]
Affiliations
[1] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
[2] Univ Texas Austin, Sch Biol Sci, Sect Integrat Biol, Austin, TX 78712 USA
Funding
National Science Foundation (USA)
Keywords
Algorithms; maximum likelihood; MRP; phylogenetics; simulation; supertrees; MATRIX REPRESENTATION; PHYLOGENETIC INFERENCE; TREES; PARSIMONY; SUPPORT;
DOI
10.1093/sysbio/syr092
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Many research groups are estimating trees containing anywhere from a few thousand to hundreds of thousands of species, toward the eventual goal of the estimation of a Tree of Life, containing perhaps as many as several million leaves. These phylogenetic estimations present enormous computational challenges, and current computational methods are likely to fail to run even on data sets in the low end of this range. One approach to estimate a large species tree is to use phylogenetic estimation methods (such as maximum likelihood) on a supermatrix produced by concatenating multiple sequence alignments for a collection of markers; however, the most accurate of these phylogenetic estimation methods are extremely computationally intensive for data sets with more than a few thousand sequences. Supertree methods, which assemble phylogenetic trees from a collection of trees on subsets of the taxa, are important tools for phylogeny estimation where phylogenetic analyses based upon maximum likelihood (ML) are infeasible. In this paper, we introduce SuperFine, a meta-method that utilizes a novel two-step procedure to improve the accuracy and scalability of supertree methods. Our study, using both simulated and empirical data, shows that SuperFine-boosted supertree methods produce more accurate trees than standard supertree methods, and run quickly on very large data sets with thousands of sequences. Furthermore, SuperFine-boosted matrix representation with parsimony (MRP, the most well-known supertree method) approaches the accuracy of ML methods on supermatrix data sets under realistic conditions.
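To make the MRP step mentioned in the abstract concrete, the following is a minimal sketch of the standard Baum-Ragan matrix encoding that MRP analyzes. It is not SuperFine itself and is not taken from the paper. The nested-tuple tree representation and the function names `leaf_set`, `nontrivial_clades`, and `mrp_matrix` are assumptions made for illustration. Each clade below the root of each source tree becomes one column: `1` for taxa inside the clade, `0` for taxa in that source tree but outside the clade, and `?` for taxa absent from that source tree.

```python
def leaf_set(tree):
    """Return the set of taxon names under a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def nontrivial_clades(tree):
    """Yield the leaf set of every internal node strictly below the root."""
    def walk(node, is_root):
        if isinstance(node, str):
            return
        if not is_root:
            yield leaf_set(node)
        for child in node:
            yield from walk(child, False)
    yield from walk(tree, True)


def mrp_matrix(source_trees):
    """Encode rooted source trees as an MRP character matrix (taxon -> row string)."""
    taxa = sorted(set().union(*(leaf_set(t) for t in source_trees)))
    rows = {taxon: [] for taxon in taxa}
    for tree in source_trees:
        present = leaf_set(tree)
        for clade in nontrivial_clades(tree):
            for taxon in taxa:
                if taxon not in present:
                    rows[taxon].append("?")   # taxon missing from this source tree
                elif taxon in clade:
                    rows[taxon].append("1")   # taxon inside the clade
                else:
                    rows[taxon].append("0")   # taxon in the tree, outside the clade
    return {taxon: "".join(chars) for taxon, chars in rows.items()}


# Hypothetical source trees on overlapping taxon subsets.
source_trees = [
    (("A", "B"), ("C", "D")),
    (("B", "C"), "E"),
]
matrix = mrp_matrix(source_trees)
# matrix == {"A": "10?", "B": "101", "C": "011", "D": "01?", "E": "??0"}
```

In an MRP analysis, a matrix like this would then be passed to a parsimony program, and the resulting most parsimonious tree (or a consensus of them) is taken as the supertree.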
Pages: 214-227
Page count: 14