RAxML and FastTree: Comparing Two Methods for Large-Scale Maximum Likelihood Phylogeny Estimation

Cited by: 147
Authors
Liu, Kevin [1]
Linder, C. Randal [2]
Warnow, Tandy [1]
Affiliations
[1] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
[2] Univ Texas Austin, Sect Integrat Biol, Sch Biol Sci, Austin, TX 78712 USA
Funding
National Science Foundation (USA)
Keywords
ALGORITHM; TREES;
DOI
10.1371/journal.pone.0027731
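Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract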
Statistical methods for phylogeny estimation, especially maximum likelihood (ML), offer high accuracy with excellent theoretical properties. However, RAxML, the current leading method for large-scale ML estimation, can require weeks or longer when used on datasets with thousands of molecular sequences. Faster methods for ML estimation, among them FastTree, have also been developed, but their relative performance to RAxML is not yet fully understood. In this study, we explore the performance with respect to ML score, running time, and topological accuracy, of FastTree and RAxML on thousands of alignments (based on both simulated and biological nucleotide datasets) with up to 27,634 sequences. We find that when RAxML and FastTree are constrained to the same running time, FastTree produces topologically much more accurate trees in almost all cases. We also find that when RAxML is allowed to run to completion, it provides an advantage over FastTree in terms of the ML score, but does not produce substantially more accurate tree topologies. Interestingly, the relative accuracy of trees computed using FastTree and RAxML depends in part on the accuracy of the sequence alignment and dataset size, so that FastTree can be more accurate than RAxML on large datasets with relatively inaccurate alignments. Finally, the running times of RAxML and FastTree are dramatically different, so that when run to completion, RAxML can take several orders of magnitude longer than FastTree to complete. Thus, our study shows that very large phylogenies can be estimated very quickly using FastTree, with little (and in some cases no) degradation in tree accuracy, as compared to RAxML.
Pages: 11