RAxML and FastTree: Comparing Two Methods for Large-Scale Maximum Likelihood Phylogeny Estimation

Cited by: 147
Authors
Liu, Kevin [1]
Linder, C. Randal [2]
Warnow, Tandy [1]
Affiliations
[1] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
[2] Univ Texas Austin, Sect Integrat Biol, Sch Biol Sci, Austin, TX 78712 USA
Funding
National Science Foundation (USA)
Keywords
ALGORITHM; TREES
DOI
10.1371/journal.pone.0027731
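Chinese Library Classification (CLC) codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract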
Statistical methods for phylogeny estimation, especially maximum likelihood (ML), offer high accuracy with excellent theoretical properties. However, RAxML, the current leading method for large-scale ML estimation, can require weeks or longer when used on datasets with thousands of molecular sequences. Faster methods for ML estimation, among them FastTree, have also been developed, but their relative performance to RAxML is not yet fully understood. In this study, we explore the performance with respect to ML score, running time, and topological accuracy, of FastTree and RAxML on thousands of alignments (based on both simulated and biological nucleotide datasets) with up to 27,634 sequences. We find that when RAxML and FastTree are constrained to the same running time, FastTree produces topologically much more accurate trees in almost all cases. We also find that when RAxML is allowed to run to completion, it provides an advantage over FastTree in terms of the ML score, but does not produce substantially more accurate tree topologies. Interestingly, the relative accuracy of trees computed using FastTree and RAxML depends in part on the accuracy of the sequence alignment and dataset size, so that FastTree can be more accurate than RAxML on large datasets with relatively inaccurate alignments. Finally, the running times of RAxML and FastTree are dramatically different, so that when run to completion, RAxML can take several orders of magnitude longer than FastTree to complete. Thus, our study shows that very large phylogenies can be estimated very quickly using FastTree, with little (and in some cases no) degradation in tree accuracy, as compared to RAxML.
Pages: 11