Increasing the efficiency of searches for the maximum likelihood tree in a phylogenetic analysis of up to 150 nucleotide sequences

Cited by: 53
Author
Morrison, David A. [1, 2]
Affiliations
[1] Natl Vet Inst, Dept Parasitol SWEPAR, S-75189 Uppsala, Sweden
[2] Swedish Univ Agr Sci, S-75189 Uppsala, Sweden
Keywords
large data sets; maximum likelihood; phylogeny; search strategies; tree islands
DOI
10.1080/10635150701779808
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Even when the maximum likelihood (ML) tree is a better estimate of the true phylogenetic tree than those produced by other methods, the result of a poor ML search may be no better than that of a more thorough search under some faster criterion. The ability to find the globally optimal ML tree is therefore important. Here, I compare a range of heuristic search strategies (and their associated computer programs) in terms of their success at locating the ML tree for 20 empirical data sets with 14 to 158 sequences and 411 to 120,762 aligned nucleotides. Three distinct topics are discussed: the success of the search strategies in relation to certain features of the data, the generation of starting trees for the search, and the exploration of multiple islands of trees. As a starting tree, there was little difference among the neighbor-joining tree based on absolute differences (including the BioNJ tree), the stepwise-addition parsimony tree (with or without nearest-neighbor-interchange (NNI) branch swapping), and the stepwise-addition ML tree. The latter produced the best ML score on average but was orders of magnitude slower than the alternatives. The BioNJ tree was second best on average. As search strategies, star decomposition and quartet puzzling were the slowest and produced the worst ML scores. The DPRml, IQPNNI, MultiPhyl, PhyML, PhyNav, and TreeFinder programs with default options produced qualitatively similar results, each locating a single tree that tended to be in an NNI suboptimum (rather than the global optimum) when the data set had low phylogenetic information. For such data sets, there were multiple tree islands with very similar ML scores. The likelihood surface only became relatively simple for data sets that contained approximately 500 aligned nucleotides for 50 sequences and 3,000 nucleotides for 100 sequences. The RAxML and GARLI programs allowed multiple islands to be explored easily, but both programs also tended to find NNI suboptima. A newly developed version of the likelihood ratchet using PAUP* successfully found the peaks of multiple islands, but its speed needs to be improved.
Pages: 988-1010
Number of pages: 23