Fast and Consistent Estimation of Species Trees Using Supermatrix Rooted Triples

Cited by: 57
Authors
DeGiorgio, Michael [1]
Degnan, James H. [2]
Affiliations
[1] Univ Michigan, Ctr Computat Med & Bioinformat, Ann Arbor, MI 48109 USA
[2] Univ Canterbury, Dept Math & Stat, Christchurch 1, New Zealand
Keywords
phylogenetics; phylogenomics; anomaly zone; anomalous gene tree; statistical consistency; lineage sorting; MAXIMUM-LIKELIHOOD; GENE TREES; PHYLOGENETIC INFERENCE; BAYESIAN-ESTIMATION; DATA SETS; COMPLEXITY; EVOLUTION; CONSENSUS; DISTRIBUTIONS; INCONGRUENCE
DOI
10.1093/molbev/msp250
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Concatenated sequence alignments are often used to infer species-level relationships. Previous studies have shown that analysis of concatenated data using maximum likelihood (ML) can produce misleading results when loci have differing gene tree topologies due to incomplete lineage sorting. Here, we develop a polynomial time method that utilizes the modified mincut supertree algorithm to construct an estimated species tree from inferred rooted triples of concatenated alignments. We term this method SuperMatrix Rooted Triple (SMRT) and use the notation SMRT-ML when rooted triples are inferred by ML. We use simulations to investigate the performance of SMRT-ML under Jukes-Cantor and general time-reversible substitution models for four- and five-taxon species trees and also apply the method to an empirical data set of yeast genes. We find that SMRT-ML converges to the correct species tree in many cases in which ML on the full concatenated data set fails to do so. SMRT-ML can be conservative in that its output tree is often partially unresolved for problematic clades. We show analytically that when the species tree is clocklike and mutations occur under the Cavender-Farris-Neyman substitution model, as the number of genes increases, SMRT-ML is increasingly likely to infer the correct species tree even when the most likely gene tree does not match the species tree. SMRT-ML is therefore a computationally efficient and statistically consistent estimator of the species tree when gene trees are distributed according to the multispecies coalescent model.
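The pipeline the abstract describes (infer a rooted triple for every set of three taxa from the concatenated alignment, then assemble the triples into one tree) can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes two loudly stated substitutions: the rooted triple is picked by smallest Hamming distance rather than by ML, and the triples are combined with the classic BUILD algorithm of Aho et al. rather than the modified mincut supertree algorithm. Unlike a mincut-based method, BUILD simply gives up (returns `None`) when the triples conflict. All function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def infer_triple(alignment, a, b, c):
    """Return ((x, y), z), meaning x and y are sisters and z is the outgroup.

    Stand-in for ML triple inference: the closest pair is taken as the
    cherry. Ties are broken by candidate order, which is arbitrary.
    """
    candidates = [((a, b), c), ((a, c), b), ((b, c), a)]
    return min(
        candidates,
        key=lambda t: hamming(alignment[t[0][0]], alignment[t[0][1]]),
    )


def build(taxa, triples):
    """BUILD (Aho et al.): nested-tuple tree, or None if triples conflict."""
    taxa = sorted(taxa)
    if len(taxa) == 1:
        return taxa[0]
    if len(taxa) == 2:
        return tuple(taxa)

    # Union-find: join x and y for every triple xy|z that lies within taxa.
    parent = {t: t for t in taxa}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    relevant = [
        ((x, y), z)
        for (x, y), z in triples
        if x in parent and y in parent and z in parent
    ]
    for (x, y), _ in relevant:
        parent[find(x)] = find(y)

    components = {}
    for t in taxa:
        components.setdefault(find(t), []).append(t)
    if len(components) == 1:
        return None  # conflict; a mincut-based method would cut edges here

    subtrees = []
    for members in components.values():
        sub = build(members, relevant)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)


def smrt_sketch(alignment):
    """Infer one rooted triple per three-taxon subset, then assemble them."""
    taxa = sorted(alignment)
    triples = [infer_triple(alignment, a, b, c)
               for a, b, c in combinations(taxa, 3)]
    return build(taxa, triples)


# Toy concatenated alignment (invented data, for illustration only).
toy = {
    "A": "AAAAAAAAAA",
    "B": "AAAAAAAAAC",
    "C": "AAAAAAACCA",
    "D": "CCCCCAAAAA",
}
print(smrt_sketch(toy))
```

On this toy input the four inferred triples are AB|C, AB|D, AC|D and BC|D, which are mutually compatible, so the sketch returns the caterpillar tree with D as the outgroup. With real data, triples frequently conflict, which is the case the modified mincut step is designed to handle.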
Pages: 552-569
Page count: 18