MAST: Phylogenetic Inference with Mixtures Across Sites and Trees

Citations: 4
Authors
Wong, Thomas K. F. [1]
Cherryh, Caitlin [2]
Rodrigo, Allen G. [3]
Hahn, Matthew W. [4, 5]
Minh, Bui Quang [1]
Lanfear, Robert [2]
Affiliations
[1] Australian Natl Univ, Sch Comp, Canberra, ACT 2601, Australia
[2] Australian Natl Univ, Res Sch Biol, Canberra, ACT 2601, Australia
[3] Univ Auckland, Sch Biol Sci, Auckland 1142, New Zealand
[4] Indiana Univ, Dept Biol, Bloomington, IN 47405 USA
[5] Indiana Univ, Dept Comp Sci, Bloomington, IN 47405 USA
Funding
US National Science Foundation; Australian Research Council
Keywords
Incomplete lineage sorting; introgression; mixture model; multitree model; phylogenetics; MAXIMUM-LIKELIHOOD; GENE TREES; BAYESIAN-INFERENCE; DNA-SEQUENCES; SPECIES TREES; MODEL; IDENTIFIABILITY; ACCURATE; TIME; RECONSTRUCTION;
DOI
10.1093/sysbio/syae008
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Hundreds or thousands of loci are now routinely used in modern phylogenomic studies. Concatenation approaches to tree inference assume that there is a single topology for the entire dataset, but different loci may have different evolutionary histories due to incomplete lineage sorting (ILS), introgression, and/or horizontal gene transfer; even single loci may not be treelike due to recombination. To overcome this shortcoming, we introduce an implementation of a multi-tree mixture model that we call mixtures across sites and trees (MAST). This model extends a prior implementation by allowing users to estimate the weight of each of a set of pre-specified bifurcating trees in a single alignment. The MAST model allows each tree to have its own weight, topology, branch lengths, substitution model, nucleotide or amino acid frequencies, and model of rate heterogeneity across sites. We implemented the MAST model in a maximum-likelihood framework in the popular phylogenetic software, IQ-TREE. Simulations show that we can accurately recover the true model parameters, including branch lengths and tree weights for a given set of tree topologies, under a wide range of biologically realistic scenarios. We also show that we can use standard statistical inference approaches to reject a single-tree model when data are simulated under multiple trees (and vice versa). We applied the MAST model to multiple primate datasets and found that it can recover the signal of ILS in the Great Apes, as well as the asymmetry in minor trees caused by introgression among several macaque species. When applied to a dataset of 4 Platyrrhine species for which standard concatenated maximum likelihood (ML) and gene tree approaches disagree, we observe that MAST gives the highest weight (i.e., the largest proportion of sites) to the tree also supported by gene tree approaches.
These results suggest that the MAST model is able to analyze a concatenated alignment using ML while avoiding some of the biases that come with assuming there is only a single tree. We discuss how the MAST model can be extended in the future.
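As a rough illustration of the multi-tree mixture idea described in the abstract, the sketch below assumes the usual mixture form, in which the likelihood of each alignment site is the weighted sum of its likelihoods under each pre-specified tree, and the weights sum to 1. The function name `mixture_log_likelihood` and the per-site likelihood values are hypothetical; this is not IQ-TREE's implementation, which also optimizes branch lengths, substitution models, and rate heterogeneity for each tree.

```python
import math

def mixture_log_likelihood(site_likelihoods, weights):
    """Log-likelihood of an alignment under a simple multi-tree mixture.

    site_likelihoods[i][j] is the (hypothetical, precomputed) likelihood of
    site i under tree j; weights[j] is the weight of tree j.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("tree weights must sum to 1")
    total = 0.0
    for per_tree in site_likelihoods:
        # Each site's likelihood is a weighted sum over the candidate trees.
        site_lik = sum(w * lik for w, lik in zip(weights, per_tree))
        total += math.log(site_lik)
    return total

# Illustrative values only: three sites, two candidate trees.
example_sites = [[0.020, 0.001], [0.015, 0.012], [0.002, 0.030]]
example_weights = [0.6, 0.4]
log_lik = mixture_log_likelihood(example_sites, example_weights)
```

Setting one weight to 1 and the others to 0 reduces this to the single-tree likelihood that concatenation approaches assume, which is one way to see why the two models can be compared with standard statistical tests.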
Pages: 375-391
Number of pages: 17