MAST: Phylogenetic Inference with Mixtures Across Sites and Trees

Cited by: 4
Authors
Wong, Thomas K. F. [1]
Cherryh, Caitlin [2]
Rodrigo, Allen G. [3]
Hahn, Matthew W. [4, 5]
Minh, Bui Quang [1]
Lanfear, Robert [2]
Affiliations
[1] Australian Natl Univ, Sch Comp, Canberra, ACT 2601, Australia
[2] Australian Natl Univ, Res Sch Biol, Canberra, ACT 2601, Australia
[3] Univ Auckland, Sch Biol Sci, Auckland 1142, New Zealand
[4] Indiana Univ, Dept Biol, Bloomington, IN 47405 USA
[5] Indiana Univ, Dept Comp Sci, Bloomington, IN 47405 USA
Funding
National Science Foundation (USA); Australian Research Council
Keywords
Incomplete lineage sorting; introgression; mixture model; multitree model; phylogenetics; MAXIMUM-LIKELIHOOD; GENE TREES; BAYESIAN-INFERENCE; DNA-SEQUENCES; SPECIES TREES; MODEL; IDENTIFIABILITY; ACCURATE; TIME; RECONSTRUCTION;
DOI
10.1093/sysbio/syae008
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Hundreds or thousands of loci are now routinely used in modern phylogenomic studies. Concatenation approaches to tree inference assume that there is a single topology for the entire dataset, but different loci may have different evolutionary histories due to incomplete lineage sorting (ILS), introgression, and/or horizontal gene transfer; even single loci may not be treelike due to recombination. To overcome this shortcoming, we introduce an implementation of a multi-tree mixture model that we call mixtures across sites and trees (MAST). This model extends a prior implementation by allowing users to estimate the weight of each of a set of pre-specified bifurcating trees in a single alignment. The MAST model allows each tree to have its own weight, topology, branch lengths, substitution model, nucleotide or amino acid frequencies, and model of rate heterogeneity across sites. We implemented the MAST model in a maximum-likelihood framework in the popular phylogenetic software, IQ-TREE. Simulations show that we can accurately recover the true model parameters, including branch lengths and tree weights for a given set of tree topologies, under a wide range of biologically realistic scenarios. We also show that we can use standard statistical inference approaches to reject a single-tree model when data are simulated under multiple trees (and vice versa). We applied the MAST model to multiple primate datasets and found that it can recover the signal of ILS in the Great Apes, as well as the asymmetry in minor trees caused by introgression among several macaque species. When applied to a dataset of 4 Platyrrhine species for which standard concatenated maximum likelihood (ML) and gene tree approaches disagree, we observe that MAST gives the highest weight (i.e., the largest proportion of sites) to the tree also supported by gene tree approaches.
These results suggest that the MAST model is able to analyze a concatenated alignment using ML while avoiding some of the biases that come with assuming there is only a single tree. We discuss how the MAST model can be extended in the future.
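As a rough illustration of the idea described in the abstract, a multi-tree mixture can be read as a weighted sum of per-tree site likelihoods. The sketch below assumes the generic finite-mixture form (site likelihood = sum over trees of weight times the likelihood of that site under that tree) and takes the per-tree site likelihoods as already-computed inputs. The function names and input layout are hypothetical and are not taken from IQ-TREE or from the paper.

```python
import math


def mixture_log_likelihood(site_likelihoods, weights):
    """Log-likelihood of an alignment under a generic multi-tree mixture.

    site_likelihoods[i][j] is the likelihood of site i under tree j
    (assumed to be computed elsewhere, each tree with its own parameters).
    weights[j] is the weight of tree j; the weights must sum to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("tree weights must sum to 1")
    total = 0.0
    for per_tree in site_likelihoods:
        # Mixture likelihood of one site: weighted sum over the trees.
        site_lik = sum(w * lik for w, lik in zip(weights, per_tree))
        total += math.log(site_lik)
    return total


def site_posteriors(per_tree, weights):
    """Posterior probability that one site belongs to each tree (Bayes' rule)."""
    joint = [w * lik for w, lik in zip(weights, per_tree)]
    norm = sum(joint)
    return [x / norm for x in joint]
```

Under this reading, a tree's weight corresponds to the expected proportion of sites drawn from that tree, which matches the abstract's description of the weight as "the largest proportion of sites"; the estimation of branch lengths and substitution parameters is not shown here.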
Pages: 375-391
Number of pages: 17