nQMaker: Estimating Time Nonreversible Amino Acid Substitution Models

Cited by: 16
Authors
Cuong Cao Dang [1]
Bui Quang Minh [2]
McShea, Hanon [3]
Masel, Joanna [4]
James, Jennifer Eleanor [5]
Le Sy Vinh [1]
Lanfear, Robert [6]
Affiliations
[1] Vietnam Natl Univ, Univ Engn & Technol, Fac Informat Technol, 144 Xuan Thuy, Hanoi 10000, Vietnam
[2] Australian Natl Univ, Sch Comp, Computat Phylogen Lab, Canberra, ACT 2601, Australia
[3] Stanford Univ, Sch Earth Energy & Environm Sci, Dept Earth Syst Sci, Palo Alto, CA 94305 USA
[4] Univ Arizona, Dept Ecol & Evolutionary Biol, Tucson, AZ 85721 USA
[5] Uppsala Univ, Evolutionary Biol Ctr, Dept Ecol & Genet, Plant Ecol & Evolut, SE-75236 Uppsala, Sweden
[6] Australian Natl Univ, Res Sch Biol, Dept Ecol & Evolut, Canberra, ACT 2601, Australia
Funding
Australian Research Council;
Keywords
amino acid sequence analyses; amino acid substitution models; maximum likelihood model estimation; nonreversible models; phylogenetic inference; reversible models; MAXIMUM-LIKELIHOOD-ESTIMATION; PHYLOGENETIC TREES; PROTEIN EVOLUTION; DNA-SEQUENCES; PHYLOGENOMICS; MATRICES; LIFE;
DOI
10.1093/sysbio/syac007
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Amino acid substitution models are a key component in phylogenetic analyses of protein sequences. All commonly used amino acid models available to date are time-reversible, an assumption made for computational convenience rather than biological realism. Another significant downside of time-reversible models is that they do not allow inference of rooted trees without outgroups. In this article, we introduce nQMaker, a maximum likelihood approach that extends the recently published QMaker method and allows the estimation of time nonreversible amino acid substitution models and rooted phylogenetic trees from a set of protein sequence alignments. We show that the nonreversible models estimated with nQMaker are a much better fit to empirical alignments than pre-existing reversible models, across a wide range of data sets including mammals, birds, plants, fungi, and other taxa, and that the improvements in model fit scale with the size of the data set. Notably, for the recently published plant and bird trees, these nonreversible models correctly recovered the commonly estimated root placements with very high statistical support without the need to use an outgroup. We provide nQMaker as an easy-to-use feature in the IQ-TREE software (http://www.iqtree.org), allowing users to estimate nonreversible models and rooted phylogenies from their own protein data sets.
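The distinction between time-reversible and nonreversible models can be illustrated with a small check of detailed balance: a rate matrix Q with stationary distribution π is time-reversible exactly when π_i q_ij = π_j q_ji for every pair of states. The following is only a minimal sketch under stated assumptions, not nQMaker's or IQ-TREE's implementation: the function names and the 3-state toy matrices are hypothetical stand-ins for a real 20 x 20 amino acid matrix, and the stationary distribution is approximated by simple power iteration.

```python
def stationary_distribution(Q, iters=10000, tol=1e-13):
    """Approximate pi with pi Q = 0 by power iteration on P = I + Q / lam.

    Assumes Q is an irreducible rate matrix (rows sum to zero).
    """
    n = len(Q)
    # lam exceeds the largest exit rate, so P is a valid, aperiodic
    # transition matrix with the same stationary distribution as Q.
    lam = 2.0 * max(-Q[i][i] for i in range(n))
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, pi)) < tol
        pi = new
        if converged:
            break
    return pi


def is_time_reversible(Q, tol=1e-8):
    """Return True if Q satisfies detailed balance: pi_i q_ij == pi_j q_ji."""
    pi = stationary_distribution(Q)
    n = len(Q)
    return all(abs(pi[i] * Q[i][j] - pi[j] * Q[j][i]) <= tol
               for i in range(n) for j in range(i + 1, n))


# Toy reversible matrix: q_ij = r_ij * pi_j with symmetric exchangeabilities
# r_01 = 1, r_02 = 2, r_12 = 3 and frequencies pi = (0.5, 0.3, 0.2).
Q_rev = [[-0.7, 0.3, 0.4],
         [0.5, -1.1, 0.6],
         [1.0, 0.9, -1.9]]

# Toy nonreversible matrix: substitutions flow mostly around the
# cycle 0 -> 1 -> 2 -> 0, so forward and backward fluxes differ.
Q_cyc = [[-1.1, 1.0, 0.1],
         [0.1, -1.1, 1.0],
         [1.0, 0.1, -1.1]]

if __name__ == "__main__":
    print("Q_rev reversible:", is_time_reversible(Q_rev))
    print("Q_cyc reversible:", is_time_reversible(Q_cyc))
```

In the cyclic matrix, the direction of substitution carries information, which is the intuition behind why a nonreversible model can help place a root without an outgroup, whereas a reversible model cannot distinguish the direction of time along a branch.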
Pages: 1110-1123
Page count: 14