SpeciesRax: A Tool for Maximum Likelihood Species Tree Inference from Gene Family Trees under Duplication, Transfer, and Loss

Cited by: 29
Authors
Morel, Benoit [1, 2]
Schade, Paul [2]
Lutteropp, Sarah [1]
Williams, Tom A. [3]
Szollosi, Gergely J. [4, 5, 6]
Stamatakis, Alexandros [1, 2]
Affiliations
[1] Heidelberg Inst Theoret Studies, Computat Mol Evolut Grp, Heidelberg, Germany
[2] Karlsruhe Inst Technol, Inst Theoret Informat, Karlsruhe, Germany
[3] Univ Bristol, Sch Biol Sci, Bristol, Avon, England
[4] ELTE MTA Lendulet Evolutionary, Budapest, Hungary
[5] Eotvos Lorand Univ, Dept Biol Phys, Budapest, Hungary
[6] Inst Evolut, Ctr Ecol Res, Budapest, Hungary
Funding
European Research Council
Keywords
species tree inference; gene family tree; maximum likelihood; gene duplication; horizontal gene transfer; gene loss; MOLECULAR PHYLOGENY; EVOLUTION; GENOME; ORIGIN; LIFE; DIVERSIFICATION; RECONSTRUCTION; PERFORMANCE; CONFIDENCE; SEQUENCES;
DOI
10.1093/molbev/msab365
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Species tree inference from gene family trees is becoming increasingly popular because it can account for discordance between the species tree and the corresponding gene family trees. In particular, methods that can account for multiple-copy gene families exhibit potential to leverage paralogy as informative signal. At present, there does not exist any widely adopted inference method for this purpose. Here, we present SpeciesRax, the first maximum likelihood method that can infer a rooted species tree from a set of gene family trees and can account for gene duplication, loss, and transfer events. By explicitly modeling events by which gene trees can depart from the species tree, SpeciesRax leverages the phylogenetic rooting signal in gene trees. SpeciesRax infers species tree branch lengths in units of expected substitutions per site and branch support values via paralogy-aware quartets extracted from the gene family trees. Using both empirical and simulated data sets we show that SpeciesRax is at least as accurate as the best competing methods while being one order of magnitude faster on large data sets. We used SpeciesRax to infer a biologically plausible rooted phylogeny of the vertebrates comprising 188 species from 31,612 gene families in 1 h using 40 cores. SpeciesRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax and on BioConda.
Pages: 18