AliSim: A Fast and Versatile Phylogenetic Sequence Simulator for the Genomic Era

Cited by: 35
Authors
Ly-Trong, Nhan [1]
Naser-Khdour, Suha [2]
Lanfear, Robert [2]
Minh, Bui Quang [1]
Affiliations
[1] Australian Natl Univ, Coll Engn & Comp Sci, Sch Comp, Canberra, ACT 2600, Australia
[2] Australian Natl Univ, Coll Sci, Res Sch Biol, Ecol & Evolut, Canberra, ACT 2600, Australia
Funding
Australian Research Council;
Keywords
sequence simulation; phylogenetics; molecular evolution; MAXIMUM-LIKELIHOOD; STATISTICAL TESTS; DELETIONS; EVOLUTION; MODELS; INSERTIONS; SITES; RATES;
DOI
10.1093/molbev/msac092
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Sequence simulators play an important role in phylogenetics. Simulated data has many applications, such as evaluating the performance of different methods, hypothesis testing with parametric bootstraps, and, more recently, generating data for training machine-learning applications. Many sequence simulation programmes exist, but the most feature-rich programmes tend to be rather slow, and the fastest programmes tend to be feature-poor. Here, we introduce AliSim, a new tool that can efficiently simulate biologically realistic alignments under a large range of complex evolutionary models. To achieve high performance across a wide range of simulation conditions, AliSim implements an adaptive approach that combines the commonly used rate matrix and probability matrix approaches. AliSim takes 1.4 h and 1.3 GB RAM to simulate alignments with one million sequences or sites, whereas the popular programmes Seq-Gen, Dawg, and INDELible require 2-5 h and 50-500 GB of RAM. We provide AliSim as an extension of the IQ-TREE software version 2.2, freely available at www.iqtree.org, and a comprehensive user tutorial at http://www.iqtree.org/doc/AliSim.
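The two approaches named in the abstract can be illustrated with a minimal sketch for a single branch under the Jukes-Cantor (JC69) model. This is not AliSim's code: the function names, the choice of JC69, and the `threshold` used to switch between approaches are all assumptions made for illustration, and the abstract does not state the rule AliSim actually uses.

```python
import math
import random

STATES = "ACGT"


def jc69_p_same(t):
    """JC69 probability that a site shows the same state after branch length t
    (t in expected substitutions per site)."""
    return 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)


def _other_state(state, rng):
    """Pick one of the three other nucleotides uniformly."""
    return rng.choice([s for s in STATES if s != state])


def evolve_probability_matrix(seq, t, rng):
    """Probability matrix approach: sample every site from its row of P(t)."""
    p_same = jc69_p_same(t)
    return "".join(
        s if rng.random() < p_same else _other_state(s, rng) for s in seq
    )


def evolve_rate_matrix(seq, t, rng):
    """Rate matrix approach: draw exponential waiting times for the whole
    sequence and apply one substitution per event (Gillespie-style)."""
    sites = list(seq)
    if not sites:
        return ""
    # Under normalised JC69 each site leaves its state at rate 1,
    # so the total rate equals the sequence length.
    total_rate = float(len(sites))
    elapsed = rng.expovariate(total_rate)
    while elapsed < t:
        i = rng.randrange(len(sites))
        sites[i] = _other_state(sites[i], rng)
        elapsed += rng.expovariate(total_rate)
    return "".join(sites)


def evolve_adaptive(seq, t, rng, threshold=0.1):
    """Hypothetical switch: few expected events on short branches favour the
    rate matrix approach. The threshold is an illustrative assumption."""
    if t < threshold:
        return evolve_rate_matrix(seq, t, rng)
    return evolve_probability_matrix(seq, t, rng)


if __name__ == "__main__":
    rng = random.Random(1)
    print(evolve_adaptive("ACGTACGTACGT", 0.05, rng))
    print(evolve_adaptive("ACGTACGTACGT", 0.5, rng))
```

The intuition behind combining them: the probability matrix approach does work proportional to the number of sites regardless of branch length, while the rate matrix approach does work proportional to the number of substitution events, which is small when branches are short.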
Pages: 9