Efficient ancestry and mutation simulation with msprime 1.0

Cited by: 163
Authors
Baumdicker, Franz [1 ]
Bisschop, Gertjan [2 ]
Goldstein, Daniel [3 ,25 ]
Gower, Graham [4 ]
Ragsdale, Aaron P. [5 ]
Tsambos, Georgia
Zhu, Sha [6 ,7 ]
Eldon, Bjarki [8 ]
Ellerman, E. Castedo [9 ]
Galloway, Jared G. [10 ,11 ]
Gladstein, Ariella L. [12 ,13 ]
Gorjanc, Gregor [14 ,15 ]
Guo, Bing [16 ]
Jeffery, Ben [6 ,7 ]
Kretzschmar, Warren W. [17 ]
Lohse, Konrad [2 ]
Matschiner, Michael [18 ]
Nelson, Dominic [19 ]
Pope, Nathaniel S. [20 ]
Quinto-Cortes, Consuelo D. [21 ]
Rodrigues, Murillo F. [10 ]
Saunack, Kumar [22 ]
Sellinger, Thibaut [23 ]
Thornton, Kevin [24 ]
van Kemenade, Hugo
Wohns, Anthony W. [6 ,7 ,25 ]
Wong, Yan [6 ,7 ]
Gravel, Simon [19 ]
Kern, Andrew D. [10 ]
Koskela, Jere [26 ]
Ralph, Peter L. [10 ,27 ]
Kelleher, Jerome [6 ,7 ]
Affiliations
[1] Univ Tubingen, Cluster Excellence Controlling Microbes Fight Inf, D-72076 Tubingen, Germany
[2] Univ Edinburgh, Inst Evolutionary Biol, Edinburgh EH9 3FL, Midlothian, Scotland
[3] Northeastern Univ, Khoury Coll Comp Sci, Boston, MA 02115 USA
[4] Univ Copenhagen, Lundbeck GeoGenet Ctr, Globe Inst, DK-1350 Copenhagen K, Denmark
[5] Univ Wisconsin, Dept Integrat Biol, Madison, WI 53706 USA
[6] Univ Melbourne, Sch Math & Stat, Melbourne Integrat Genom, Parkville, Vic 3010, Australia
[7] Univ Oxford, Li Ka Shing Ctr Hlth Informat & Discovery, Big Data Inst, Oxford OX3 7LF, England
[8] Museum Nat Kunde, Leibniz Inst Evolut & Biodivers Sci, D-10115 Berlin, Germany
[9] Fresh Pond Res Inst, Cambridge, MA 02140 USA
[10] Univ Oregon, Inst Ecol & Evolut, Dept Biol, Eugene, OR 97403 USA
[11] Fred Hutchinson Canc Res Ctr, Computat Biol Program, Seattle, WA 98102 USA
[12] Univ N Carolina, Dept Genet, Chapel Hill, NC 27599 USA
[13] Embark Vet Inc, Boston, MA 02111 USA
[14] Univ Edinburgh, Roslin Inst, Edinburgh EH25 9RG, Midlothian, Scotland
[15] Univ Edinburgh, Royal Dick Sch Vet Studies, Edinburgh EH25 9RG, Midlothian, Scotland
[16] Univ Maryland, Inst Genome Sci, Sch Med, Baltimore, MD 21201 USA
[17] Karolinska Inst, Ctr Hematol & Regenerat Med, S-14183 Huddinge, Sweden
[18] Univ Oslo, Nat Hist Museum, N-0318 Oslo, Norway
[19] McGill Univ, Dept Human Genet, Montreal, PQ H3A 0C7, Canada
[20] Penn State Univ, Dept Entomol, State Coll, PA 16802 USA
[21] CINVESTAV, Natl Lab Genom Biodivers LANGEBIO, Unit Adv Genom, Irapuato, Mexico
[22] Indian Inst Technol, Mumbai 400076, Maharashtra, India
[23] Tech Univ Munich, Dept Life Sci Syst, Professorship Populat Genet, D-85354 Freising Weihenstephan, Germany
[24] Univ Calif Irvine, Dept Ecol & Evolutionary Biol, Irvine, CA 92697 USA
[25] Broad Inst MIT & Harvard, Cambridge, MA 02142 USA
[26] Univ Warwick, Dept Stat, Coventry CV4 7AL, W Midlands, England
[27] Univ Oregon, Dept Math, Eugene, OR 97403 USA
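Funding
European Research Council; Canadian Institutes of Health Research; UK Biotechnology and Biological Sciences Research Council; US National Institutes of Health; UK Engineering and Physical Sciences Research Council;
Keywords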
simulation; coalescent; mutations; Ancestral Recombination Graphs; SITE-FREQUENCY-SPECTRUM; SKEWED OFFSPRING DISTRIBUTIONS; POPULATION GENETIC SIMULATION; NEUTRAL ALLELE MODEL; COALESCENT PROCESSES; RECOMBINATION RATES; DEMOGRAPHIC HISTORY; BAYESIAN-INFERENCE; GENEALOGICAL PROPERTIES; DELETERIOUS MUTATIONS;
DOI
10.1093/genetics/iyab229
Chinese Library Classification
Q3 [Genetics];
Discipline classification codes
071007 ; 090102 ;
Abstract
Since its introduction in 2016, the msprime simulator has grown in popularity and is now one of the most commonly used tools in population genetics. This article marks the 1.0 release of msprime and summarizes the many features it has accumulated through an open source community development model. Despite its generality, msprime's performance is excellent: in many cases orders of magnitude faster and more memory efficient than more specialized methods. Stochastic simulation is a key tool in population genetics, since the models involved are often analytically intractable and simulation is usually the only way of obtaining ground-truth data to evaluate inferences. Because of this, a large number of specialized simulation programs have been developed, each filling a particular niche, but with largely overlapping functionality and a substantial duplication of effort. Here, we introduce msprime version 1.0, which efficiently implements ancestry and mutation simulations based on the succinct tree sequence data structure and the tskit library. We summarize msprime's many features, and show that its performance is excellent, often many times faster and more memory efficient than specialized alternatives. These high-performance features have been thoroughly tested and validated, and built using a collaborative, open source development model, which reduces duplication of effort and promotes software quality via community engagement.
Pages: 19