Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes

Times cited: 376
Authors
Kelleher, Jerome [1]
Etheridge, Alison M. [2]
McVean, Gilean [1,2,3]
Affiliations
[1] Univ Oxford, Wellcome Trust Ctr Human Genet, Oxford, England
[2] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
[3] Univ Oxford, Li Ka Shing Ctr Hlth Informat & Discovery, Oxford, England
Funding
Wellcome Trust (UK); Engineering and Physical Sciences Research Council (UK)
Keywords
GENOME-WIDE ASSOCIATION; POPULATION GENETIC DATA; NEUTRAL ALLELE MODEL; GENERAL COALESCENT; RECOMBINATION; PROGRAM; HISTORY; SIMCOAL; ALGORITHMS; DIVERSITY
DOI
10.1371/journal.pcbi.1004842
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
A central challenge in the analysis of genetic variation is to provide realistic genome simulation across millions of samples. Present-day coalescent simulations do not scale well, or use approximations that fail to capture important long-range linkage properties. Analysing the results of simulations also presents a substantial challenge, as current methods to store genealogies consume a great deal of space, are slow to parse and do not take advantage of shared structure in correlated trees. We solve these problems by introducing sparse trees and coalescence records as the key units of genealogical analysis. Using these tools, exact simulation of the coalescent with recombination for chromosome-sized regions over hundreds of thousands of samples is possible, and substantially faster than present-day approximate methods. We can also analyse the results orders of magnitude more quickly than with existing methods.
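The following minimal sketch illustrates the idea of coalescence records and sparse trees. It assumes that each record is a tuple (left, right, node, children, time), stating that `node` is the parent of both `children` over the half-open interval [left, right) and that the coalescence occurred at `time`. Under that assumption, the tree at every point of the sequence can be recovered by a single left-to-right sweep. At each breakpoint, the sweep removes records whose interval ends and inserts records whose interval begins, while a child-to-parent map represents the current tree. The `Record` class, the `sparse_trees` function and the example values are hypothetical illustrations, not the API of any published software. The sketch also assumes that the records tile the sequence without gaps.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical coalescence record: over [left, right), `node` is the
    parent of both `children`, and the coalescence happened at `time`."""
    left: float
    right: float
    node: int
    children: Tuple[int, int]
    time: float


def sparse_trees(records: List[Record]) -> Iterator[Tuple[float, float, Dict[int, int]]]:
    """Yield (left, right, parent_map) for each tree along the sequence.

    Assumes the records tile the sequence without gaps, so every interval
    boundary is a breakpoint between two adjacent trees.
    """
    # Records enter the current tree in order of left coordinate (youngest
    # first on ties) and leave it in order of right coordinate (oldest first).
    insertion = sorted(records, key=lambda r: (r.left, r.time))
    removal = sorted(records, key=lambda r: (r.right, -r.time))
    breakpoints = sorted({r.left for r in records} | {r.right for r in records})
    parent: Dict[int, int] = {}  # child node -> parent node in the current tree
    i = j = 0
    for left, right in zip(breakpoints, breakpoints[1:]):
        # Drop edges whose interval ends at this breakpoint.
        while j < len(removal) and removal[j].right <= left:
            for child in removal[j].children:
                del parent[child]
            j += 1
        # Add edges whose interval starts at this breakpoint.
        while i < len(insertion) and insertion[i].left <= left:
            for child in insertion[i].children:
                parent[child] = insertion[i].node
            i += 1
        # Copy for clarity; an efficient version would update one structure in place.
        yield left, right, dict(parent)


# Hypothetical example: samples 1, 2, 3 on the sequence [0, 10). Node 4 joins
# samples 1 and 2 everywhere; the root above sample 3 changes at position 5.
EXAMPLE = [
    Record(0, 10, 4, (1, 2), 0.3),
    Record(0, 5, 5, (3, 4), 1.0),
    Record(5, 10, 6, (3, 4), 1.8),
]

if __name__ == "__main__":
    for left, right, parent in sparse_trees(EXAMPLE):
        print(left, right, parent)
```

Running the example yields two trees. On [0, 5), sample 3 and node 4 share parent 5. On [5, 10), they share parent 6. Only three records encode both trees, because node 4 (the ancestor of samples 1 and 2) is stored once and reused. This reuse is the kind of shared structure between correlated trees that the abstract refers to.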
Pages: 22