Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes

Cited by: 352
Authors
Kelleher, Jerome [1]
Etheridge, Alison M. [2]
McVean, Gilean [1, 2, 3]
Affiliations
[1] Univ Oxford, Wellcome Trust Ctr Human Genet, Oxford, England
[2] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
[3] Univ Oxford, Li Ka Shing Ctr Hlth Informat & Discovery, Oxford, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); Wellcome Trust (UK)
Keywords
GENOME-WIDE ASSOCIATION; POPULATION GENETIC DATA; NEUTRAL ALLELE MODEL; GENERAL COALESCENT; RECOMBINATION; PROGRAM; HISTORY; SIMCOAL; ALGORITHMS; DIVERSITY;
DOI
10.1371/journal.pcbi.1004842
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
A central challenge in the analysis of genetic variation is to provide realistic genome simulation across millions of samples. Present-day coalescent simulations do not scale well, or use approximations that fail to capture important long-range linkage properties. Analysing the results of simulations also presents a substantial challenge, as current methods to store genealogies consume a great deal of space, are slow to parse and do not take advantage of shared structure in correlated trees. We solve these problems by introducing sparse trees and coalescence records as the key units of genealogical analysis. Using these tools, exact simulation of the coalescent with recombination for chromosome-sized regions over hundreds of thousands of samples is possible, and substantially faster than present-day approximate methods. We can also analyse the results orders of magnitude more quickly than with existing methods.
Pages: 22