Asymptotic Distributions of Coalescence Times and Ancestral Lineage Numbers for Populations with Temporally Varying Size

Cited by: 20
Authors
Chen, Hua [1]
Chen, Kun [2]
Affiliations
[1] Fudan Univ, Sch Life Sci, Minist Educ, Key Lab Contemporary Anthropol, Shanghai 200433, Peoples R China
[2] Harvard Univ, Sch Med, Dana Farber Canc Inst, Boston, MA 02115 USA
Keywords
ALLELE-FREQUENCY-SPECTRUM; SAMPLING THEORY; GENETIC DRIFT; OF-DESCENT; INFERENCE; MODEL;
DOI
10.1534/genetics.113.151522
Chinese Library Classification number
Q3 [Genetics];
Discipline classification codes
071007; 090102;
Abstract
The distributions of coalescence times and ancestral lineage numbers play an essential role in coalescent modeling and ancestral inference. The exact distributions of both coalescence times and ancestral lineage numbers are expressed as sums of alternating series, and the terms in these series become numerically intractable for large samples. More computationally attractive are their asymptotic distributions, which were derived in Griffiths (1984) for populations with constant size. In this article, we derive the asymptotic distributions of coalescence times and ancestral lineage numbers for populations with temporally varying size. For a sample of size n, denote by T_m the mth coalescence time, when m + 1 lineages coalesce into m lineages, and by A_n(t) the number of ancestral lineages at time t back from the current generation. Similar to the results in Griffiths (1984), the number of ancestral lineages, A_n(t), and the coalescence times, T_m, are asymptotically normal, with the mean and variance of these distributions depending on the population size function, N(t). At the very early stage of the coalescent, when t → 0, the number of coalesced lineages n - A_n(t) follows a Poisson distribution, and as m → n, n(n - 1)T_m/2N(0) follows a gamma distribution. We demonstrate the accuracy of the asymptotic approximations by comparing them to both exact distributions and coalescent simulations. Several applications of the theoretical results are also shown: deriving statistics related to the properties of gene genealogies, such as the time to the most recent common ancestor (TMRCA) and the total branch length (TBL) of the genealogy, and deriving the allele frequency spectrum for large genealogies. With the advent of genomic-level sequencing data for large samples, the asymptotic distributions are expected to have wide applications in theoretical and methodological development for population genetic inference.
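The two quantities the abstract defines, the mth coalescence time and the ancestral lineage count at time t, can be illustrated with a small simulation. The sketch below is not taken from the paper. It assumes the standard constant-size Kingman coalescent with time measured in coalescent units, so that k lineages wait an exponential time with rate k(k - 1)/2 before the next coalescence. The function names `coalescence_times`, `lineages_at`, and `expected_T` are hypothetical and chosen only for illustration.

```python
import random


def coalescence_times(n, rng):
    """Return {m: T_m} for m = n-1, ..., 1 (constant-size assumption).

    T_m is the time at which m + 1 lineages coalesce into m lineages.
    With k lineages present, the waiting time is Exp(rate = k(k-1)/2).
    """
    times = {}
    t = 0.0
    for k in range(n, 1, -1):  # k = number of lineages currently present
        t += rng.expovariate(k * (k - 1) / 2.0)
        times[k - 1] = t       # k lineages have just coalesced into k - 1
    return times


def lineages_at(times, n, t):
    """A_n(t): lineages remaining at time t back from the present."""
    return n - sum(1 for tm in times.values() if tm <= t)


def expected_T(n, m):
    """E[T_m] = sum_{k=m+1}^{n} 2/(k(k-1)) = 2(1/m - 1/n) under this model."""
    return 2.0 * (1.0 / m - 1.0 / n)


if __name__ == "__main__":
    rng = random.Random(1)
    n, m, reps = 50, 10, 5000
    # Monte Carlo mean of T_m next to its closed-form expectation.
    mean_T = sum(coalescence_times(n, rng)[m] for _ in range(reps)) / reps
    print(mean_T, expected_T(n, m))
```

For a population whose size varies with time, one common approach is to rescale time by the population size function N(t) before applying the same waiting-time scheme. That step is omitted here because the abstract does not spell out the rescaling the authors use.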
Pages: 721 / +
Number of pages: 23
Related papers
47 in total
  • [1] A map of human genome variation from population-scale sequencing
    Altshuler, David
    Durbin, Richard M.
    Abecasis, Goncalo R.
    Bentley, David R.
    Chakravarti, Aravinda
    Clark, Andrew G.
    Collins, Francis S.
    De la Vega, Francisco M.
    Donnelly, Peter
    Egholm, Michael
    Flicek, Paul
    Gabriel, Stacey B.
    Gibbs, Richard A.
    Knoppers, Bartha M.
    Lander, Eric S.
    Lehrach, Hans
    Mardis, Elaine R.
    McVean, Gil A.
    Nickerson, Debbie A.
    Peltonen, Leena
    Schafer, Alan J.
    Sherry, Stephen T.
    Wang, Jun
    Wilson, Richard K.
    Gibbs, Richard A.
    Deiros, David
    Metzker, Mike
    Muzny, Donna
    Reid, Jeff
    Wheeler, David
    Wang, Jun
    Li, Jingxiang
    Jian, Min
    Li, Guoqing
    Li, Ruiqiang
    Liang, Huiqing
    Tian, Geng
    Wang, Bo
    Wang, Jian
    Wang, Wei
    Yang, Huanming
    Zhang, Xiuqing
    Zheng, Huisong
    Lander, Eric S.
    Altshuler, David L.
    Ambrogio, Lauren
    Bloom, Toby
    Cibulskis, Kristian
    Fennell, Tim J.
    Gabriel, Stacey B.
    [J]. NATURE, 2010, 467 (7319) : 1061 - 1073
  • [2] Estimation of the number of individuals founding colonized populations
    Anderson, Eric C.
    Slatkin, Montgomery
    [J]. EVOLUTION, 2007, 61 (04) : 972 - 983
  • [3] [Anonymous], 2001, Handbook of Statistical Genomics
  • [4] [Anonymous], 2012, PROBABILITY MEASURE
  • [5] [Anonymous], 1990, OXF SURV EVOL BIOL
  • [6] [Anonymous], 2005, Gene genealogies, variation and evolution
  • [7] The joint allele-frequency spectrum in closely related species
    Chen, Hua
    Green, Richard E.
    Pääbo, Svante
    Slatkin, Montgomery
    [J]. GENETICS, 2007, 177 (01) : 387 - 398
  • [8] Intercoalescence Time Distribution of Incomplete Gene Genealogies in Temporally Varying Populations, and Applications in Population Genetic Inference
    Chen, Hua
    [J]. ANNALS OF HUMAN GENETICS, 2013, 77 : 158 - 173
  • [9] The joint allele frequency spectrum of multiple populations: A coalescent theory approach
    Chen, Hua
    [J]. THEORETICAL POPULATION BIOLOGY, 2012, 81 (02) : 179 - 195
  • [10] Deep resequencing reveals excess rare recent variants consistent with explosive population growth
    Coventry, Alex
    Bull-Otterson, Lara M.
    Liu, Xiaoming
    Clark, Andrew G.
    Maxwell, Taylor J.
    Crosby, Jacy
    Hixson, James E.
    Rea, Thomas J.
    Muzny, Donna M.
    Lewis, Lora R.
    Wheeler, David A.
    Sabo, Aniko
    Lusk, Christine
    Weiss, Kenneth G.
    Akbar, Humeira
    Cree, Andrew
    Hawes, Alicia C.
    Newsham, Irene
    Varghese, Robin T.
    Villasana, Donna
    Gross, Shannon
    Joshi, Vandita
    Santibanez, Jireh
    Morgan, Margaret
    Chang, Kyle
    Hale, Walker
    Templeton, Alan R.
    Boerwinkle, Eric
    Gibbs, Richard
    Sing, Charles F.
    [J]. NATURE COMMUNICATIONS, 2010, 1