DESCARTES' RULE OF SIGNS AND THE IDENTIFIABILITY OF POPULATION DEMOGRAPHIC MODELS FROM GENOMIC VARIATION DATA

Cited by: 50
Authors
Bhaskar, Anand [1]
Song, Yun S. [1, 2]
Affiliations
[1] Univ Calif Berkeley, Div Comp Sci, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Keywords
Population genetics; identifiability; population size; coalescent theory; frequency spectrum; ALLELE FREQUENCY-SPECTRUM; GENETIC DRIFT; INFERENCE; HISTORY; GROWTH; IMPACT; STRATIFICATION; DISTRIBUTIONS; VARIANTS; ANCESTRY
DOI
10.1214/14-AOS1264
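Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract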
The sample frequency spectrum (SFS) is a widely used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has recently been shown that very different population demographies can actually generate the same SFS for arbitrarily large sample sizes. Although in principle this nonidentifiability issue poses a thorny challenge to statistical inference, the population size functions involved in the counterexamples are arguably not so biologically realistic. Here, we revisit this problem and examine the identifiability of demographic models under the restriction that the population sizes are piecewise-defined where each piece belongs to some family of biologically motivated functions. Under this assumption, we prove that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large. We obtain a general bound on the sample size sufficient for identifiability; the bound depends on the number of pieces in the demographic model and also on the type of population size function in each piece. In the cases of piecewise-constant, piecewise-exponential and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit formulas for the bounds as simple functions of the number of pieces. Lastly, we obtain analogous results for the "folded" SFS, which is often used when there is ambiguity as to which allelic type is ancestral. Our results are proved using a generalization of Descartes' rule of signs for polynomials to the Laplace transform of piecewise continuous functions.
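For orientation only, the classical polynomial form of Descartes' rule of signs mentioned at the end of the abstract can be sketched as follows. The rule states that the number of positive real roots of a real polynomial, counted with multiplicity, is at most the number of sign changes in its coefficient sequence and differs from it by an even number. The helper name `sign_changes` is hypothetical and not taken from the paper, and this sketch covers only the polynomial case, not the paper's generalization to Laplace transforms.

```python
def sign_changes(coeffs):
    """Count sign changes in a coefficient sequence, skipping zero entries.

    Illustrative helper (an assumption, not from the paper): by the classical
    Descartes' rule of signs, the returned count is an upper bound on the
    number of positive real roots of the polynomial with these coefficients.
    """
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# p(x) = x^3 - 3x^2 + 2x = x (x - 1)(x - 2), coefficients from highest degree down.
# The nonzero signs are +, -, +, giving two sign changes, which matches the
# two positive roots x = 1 and x = 2.
example_coeffs = [1, -3, 2, 0]
bound = sign_changes(example_coeffs)
```

Here the bound is attained exactly; in general the count only bounds the number of positive roots from above.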
Pages: 2469-2493
Number of pages: 25