Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data

Cited by: 56
Authors
Bhaskar, Anand [1, 2]
Wang, Y. X. Rachel [3]
Song, Yun S. [1, 2, 3, 4]
Affiliations
[1] Simons Inst Theory Comp, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Div Comp Sci, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[4] Univ Calif Berkeley, Dept Integrat Biol, Berkeley, CA 94720 USA
Funding
Japan Society for the Promotion of Science;
Keywords
ALLELE FREQUENCY-SPECTRUM; DEMOGRAPHIC HISTORY; GROWTH; MODELS; DISTRIBUTIONS; VARIANTS; NUMBER; EXCESS; IMPACT; SCALE;
DOI
10.1101/gr.178756.114
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
With the recent increase in study sample sizes in human genetics, there has been growing interest in inferring historical population demography from genomic variation data. Here, we present an efficient inference method that can scale up to very large samples, with tens or hundreds of thousands of individuals. Specifically, by utilizing analytic results on the expected frequency spectrum under the coalescent and by leveraging the technique of automatic differentiation, which allows us to compute gradients exactly, we develop a very efficient algorithm to infer piecewise-exponential models of the historical effective population size from the distribution of sample allele frequencies. Our method is orders of magnitude faster than previous demographic inference methods based on the frequency spectrum. In addition to inferring demography, our method can also accurately estimate locus-specific mutation rates. We perform extensive validation of our method on simulated data and show that it can accurately infer multiple recent epochs of rapid exponential growth, a signal that is difficult to pick up with small sample sizes. Lastly, we use our method to analyze data from recent sequencing studies, including a large-sample exome-sequencing data set of tens of thousands of individuals assayed at a few hundred genic regions.
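To make the phrase "piecewise-exponential models of the historical effective population size" concrete, here is a minimal sketch of one way such a size function could be evaluated. The parameterization is an assumption for illustration, not necessarily the one used in the paper: time is measured in generations before the present, and each epoch has its own size at its recent boundary and its own forward-in-time growth rate. The function name and argument names are hypothetical.

```python
import math
from bisect import bisect_right


def piecewise_exponential_size(t, breakpoints, start_sizes, growth_rates):
    """Population size at time t (generations before present).

    Assumed parameterization (illustrative only):
      breakpoints[i]  - recent boundary of epoch i, with breakpoints[0] == 0
      start_sizes[i]  - size at breakpoints[i]
      growth_rates[i] - per-generation growth rate forward in time, so the
                        size shrinks going back in time when the rate is > 0
    The last epoch extends infinitely far into the past.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    # Index of the epoch containing t: the last breakpoint that is <= t.
    i = bisect_right(breakpoints, t) - 1
    return start_sizes[i] * math.exp(-growth_rates[i] * (t - breakpoints[i]))


# Example: 100 generations of exponential growth from 1e4 to 1e6,
# preceded by a constant-size ancestral epoch.
breakpoints = [0.0, 100.0]
start_sizes = [1e6, 1e4]
growth_rates = [math.log(100.0) / 100.0, 0.0]

for t in (0.0, 50.0, 100.0, 1000.0):
    print(t, piecewise_exponential_size(t, breakpoints, start_sizes, growth_rates))
```

In an inference setting, the sizes, rates, and breakpoints would be the free parameters of the fit; this sketch only evaluates the size function and does not compute the frequency spectrum or its gradients.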
Pages: 268-279
Number of pages: 12