Inference of Site Frequency Spectra From High-Throughput Sequence Data: Quantification of Selection on Nonsynonymous and Synonymous Sites in Humans

Cited by: 33
Authors
Keightley, Peter D. [1]
Halligan, Daniel L. [1]
Affiliations
[1] Univ Edinburgh, Sch Biol Sci, Inst Evolutionary Biol, Edinburgh EH9 3JT, Midlothian, Scotland
Funding
Biotechnology and Biological Sciences Research Council (UK)
Keywords
MUTATIONS; RECOMBINATION; MAMMALS;
DOI
10.1534/genetics.111.128355
Chinese Library Classification number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Sequencing errors and random sampling of nucleotide types among sequencing reads at heterozygous sites present challenges for accurate, unbiased inference of single-nucleotide polymorphism genotypes from high-throughput sequence data. Here, we develop a maximum-likelihood approach to estimate the frequency distribution of the number of alleles in a sample of individuals (the site frequency spectrum), using high-throughput sequence data. Our method assumes binomial sampling of nucleotide types in heterozygotes and random sequencing error. By simulations, we show that close to unbiased estimates of the site frequency spectrum can be obtained if the error rate per base read does not exceed the population nucleotide diversity. We also show that these estimates are reasonably robust if errors are nonrandom. We then apply the method to infer site frequency spectra for zerofold degenerate, fourfold degenerate, and intronic sites of protein-coding genes using the low coverage human sequence data produced by the 1000 Genomes Project phase-one pilot. By fitting a model to the inferred site frequency spectra that estimates parameters of the distribution of fitness effects of new mutations, we find evidence for significant natural selection operating on fourfold sites. We also find that a model with variable effects of mutations at synonymous sites fits the data significantly better than a model with equal mutational effects. Under the variable effects model, we infer that 11% of synonymous mutations are subject to strong purifying selection.
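The two modelling assumptions named in the abstract (binomial sampling of nucleotide types in heterozygotes, and random sequencing error) can be illustrated with a minimal sketch of a per-site genotype likelihood. This is not the authors' implementation: the function names, the restriction to a biallelic site, and the symmetric error model (a read is misreported as the other allele with probability `err`) are all assumptions made here for illustration. The published method goes further and estimates the whole site frequency spectrum by maximum likelihood, which this sketch does not attempt.

```python
from math import comb


def alt_read_prob(n_alt_alleles: int, err: float) -> float:
    """Probability that a single read shows the alternative allele.

    n_alt_alleles is the diploid genotype (0, 1, or 2 copies of the
    alternative allele). Assumption: errors are random and symmetric,
    flipping a read to the other allele with probability err.
    """
    frac_alt = n_alt_alleles / 2.0  # chance the read samples an alt chromosome
    return frac_alt * (1.0 - err) + (1.0 - frac_alt) * err


def genotype_likelihoods(n_alt_reads: int, n_reads: int, err: float) -> list:
    """Binomial likelihood of the read counts under each genotype (0, 1, 2)."""
    likelihoods = []
    for genotype in (0, 1, 2):
        p = alt_read_prob(genotype, err)
        lik = (comb(n_reads, n_alt_reads)
               * p ** n_alt_reads
               * (1.0 - p) ** (n_reads - n_alt_reads))
        likelihoods.append(lik)
    return likelihoods


if __name__ == "__main__":
    # Hypothetical site: 10 reads, 5 of them showing the alternative allele,
    # with an assumed per-base error rate of 1%.
    for genotype, lik in enumerate(genotype_likelihoods(5, 10, 0.01)):
        print(genotype, lik)
```

With low coverage, a heterozygote can easily yield reads of only one nucleotide type by chance, so the likelihoods for neighbouring genotypes are often close. That is one reason a likelihood over the full spectrum can be preferable to calling each genotype separately.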
Pages: 931 / U295
Number of pages: 14
Related papers
25 in total
[1]   A map of human genome variation from population-scale sequencing [J].
Altshuler, David ;
Durbin, Richard M. ;
Abecasis, Goncalo R. ;
Bentley, David R. ;
Chakravarti, Aravinda ;
Clark, Andrew G. ;
Collins, Francis S. ;
De la Vega, Francisco M. ;
Donnelly, Peter ;
Egholm, Michael ;
Flicek, Paul ;
Gabriel, Stacey B. ;
Gibbs, Richard A. ;
Knoppers, Bartha M. ;
Lander, Eric S. ;
Lehrach, Hans ;
Mardis, Elaine R. ;
McVean, Gil A. ;
Nickerson, Debbie A. ;
Peltonen, Leena ;
Schafer, Alan J. ;
Sherry, Stephen T. ;
Wang, Jun ;
Wilson, Richard K. ;
Gibbs, Richard A. ;
Deiros, David ;
Metzker, Mike ;
Muzny, Donna ;
Reid, Jeff ;
Wheeler, David ;
Wang, Jun ;
Li, Jingxiang ;
Jian, Min ;
Li, Guoqing ;
Li, Ruiqiang ;
Liang, Huiqing ;
Tian, Geng ;
Wang, Bo ;
Wang, Jian ;
Wang, Wei ;
Yang, Huanming ;
Zhang, Xiuqing ;
Zheng, Huisong ;
Lander, Eric S. ;
Altshuler, David L. ;
Ambrogio, Lauren ;
Bloom, Toby ;
Cibulskis, Kristian ;
Fennell, Tim J. ;
Gabriel, Stacey B. .
NATURE, 2010, 467 (7319) :1061-1073
[2]   Assessing the evolutionary impact of amino acid mutations in the human genome [J].
Boyko, Adam R. ;
Williamson, Scott H. ;
Indap, Amit R. ;
Degenhardt, Jeremiah D. ;
Hernandez, Ryan D. ;
Lohmueller, Kirk E. ;
Adams, Mark D. ;
Schmidt, Steffen ;
Sninsky, John J. ;
Sunyaev, Shamil R. ;
White, Thomas J. ;
Nielsen, Rasmus ;
Clark, Andrew G. ;
Bustamante, Carlos D. .
PLOS GENETICS, 2008, 4 (05)
[3]   Hearing silence: non-neutral evolution at synonymous sites in mammals [J].
Chamary, JV ;
Parmley, JL ;
Hurst, LD .
NATURE REVIEWS GENETICS, 2006, 7 (02) :98-108
[4]  
CLARK AG, 1992, MOL BIOL EVOL, V9, P744
[5]   Weak selection and recent mutational changes influence polymorphic synonymous mutations in humans [J].
Comeron, JM .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2006, 103 (18) :6940-6945
[6]   Distributions of Selectively Constrained Sites and Deleterious Mutation Rates in the Hominid and Murid Genomes [J].
Eory, Lel ;
Halligan, Daniel L. ;
Keightley, Peter D. .
MOLECULAR BIOLOGY AND EVOLUTION, 2010, 27 (01) :177-192
[7]   The distribution of fitness effects of new deleterious amino acid mutations in humans [J].
Eyre-Walker, Adam ;
Woolfit, Megan ;
Phelps, Ted .
GENETICS, 2006, 173 (02) :891-900
[8]   mlRho - a program for estimating the population mutation and recombination rates from shotgun-sequenced diploid genomes [J].
Haubold, Bernhard ;
Pfaffelhuber, Peter ;
Lynch, Michael .
MOLECULAR ECOLOGY, 2010, 19 :277-284
[9]   Population genetic analysis of shotgun assemblies of genomic sequences from multiple individuals [J].
Hellmann, Ines ;
Mang, Yuan ;
Gu, Zhiping ;
Li, Peter ;
de la Vega, Francisco M. ;
Clark, Andrew G. ;
Nielsen, Rasmus .
GENOME RESEARCH, 2008, 18 (07) :1020-1029
[10]   Accounting for bias from sequencing error in population genetic estimates [J].
Johnson, Philip L. F. ;
Slatkin, Montgomery .
MOLECULAR BIOLOGY AND EVOLUTION, 2008, 25 (01) :199-206