Estimation of allele frequency and association mapping using next-generation sequencing data

Cited by: 112
Authors
Kim, Su Yeon [1, 2]
Lohmueller, Kirk E. [1, 2]
Albrechtsen, Anders [3]
Li, Yingrui [4]
Korneliussen, Thorfinn [5]
Tian, Geng [4, 6, 7]
Grarup, Niels [8]
Jiang, Tao [4]
Andersen, Gitte [9]
Witte, Daniel [10]
Jorgensen, Torben [11]
Hansen, Torben [8, 12]
Pedersen, Oluf [8, 9, 13, 14]
Wang, Jun [4, 5]
Nielsen, Rasmus [1, 2, 5]
Affiliations
[1] Univ Calif Berkeley, Dept Integrat Biol, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[3] Univ Copenhagen, Bioinformat Ctr, Copenhagen, Denmark
[4] Beijing Genom Inst, Shenzhen 518083, Peoples R China
[5] Univ Copenhagen, Dept Biol, Copenhagen, Denmark
[6] Chinese Acad Sci, Beijing Inst Genom, Beijing 101300, Peoples R China
[7] Chinese Acad Sci, Grad Univ, Beijing 100062, Peoples R China
[8] Univ Copenhagen, Fac Hlth Sci, Novo Nordisk Fdn, Ctr Basic Metab Res, Copenhagen, Denmark
[9] Hagedorn Res Inst, Copenhagen, Denmark
[10] Steno Diabet Ctr, DK-2820 Gentofte, Denmark
[11] Glostrup Univ Hosp, Res Ctr Prevent & Hlth, Glostrup, Denmark
[12] Univ So Denmark, Fac Hlth Sci, Odense, Denmark
[13] Univ Aarhus, Fac Hlth Sci, Aarhus, Denmark
[14] Univ Copenhagen, Inst Biomed Sci, Copenhagen, Denmark
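Funding
National Institutes of Health (USA);
Keywords
MAXIMUM-LIKELIHOOD; SNP DISCOVERY; SPECTRUM; HITCHHIKING; POPULATIONS; SELECTION; TOOL;
DOI
10.1186/1471-2105-12-231
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract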
Background: Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost-effective approach is to use medium- or low-coverage data (e.g., < 15X). However, SNP calling and allele frequency estimation in such studies are associated with substantial statistical uncertainty because of varying coverage and high error rates.

Results: We evaluate a new maximum likelihood method for estimating allele frequencies in low- and medium-coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data.

Conclusions: Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low- to medium-coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.
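
The idea of integrating over genotype uncertainty instead of calling genotypes first can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single biallelic site, Hardy-Weinberg genotype proportions, and per-individual genotype likelihoods that have already been computed from the reads, and it uses an EM iteration as the optimizer, which is a choice made here for brevity. The function name `estimate_allele_frequency` and its parameters are hypothetical.

```python
import numpy as np

def estimate_allele_frequency(genotype_likelihoods, start=0.25, tol=1e-8, max_iter=500):
    """Maximum likelihood allele frequency at one biallelic site (illustrative sketch).

    genotype_likelihoods: array of shape (n_individuals, 3); column g holds
    P(reads of individual i | genotype g), where g = 0, 1, 2 counts copies of
    the allele of interest. Assumes Hardy-Weinberg proportions (an assumption
    of this sketch).
    """
    gl = np.asarray(genotype_likelihoods, dtype=float)
    dosage = np.array([0.0, 1.0, 2.0])
    p = start
    for _ in range(max_iter):
        # Genotype prior implied by the current frequency estimate.
        prior = np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])
        # E-step: posterior genotype probabilities for every individual.
        post = gl * prior
        post /= post.sum(axis=1, keepdims=True)
        # M-step: expected allele count divided by the number of chromosomes.
        p_new = float((post @ dosage).mean() / 2.0)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p

# Example: four individuals with low-coverage, uncertain genotype likelihoods.
example = [
    [0.90, 0.10, 0.00],
    [0.20, 0.70, 0.10],
    [0.05, 0.60, 0.35],
    [0.60, 0.40, 0.00],
]
print(estimate_allele_frequency(example))
```

Each individual contributes a weighted mixture of all three genotypes rather than a single hard call, so individuals with few reads still carry information without being forced into one genotype. When the likelihoods are fully certain, the estimate reduces to simple allele counting.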
Pages: 16