Association Testing for Next-Generation Sequencing Data Using Score Statistics

Cited by: 40
Authors
Skotte, Line [1]
Korneliussen, Thorfinn Sand [1]
Albrechtsen, Anders [1]
Affiliations
[1] Univ Copenhagen, Dept Biol, DK-2200 Copenhagen, Denmark
Keywords
next-generation sequencing; association; case-control study; quantitative traits; SNPs; GENOME; IMPUTATION; PHASE; MAP
DOI
10.1002/gepi.21636
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
The advances in sequencing technology have made large-scale sequencing studies for large cohorts feasible. Often, the primary goal for large-scale studies is to identify genetic variants associated with a disease or other phenotypes. Even when deep sequencing is performed, there will be many sites where there is not enough data to call genotypes accurately. Ignoring the genotype classification uncertainty by basing subsequent analyses on called genotypes leads to a loss in power. Additionally, using called genotypes can lead to spurious association signals. Some methods taking the uncertainty of genotype calls into account have been proposed; most require numerical optimization which for large-scale data is not always computationally feasible. We show that using a score statistic for the joint likelihood of observed phenotypes and observed sequencing data provides an attractive approach to association testing for next-generation sequencing data. The joint model accounts for the genotype classification uncertainty via the posterior probabilities of the genotypes given the observed sequencing data, which gives the approach higher power than methods based on called genotypes. This strategy remains computationally feasible due to the use of score statistics. As part of the joint likelihood, we model the distribution of the phenotypes using a generalized linear model framework, which works for both quantitative and discrete phenotypes. Thus, the method presented here is applicable to case-control studies as well as mapping of quantitative traits. The model allows additional covariates that enable correction for confounding factors such as population stratification or cohort effects. Genet. Epidemiol. 36:430-437, 2012. (C) 2012 Wiley Periodicals, Inc.
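The core idea in the abstract, testing association without calling genotypes and without iterative optimization, can be illustrated with a deliberately simplified sketch. It is not the authors' statistic: it assumes a quantitative trait under a linear model, replaces each unknown genotype by its posterior expectation (the dosage), and ignores the additional variance that genotype uncertainty contributes to the information in the full joint-likelihood score test. The function name `dosage_score_test` and its arguments are hypothetical. Only the null model (intercept plus covariates) is fitted, which is what keeps a score test cheap.

```python
import math
import numpy as np


def dosage_score_test(y, post, covariates=None):
    """Simplified score test of H0: no genotype effect on a quantitative trait.

    y          : (n,) phenotype values
    post       : (n, 3) posterior probabilities of genotypes 0, 1, 2
                 given the sequencing data (rows sum to 1)
    covariates : optional (n, k) confounders, e.g. ancestry components

    Assumption: the genotype is replaced by its posterior mean; the extra
    variance from genotype uncertainty is NOT modelled here.
    """
    y = np.asarray(y, dtype=float)
    post = np.asarray(post, dtype=float)
    n = y.shape[0]

    # Expected genotype (dosage) under the posterior distribution.
    dosage = post @ np.array([0.0, 1.0, 2.0])

    # Null design matrix: intercept plus any covariates.
    Z = np.ones((n, 1))
    if covariates is not None:
        Z = np.column_stack([Z, np.asarray(covariates, dtype=float).reshape(n, -1)])

    # Fit the null model only; no optimization over the genotype effect.
    alpha, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ alpha
    sigma2 = resid @ resid / n  # maximum-likelihood variance under H0

    # Remove the part of the dosage explained by the null design.
    gamma, *_ = np.linalg.lstsq(Z, dosage, rcond=None)
    d_res = dosage - Z @ gamma
    denom = sigma2 * (d_res @ d_res)
    if denom <= 0.0:
        raise ValueError("dosage or phenotype has no variation beyond the null design")

    # Score statistic; asymptotically chi-square with 1 degree of freedom.
    stat = (resid @ d_res) ** 2 / denom
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail
    return stat, p_value
```

Calling `dosage_score_test(y, post, covariates)` once per site returns the statistic and its asymptotic p-value. A case-control version would swap the linear null model for a logistic one within the same generalized linear model framework.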
Pages: 430-437
Number of pages: 8