Exploiting Linkage Disequilibrium for Ultrahigh-Dimensional Genome-Wide Data with an Integrated Statistical Approach

Cited by: 3
Authors
Carlsen, Michelle [1]
Fu, Guifang [1]
Bushman, Shaun [2]
Corcoran, Christopher [1]
Affiliations
[1] Utah State Univ, Dept Math & Stat, 3900 Old Main Hill, Logan, UT 84322 USA
[2] ARS, Forage & Range Res Lab, USDA, Logan, UT 84322 USA
Funding
National Science Foundation (United States)
Keywords
GWAS; linkage disequilibrium; feature screening; large-scale modeling; case-control; genomic selection; GenPred; shared data resource; RIDGE-REGRESSION; HAPLOTYPE DIVERSITY; VARIABLE SELECTION; CANDIDATE GENE; ASSOCIATION; REVEALS; BLOCKS; CANCER; REGION; RISK;
DOI
10.1534/genetics.115.179507
Chinese Library Classification
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Genome-wide data with millions of single-nucleotide polymorphisms (SNPs) can be highly correlated due to linkage disequilibrium (LD). The ultrahigh dimensionality of big data brings unprecedented challenges to statistical modeling, such as noise accumulation, the curse of dimensionality, computational burden, spurious correlations, and processing and storage bottlenecks. Traditional statistical approaches lose power because p >> n (where n is the number of observations and p is the number of SNPs) and because of the complex correlation structure among SNPs. In this article, we propose an integrated distance correlation ridge regression (DCRR) approach to accommodate the ultrahigh dimensionality, joint polygenic effects of multiple loci, and the complex LD structures. First, a distance correlation (DC) screening approach is used to remove noise extensively, after which LD structure is addressed using a ridge penalized multiple logistic regression (LRR) model. The false discovery rate, true positive discovery rate, and computational cost were simultaneously assessed through a large number of simulations. A binary trait of Arabidopsis thaliana, the hypersensitive response to the bacterial elicitor AvrRpm1, was analyzed in 84 inbred lines (28 susceptible and 56 resistant) with 216,130 SNPs. Compared to previous SNP discovery methods implemented on the same data set, the DCRR approach successfully detected the causative SNP while dramatically reducing spurious associations and computational time.
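
The two-stage idea in the abstract (screen SNPs by distance correlation, then fit a ridge-penalized logistic regression on the survivors) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`dc_screen`, `ridge_logistic`), the fixed top-`keep` cutoff, the penalty value `lam`, the plain gradient-descent fit, and the toy data are all assumptions made for illustration; the abstract does not specify how the screening threshold or the penalty is chosen.

```python
import math
import random


def _centered_distances(x):
    """Double-centered matrix of pairwise absolute differences for a 1-D sample."""
    n = len(x)
    d = [[abs(a - b) for b in x] for a in x]
    row_mean = [sum(r) / n for r in d]
    grand_mean = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation between two equal-length 1-D samples."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)

    def mean_product(u, v):
        return sum(u[i][j] * v[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2, dvar_x, dvar_y = mean_product(a, b), mean_product(a, a), mean_product(b, b)
    if dvar_x <= 0.0 or dvar_y <= 0.0:  # a constant sample carries no signal
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))


def dc_screen(genotypes, trait, keep):
    """Score every SNP column against the trait; return top `keep` indices and all scores."""
    p = len(genotypes[0])
    scores = [distance_correlation([row[j] for row in genotypes], trait) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return ranked[:keep], scores


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ridge_logistic(X, y, lam=1.0, lr=0.1, steps=500):
    """Logistic regression with an L2 penalty on the slopes, fitted by gradient descent."""
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(steps):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = _sigmoid(intercept + sum(b * v for b, v in zip(beta, xi))) - yi
            g0 += err
            for j in range(p):
                g[j] += err * xi[j]
        intercept -= lr * g0 / n  # the intercept is left unpenalized
        beta = [b - lr * (gj + lam * b) / n for b, gj in zip(beta, g)]
    return intercept, beta


if __name__ == "__main__":
    rng = random.Random(0)
    n, p, causal = 60, 40, 7  # toy sizes, far smaller than real genome-wide data
    G = [[rng.choice([0, 1, 2]) for _ in range(p)] for _ in range(n)]
    y = [1 if row[causal] >= 1 else 0 for row in G]  # hypothetical trait driven by one SNP
    kept, scores = dc_screen(G, y, keep=5)
    intercept, beta = ridge_logistic([[row[j] for j in kept] for row in G], y)
    for j, b in zip(kept, beta):
        print(f"SNP {j}: dCor={scores[j]:.3f}, ridge coefficient={b:+.3f}")
```

Screening first means the penalized model is fitted on a handful of columns rather than all p, which is where the computational saving in such a two-stage scheme would come from. The ridge penalty then shrinks the coefficients of correlated SNPs toward each other instead of arbitrarily picking one.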
Pages: 411-426
Number of pages: 16