The Generalized Higher Criticism for Testing SNP-Set Effects in Genetic Association Studies

Cited by: 70
Authors
Barnett, Ian [1]
Mukherjee, Rajarshi [2]
Lin, Xihong [1]
Affiliations
[1] Harvard Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Funding
National Institutes of Health (USA)
Keywords
Correlated test statistics; Detection boundary; Genetic association testing; Higher criticism; Multiple hypothesis testing; Signal detection; GENOME-WIDE ASSOCIATION; BREAST-CANCER; DETECTION BOUNDARY; RARE; REGRESSION; DISCOVERY; VARIANTS; DISEASES; RISK;
DOI
10.1080/01621459.2016.1192039
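Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract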
It is of substantial interest to study the effects of genes, genetic pathways, and networks on the risk of complex diseases. These genetic constructs each contain multiple SNPs, which are often correlated and function jointly, and might be large in number. However, only a sparse subset of SNPs in a genetic construct is generally associated with the disease of interest. In this article, we propose the generalized higher criticism (GHC) to test for the association between an SNP set and a disease outcome. The higher criticism is a test traditionally used in high-dimensional signal detection settings when marginal test statistics are independent and the number of parameters is very large. However, these assumptions do not always hold in genetic association studies, due to linkage disequilibrium among SNPs and the finite number of SNPs in an SNP set in each genetic construct. The proposed GHC overcomes the limitations of the higher criticism by allowing for arbitrary correlation structures among the SNPs in an SNP set, while performing accurate analytic p-value calculations for any finite number of SNPs in the SNP set. We obtain the detection boundary of the GHC test. Using simulations, we empirically compare the power of the GHC method with that of existing SNP-set tests over a range of genetic regions with varied correlation structures and signal sparsity. We apply the proposed methods to analyze the CGEM breast cancer genome-wide association study. Supplementary materials for this article are available online.
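For orientation, the classical higher criticism statistic that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the standard independence-based form (the Donoho and Jin version computed from sorted p-values), not the GHC itself: according to the abstract, the GHC additionally accounts for correlation among the SNPs, and that adjustment is not shown here. The function names, the example Z-scores, and the `alpha0` search fraction are assumptions made for this illustration.

```python
import math


def two_sided_pvalues(z_scores):
    """Two-sided normal p-values, P(|Z| >= |z|), computed via erfc."""
    return [math.erfc(abs(z) / math.sqrt(2.0)) for z in z_scores]


def higher_criticism(p_values, alpha0=0.5):
    """Classical higher criticism statistic (independence assumed).

    HC = max over the smallest alpha0 * n sorted p-values p_(i) of
         sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))).
    Returns -inf if every candidate p-value is exactly 0 or 1.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("p_values must be non-empty")
    sorted_p = sorted(p_values)
    k_max = max(1, int(alpha0 * n))  # restrict the search to the smallest p-values
    best = -math.inf
    for i in range(1, k_max + 1):
        p = sorted_p[i - 1]
        if p <= 0.0 or p >= 1.0:
            continue  # the standardization is undefined at 0 and 1
        stat = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, stat)
    return best


# Hypothetical marginal Z-scores for a 10-SNP set: two sparse signals vs. none.
z_signal = [0.1, -0.3, 0.5, 4.0, -0.2, 0.05, 0.8, -4.5, 0.3, 0.6]
z_null = [0.1, -0.3, 0.5, 1.0, -0.2, 0.05, 0.8, -1.2, 0.3, 0.6]
hc_signal = higher_criticism(two_sided_pvalues(z_signal))
hc_null = higher_criticism(two_sided_pvalues(z_null))
```

In this toy example the statistic is far larger for the set with two strong Z-scores than for the set without them, which is the sparse-signal behavior the abstract describes. The binomial variance term `p * (1 - p)` is exactly where the independence assumption enters.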
Pages: 64-76
Page count: 13