A Minimax Optimal Ridge-Type Set Test for Global Hypothesis With Applications in Whole Genome Sequencing Association Studies

Citations: 173
Authors
Liu, Yaowu [1]
Li, Zilin [2]
Lin, Xihong [2,3]
Affiliations
[1] Southwestern Univ Finance & Econ, Sch Stat, Chengdu, Peoples R China
[2] Harvard TH Chan Sch Publ Hlth, Dept Biostat, 655 Huntington Ave, Boston, MA 02115 USA
[3] Harvard Univ, Dept Stat, Boston, MA 02115 USA
Funding
National Institutes of Health (USA)
Keywords
F-test; Global hypothesis testing; Robust power; Score test; Signal strength; Whole genome sequencing studies; Quadratic forms
DOI
10.1080/01621459.2020.1831926
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Testing a global hypothesis for a set of variables is a fundamental problem in statistics with a wide range of applications. A few well-known classical tests include the Hotelling's T² test, the F-test, and the empirical Bayes based score test. These classical tests, however, are not robust to the signal strength and could have a substantial loss of power when signals are weak or moderate, a situation we commonly encounter in contemporary applications. In this article, we propose a minimax optimal ridge-type set test (MORST), a simple and generic method for testing a global hypothesis. The power of MORST is robust and considerably higher than that of the classical tests when the strength of signals is weak or moderate. In the meantime, MORST only requires a slight increase in computation compared to these existing tests, making it applicable to the analysis of massive genome-wide data. We also provide the generalizations of MORST that are parallel to the traditional Wald test and Rao's score test in asymptotic settings. Extensive simulations demonstrated the robust power of MORST and that the Type I error of MORST was well controlled. We applied MORST to the analysis of the whole-genome sequencing data from the Atherosclerosis Risk in Communities study, where MORST detected 20%-250% more signal regions than the classical tests. Supplementary materials for this article are available online.
Pages: 897-908
Page count: 12
Related papers
19 in total
[1]   The Generalized Higher Criticism for Testing SNP-Set Effects in Genetic Association Studies [J].
Barnett, Ian ;
Mukherjee, Rajarshi ;
Lin, Xihong .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2017, 112 (517) :64-76
[2]  
DAVIES R. B., 1980, Journal of the Royal Statistical Society. Series C (Applied Statistics), V29, P323
[3]   Testing against a high-dimensional alternative in the generalized linear model: asymptotic type I error control [J].
Goeman, Jelle J. ;
Van Houwelingen, Hans C. ;
Finos, Livio .
BIOMETRIKA, 2011, 98 (02) :381-390
[4]   Testing against a high dimensional alternative [J].
Goeman, JJ ;
van de Geer, SA ;
van Houwelingen, HC .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2006, 68 :477-493
[5]   Saddlepoint approximations for distributions of quadratic forms in normal variables [J].
Kuonen, D .
BIOMETRIKA, 1999, 86 (04) :929-935
[6]   Rare-Variant Association Analysis: Study Designs and Statistical Tests [J].
Lee, Seunggeung ;
Abecasis, Goncalo R. ;
Boehnke, Michael ;
Lin, Xihong .
AMERICAN JOURNAL OF HUMAN GENETICS, 2014, 95 (01) :5-23
[7]   Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale [J].
Li, Xihao ;
Li, Zilin ;
Zhou, Hufeng ;
Gaynor, Sheila M. ;
Liu, Yaowu ;
Chen, Han ;
Sun, Ryan ;
Dey, Rounak ;
Arnett, Donna K. ;
Aslibekyan, Stella ;
Ballantyne, Christie M. ;
Bielak, Lawrence F. ;
Blangero, John ;
Boerwinkle, Eric ;
Bowden, Donald W. ;
Broome, Jai G. ;
Conomos, Matthew P. ;
Correa, Adolfo ;
Cupples, L. Adrienne ;
Curran, Joanne E. ;
Freedman, Barry I. ;
Guo, Xiuqing ;
Hindy, George ;
Irvin, Marguerite R. ;
Kardia, Sharon L. R. ;
Kathiresan, Sekar ;
Khan, Alyna T. ;
Kooperberg, Charles L. ;
Laurie, Cathy C. ;
Liu, X. Shirley ;
Mahaney, Michael C. ;
Manichaikul, Ani W. ;
Martin, Lisa W. ;
Mathias, Rasika A. ;
McGarvey, Stephen T. ;
Mitchell, Braxton D. ;
Montasser, May E. ;
Moore, Jill E. ;
Morrison, Alanna C. ;
O'Connell, Jeffrey R. ;
Palmer, Nicholette D. ;
Pampana, Akhil ;
Peralta, Juan M. ;
Peyser, Patricia A. ;
Psaty, Bruce M. ;
Redline, Susan ;
Rice, Kenneth M. ;
Rich, Stephen S. ;
Smith, Jennifer A. ;
Tiwari, Hemant K. .
NATURE GENETICS, 2020, 52 (09) :969-+
[8]   Variance component testing in generalised linear models with random effects [J].
Lin, XH .
BIOMETRIKA, 1997, 84 (02) :309-326
[9]   A new chi-square approximation to the distribution of non-negative definite quadratic forms in non-central normal variables [J].
Liu, Huan ;
Tang, Yongqiang ;
Zhang, Hao Helen .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2009, 53 (04) :853-856
[10]   Cauchy Combination Test: A Powerful Test With Analytic p-Value Calculation Under Arbitrary Dependency Structures [J].
Liu, Yaowu ;
Xie, Jun .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2020, 115 (529) :393-402