OPTIMAL FALSE DISCOVERY RATE CONTROL FOR LARGE SCALE MULTIPLE TESTING WITH AUXILIARY INFORMATION

Cited by: 0
Authors
Cao, Hongyuan [1]
Chen, Jun [2]
Zhang, Xianyang [3]
Affiliations
[1] Florida State Univ, Dept Stat, Tallahassee, FL 32306 USA
[2] Mayo Clin, Dept Quantitat Hlth Sci, Rochester, MN USA
[3] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
Funding
National Institutes of Health (USA)
Keywords
EM algorithm; false discovery rate; isotonic regression; local false discovery rate; multiple testing; Pool-Adjacent-Violators algorithm; INCREASES DETECTION POWER; EMPIRICAL BAYES; HYPOTHESES; LIKELIHOOD;
DOI
10.1214/21-AOS2128
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
Large-scale multiple testing is a fundamental problem in high dimensional statistical inference. It is increasingly common that various types of auxiliary information, reflecting the structural relationship among the hypotheses, are available. Exploiting such auxiliary information can boost statistical power. To this end, we propose a framework based on a two-group mixture model with varying probabilities of being null for different hypotheses a priori, where a shape-constrained relationship is imposed between the auxiliary information and the prior probabilities of being null. An optimal rejection rule is designed to maximize the expected number of true positives when average false discovery rate is controlled. Focusing on the ordered structure, we develop a robust EM algorithm to estimate the prior probabilities of being null and the distribution of p-values under the alternative hypothesis simultaneously. We show that the proposed method has better power than state-of-the-art competitors while controlling the false discovery rate, both empirically and theoretically. Extensive simulations demonstrate the advantage of the proposed method. Datasets from genome-wide association studies are used to illustrate the new methodology.
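The two-group model and rejection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the hypothesis-specific null probabilities (`pi0`) and the alternative p-value density (`alt_density`) are already known rather than estimated by EM, it takes the null p-value density to be uniform, and it uses the generic step-up rule on local false discovery rates (reject the hypotheses with the smallest values while their running average stays at or below the target level). All function and parameter names are hypothetical.

```python
def two_group_lfdr(p_values, pi0, alt_density):
    """Local FDR under a two-group model with a uniform null p-value density.

    lfdr_i = pi0_i / (pi0_i + (1 - pi0_i) * f1(p_i)),
    where pi0_i is the prior null probability of hypothesis i and
    f1 is the (assumed known) p-value density under the alternative.
    """
    return [q / (q + (1.0 - q) * alt_density(p)) for p, q in zip(p_values, pi0)]


def lfdr_step_up(lfdr, alpha=0.05):
    """Reject the hypotheses with the smallest local FDR values
    for as long as the running mean of those values stays <= alpha."""
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    reject = [False] * len(lfdr)
    running_sum = 0.0
    for k, i in enumerate(order, start=1):
        running_sum += lfdr[i]
        # The running mean of ascending values never decreases,
        # so the first violation ends the search.
        if running_sum / k > alpha:
            break
        reject[i] = True
    return reject


# Illustrative inputs only: a Beta(0.5, 1) alternative density and
# null probabilities that differ across hypotheses.
beta_alt = lambda p: 0.5 * p ** (-0.5)
p_values = [0.001, 0.8, 0.004, 0.3]
pi0 = [0.5, 0.9, 0.5, 0.9]
decisions = lfdr_step_up(two_group_lfdr(p_values, pi0, beta_alt), alpha=0.1)
```

Letting `pi0` vary by hypothesis is what allows auxiliary information to change the ranking: two hypotheses with the same p-value can receive different local FDR values.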
Pages: 807-857
Page count: 51
Related papers
46 in total
[1]   AN EMPIRICAL DISTRIBUTION FUNCTION FOR SAMPLING WITH INCOMPLETE INFORMATION [J].
AYER, M ;
BRUNK, HD ;
EWING, GM ;
REID, WT ;
SILVERMAN, E .
ANNALS OF MATHEMATICAL STATISTICS, 1955, 26 (04) :641-647
[2]   CONTROLLING THE FALSE DISCOVERY RATE VIA KNOCKOFFS [J].
Barber, Rina Foygel ;
Candes, Emmanuel J. .
ANNALS OF STATISTICS, 2015, 43 (05) :2055-2085
[3]   ISOTONIC REGRESSION PROBLEM AND ITS DUAL [J].
BARLOW, RE ;
BRUNK, HD .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1972, 67 (337) :140-&
[4]   Weighted False Discovery Rate Control in Large-Scale Multiple Testing [J].
Basu, Pallavi ;
Cai, T. Tony ;
Das, Kiranmoy ;
Sun, Wenguang .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2018, 113 (523) :1172-1183
[5]   CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].
BENJAMINI, Y ;
HOCHBERG, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[6]   A direct approach to estimating false discovery rates conditional on covariates [J].
Boca, Simina M. ;
Leek, Jeffrey T. .
PEERJ, 2018, 6
[7]   Simultaneous Testing of Grouped Hypotheses: Finding Needles in Multiple Haystacks [J].
Cai, T. Tony ;
Sun, Wenguang .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2009, 104 (488) :1467-1481
[8]   The optimal power puzzle: scrutiny of the monotone likelihood ratio assumption in multiple testing [J].
Cao, Hongyuan ;
Sun, Wenguang ;
Kosorok, Michael R. .
BIOMETRIKA, 2013, 100 (02) :495-502
[9]  
DEB N, 2019, 2 COMPONENT MIXTURE
[10]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38