FALSE DISCOVERY RATE CONTROL WITH UNKNOWN NULL DISTRIBUTION: IS IT POSSIBLE TO MIMIC THE ORACLE?

Cited by: 9
Authors
Roquain, Etienne [1, 2]
Verzelen, Nicolas [3]
Affiliations
[1] Univ Paris, Paris, France
[2] Sorbonne Univ, CNRS, Lab Probabilites Stat & Modelisat, Paris, France
[3] Univ Montpellier, Inst Agro, MISTEA, INRAE, Montpellier, France
Keywords
Benjamini-Hochberg procedure; false discovery rate; minimax; multiple testing; phase transition; sparsity; null distribution; EMPIRICAL BAYES; GENE-EXPRESSION; MULTIPLE; PROPORTION; HYPOTHESES; COVARIANCE; CHOICE
DOI
10.1214/21-AOS2141
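Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification code
020208; 070103; 0714
Abstract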
Classical multiple testing theory prescribes the null distribution, which is often too stringent an assumption for today's large-scale experiments. This paper presents theoretical foundations for understanding the limitations caused by ignoring the null distribution, and how it can be properly learned from the same data set, when possible. We explore this issue in the setting where the null distributions are Gaussian with unknown rescaling parameters (mean and variance), whereas the alternative distributions are left arbitrary. In that case, an oracle procedure is the Benjamini-Hochberg procedure applied with the true (unknown) null distribution, and we aim at building a procedure that asymptotically mimics the performance of the oracle (AMO for short). Our main result establishes a phase transition at the sparsity boundary n / log(n): an AMO procedure exists if and only if the number of false nulls is of order less than n / log(n), where n is the total number of tests. Further sparsity boundaries are derived for general location models where the shape of the null distribution is not necessarily Gaussian. In light of our impossibility results, we also pursue the less stringent aim of building a nonparametric confidence region for the null distribution. From a practical perspective, this provides goodness-of-fit tests for the null distribution and allows one to assess the reliability of empirical null procedures via novel diagnostic graphs. Our results are illustrated on numerical experiments and real data sets, as detailed in a companion vignette (Roquain and Verzelen (2021)).
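The setting in the abstract can be illustrated with a minimal Python sketch, given here under stated assumptions and not as the authors' method. It runs the Benjamini-Hochberg step-up procedure twice on simulated test statistics: once with the true Gaussian null (the oracle), and once with a null whose mean and standard deviation are estimated from the same data. The median and the scaled median absolute deviation are assumptions chosen for illustration, since the abstract does not say which estimators the paper uses. All function names, the two-sided p-values, and the simulation parameters are hypothetical.

```python
import math
import random
import statistics


def normal_sf(x):
    """Upper tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def two_sided_pvalues(xs, mean, sd):
    """Two-sided p-values of xs under a N(mean, sd^2) null."""
    return [2.0 * normal_sf(abs(x - mean) / sd) for x in xs]


def benjamini_hochberg(pvalues, alpha):
    """Return the sorted indices rejected by the BH step-up procedure."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    k = 0  # largest rank whose ordered p-value is below alpha * rank / n
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / n:
            k = rank
    return sorted(order[:k])


def robust_null_estimate(xs):
    """Assumed estimators (not from the paper): median and scaled MAD."""
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    # 1.4826 makes the MAD consistent for the sd under a Gaussian model.
    return med, 1.4826 * mad


# Hypothetical simulation: sparse alternatives, null N(theta, sigma^2).
rng = random.Random(0)
n, n_alt = 2000, 40
theta, sigma = 0.5, 2.0
xs = [rng.gauss(theta, sigma) for _ in range(n - n_alt)]
xs += [rng.gauss(theta + 5.0 * sigma, sigma) for _ in range(n_alt)]

# Oracle: BH applied with the true (normally unknown) null parameters.
oracle = benjamini_hochberg(two_sided_pvalues(xs, theta, sigma), alpha=0.1)

# Plug-in: BH applied with null parameters learned from the same data.
est_mean, est_sd = robust_null_estimate(xs)
plug_in = benjamini_hochberg(two_sided_pvalues(xs, est_mean, est_sd), alpha=0.1)

print(len(oracle), len(plug_in))
```

Comparing the two rejection sets while raising `n_alt` gives an informal feel for how the plug-in procedure can drift from the oracle as the signal becomes less sparse. It does not reproduce the paper's phase transition result.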
Pages: 1095-1123
Page count: 29
Related papers
63 entries in total
[1]   Extracting replicable associations across multiple studies: Empirical Bayes algorithms for controlling the false discovery rate [J].
Amar, David ;
Shamir, Ron ;
Yekutieli, Daniel .
PLOS COMPUTATIONAL BIOLOGY, 2017, 13 (08) :e1005700
[2]   Distribution-free multiple testing [J].
Arias-Castro, Ery ;
Chen, Shiyun .
ELECTRONIC JOURNAL OF STATISTICS, 2017, 11 (01) :1983-2001
[3]   The Empirical Distribution of a Large Number of Correlated Normal Variables [J].
Azriel, David ;
Schwartzman, Armin .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2015, 110 (511) :1217-1228
[4]   CONTROLLING THE FALSE DISCOVERY RATE VIA KNOCKOFFS [J].
Barber, Rina Foygel ;
Candes, Emmanuel J. .
ANNALS OF STATISTICS, 2015, 43 (05) :2055-2085
[5]  
Benjamini Y, 2001, ANN STAT, V29, P1165
[6]   CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].
BENJAMINI, Y ;
HOCHBERG, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[7]   Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project [J].
Birney, Ewan ;
Stamatoyannopoulos, John A. ;
Dutta, Anindya ;
Guigo, Roderic ;
Gingeras, Thomas R. ;
Margulies, Elliott H. ;
Weng, Zhiping ;
Snyder, Michael ;
Dermitzakis, Emmanouil T. ;
Stamatoyannopoulos, John A. ;
Thurman, Robert E. ;
Kuehn, Michael S. ;
Taylor, Christopher M. ;
Neph, Shane ;
Koch, Christoph M. ;
Asthana, Saurabh ;
Malhotra, Ankit ;
Adzhubei, Ivan ;
Greenbaum, Jason A. ;
Andrews, Robert M. ;
Flicek, Paul ;
Boyle, Patrick J. ;
Cao, Hua ;
Carter, Nigel P. ;
Clelland, Gayle K. ;
Davis, Sean ;
Day, Nathan ;
Dhami, Pawandeep ;
Dillon, Shane C. ;
Dorschner, Michael O. ;
Fiegler, Heike ;
Giresi, Paul G. ;
Goldy, Jeff ;
Hawrylycz, Michael ;
Haydock, Andrew ;
Humbert, Richard ;
James, Keith D. ;
Johnson, Brett E. ;
Johnson, Ericka M. ;
Frum, Tristan T. ;
Rosenzweig, Elizabeth R. ;
Karnani, Neerja ;
Lee, Kirsten ;
Lefebvre, Gregory C. ;
Navas, Patrick A. ;
Neri, Fidencio ;
Parker, Stephen C. J. ;
Sabo, Peter J. ;
Sandstrom, Richard ;
Shafer, Anthony .
NATURE, 2007, 447 (7146) :799-816
[8]  
Blanchard G, 2010, J MACH LEARN RES, V11, P2973
[9]   Independent filtering increases detection power for high-throughput experiments [J].
Bourgon, Richard ;
Gentleman, Robert ;
Huber, Wolfgang .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2010, 107 (21) :9546-9551
[10]   Covariate-assisted ranking and screening for large-scale two-sample inference [J].
Cai, T. Tony ;
Sun, Wenguang ;
Wang, Weinan .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2019, 81 (02) :187-234