A cross-validation based estimation of the proportion of true null hypotheses

Cited by: 16
Authors
Celisse, Alain [1, 3]
Robin, Stephane [2, 3]
Affiliations
[1] Univ Lille 1, CNRS, Lab Paul Painleve, UMR 8524, F-59655 Villeneuve d'Ascq, France
[2] AgroParisTech, INRA, MIA, UMR 518, F-75231 Paris 05, France
[3] Stat Syst Biol Grp, Paris, France
Keywords
Multiple testing; False discovery rate; Cross-validation; Density estimation; Histograms; Model selection; P-values; Consistency; Choice
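DOI
10.1016/j.jspi.2010.04.014
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract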
In the multiple testing context, a challenging problem is the estimation of the proportion $\pi_0$ of true null hypotheses. Many estimators of this quantity rely on identifiability assumptions that either appear to be violated on real data or can at least be relaxed. The proposed estimator $\hat{\pi}_0$ combines density estimation by histograms with cross-validation. Several consistency results are derived under independence. A new plug-in multiple testing procedure (MTP) is also described, based on the Benjamini and Hochberg procedure (BH-procedure) and the proposed estimator. This procedure is asymptotically optimal, asymptotically controls the false discovery rate (FDR) at the desired level, and is more powerful than the BH-procedure. Finally, the non-asymptotic behavior of $\hat{\pi}_0$ is assessed through several simulation experiments. It outperforms numerous existing estimators in usual settings and remains accurate with "U-shaped" densities, where other estimators usually fail. It does not exhibit strong sensitivity to dependence: with m block-structured dependent data, it remains reliable up to a within-block correlation of $\rho = 0.5$ when m/50 blocks are used. © 2010 Elsevier B.V. All rights reserved.
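The abstract only outlines the method, so the Python sketch below is a simplified, hypothetical reading of it and not the authors' implementation. It assumes candidate histograms made of k regular bins of width 1/N on [0, λ) plus a single last bin [λ, 1], with λ = k/N. It selects the partition by leave-one-out least-squares cross-validation (the authors' related work [7] uses the more general leave-p-out), takes the height of the last bin as $\hat{\pi}_0$, and plugs $\hat{\pi}_0$ into the BH step-up rule at level $\alpha / \hat{\pi}_0$. The function names `loo_histogram_risk`, `estimate_pi0` and `plugin_bh`, the parameter `max_bins`, and the guard against $\hat{\pi}_0 = 0$ are all illustrative assumptions.

```python
import math


def loo_histogram_risk(counts, widths):
    """Leave-one-out least-squares risk of a histogram density estimate.

    counts[j] is the number of points in bin j and widths[j] its length.
    Averaging integral(f_{-i}^2) - 2 * f_{-i}(X_i) over held-out points i gives
    (2n - 1) / (n (n - 1)^2) * sum(n_j / h_j) - sum(n_j^2 / h_j) / (n - 1)^2.
    """
    n = sum(counts)
    s1 = sum(c / w for c, w in zip(counts, widths))
    s2 = sum(c * c / w for c, w in zip(counts, widths))
    return (2 * n - 1) / (n * (n - 1) ** 2) * s1 - s2 / (n - 1) ** 2


def estimate_pi0(pvalues, max_bins=50):
    """Hypothetical histogram + cross-validation estimate of pi_0 (sketch).

    Candidate partitions: k regular bins of width 1/N on [0, k/N), plus one
    last bin [k/N, 1]. The partition with the smallest LOO risk is kept and
    the height of its last bin is returned, clipped to at most 1.
    """
    m = len(pvalues)
    if m < 2:
        raise ValueError("need at least two p-values")
    best_risk, best_pi0 = math.inf, 1.0
    for n_bins in range(2, max_bins + 1):
        # Counts on the regular grid of N bins; p = 1 falls in the last bin.
        grid = [0] * n_bins
        for p in pvalues:
            grid[min(int(p * n_bins), n_bins - 1)] += 1
        for k in range(1, n_bins):
            # Merge grid bins k..N-1 into the single last bin [k/N, 1].
            counts = grid[:k] + [sum(grid[k:])]
            widths = [1.0 / n_bins] * k + [1.0 - k / n_bins]
            risk = loo_histogram_risk(counts, widths)
            if risk < best_risk:
                best_risk = risk
                best_pi0 = counts[-1] / (m * widths[-1])
    return min(1.0, best_pi0)


def plugin_bh(pvalues, alpha, pi0):
    """Plug-in BH step-up procedure at level alpha / pi0.

    Returns the sorted indices of rejected hypotheses: every p-value ranked
    at or below the largest rank i with p_(i) <= i * alpha / (m * pi0).
    """
    m = len(pvalues)
    pi0 = max(pi0, 1.0 / m)  # hypothetical safeguard against pi0 == 0
    order = sorted(range(m), key=lambda i: pvalues[i])
    last = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / (m * pi0):
            last = rank
    return sorted(order[:last])


if __name__ == "__main__":
    # Toy data: 100 evenly spread "null" p-values and 100 very small ones.
    nulls = [(i + 0.5) / 100 for i in range(100)]
    signals = [(i + 0.5) / 10000 for i in range(100)]
    pvals = nulls + signals
    pi0_hat = estimate_pi0(pvals)
    rejected = plugin_bh(pvals, alpha=0.05, pi0=pi0_hat)
    print(f"pi0_hat = {pi0_hat:.3f}, rejections = {len(rejected)}")
```

Because `estimate_pi0` returns a value no larger than 1, the plug-in threshold i·α/(m·$\hat{\pi}_0$) is never smaller than the BH threshold i·α/m, so this sketch rejects at least as many hypotheses as plain BH on the same p-values. That mirrors the abstract's power claim, but the sketch does not reproduce the paper's consistency or FDR-control guarantees.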
Pages: 3132-3147
Number of pages: 16
References (27 in total)
[1]   A survey of cross-validation procedures for model selection [J].
Arlot, Sylvain ;
Celisse, Alain .
STATISTICS SURVEYS, 2010, 4 :40-79
[2]   Risk bounds for model selection via penalization [J].
Barron, A ;
Birgé, L ;
Massart, P .
PROBABILITY THEORY AND RELATED FIELDS, 1999, 113 (03) :301-413
[3]   Benjamini, Y .
ANNALS OF STATISTICS, 2001, 29 :1165
[4]   CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].
BENJAMINI, Y ;
HOCHBERG, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[5]   Adaptive linear step-up procedures that control the false discovery rate [J].
Benjamini, Yoav ;
Krieger, Abba M. ;
Yekutieli, Daniel .
BIOMETRIKA, 2006, 93 (03) :491-507
[6]   A comparative review of estimates of the proportion unchanged genes and the false discovery rate [J].
Broberg, P .
BMC BIOINFORMATICS, 2005, 6 (1)
[7]   Nonparametric density estimation by exact leave-p-out cross-validation [J].
Celisse, Alain ;
Robin, Stephane .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2008, 52 (05) :2350-2368
[8]   Multiple hypothesis testing in microarray experiments [J].
Dudoit, S ;
Shaffer, JP ;
Boldrick, JC .
STATISTICAL SCIENCE, 2003, 18 (01) :71-103
[9]   Large-scale simultaneous hypothesis testing: The choice of a null hypothesis [J].
Efron, B .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2004, 99 (465) :96-104
[10]   Empirical Bayes analysis of a microarray experiment [J].
Efron, B ;
Tibshirani, R ;
Storey, JD ;
Tusher, V .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2001, 96 (456) :1151-1160