NONPARAMETRIC ESTIMATION OF THE DENSITY OF THE ALTERNATIVE HYPOTHESIS IN A MULTIPLE TESTING SETUP. APPLICATION TO LOCAL FALSE DISCOVERY RATE ESTIMATION

Cited by: 3
Authors
Van Hanh Nguyen [1, 2]
Matias, Catherine [2]
Affiliations
[1] Univ Paris 11, Lab Math Orsay, UMR CNRS 8628, F-91405 Orsay, France
[2] Univ Evry Val Essonne, Lab Stat & Genome, UMR CNRS 8071, USC INRA, F-91037 Evry, France
Keywords
False discovery rate; kernel estimation; local false discovery rate; maximum smoothed likelihood; multiple testing; p-values; semiparametric mixture model; MAXIMUM SMOOTHED LIKELIHOOD; TRUE NULL HYPOTHESES; GENE-EXPRESSION; MIXTURE MODEL; EM ALGORITHM; P-VALUES; MICROARRAY; PROPORTION;
DOI
10.1051/ps/2013041
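Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code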
020208 ; 070103 ; 0714 ;
Abstract
In a multiple testing context, we consider a semiparametric mixture model with two components where one component is known and corresponds to the distribution of p-values under the null hypothesis and the other component f is nonparametric and stands for the distribution under the alternative hypothesis. Motivated by the issue of local false discovery rate estimation, we focus here on the estimation of the nonparametric unknown component f in the mixture, relying on a preliminary estimator of the unknown proportion of true null hypotheses. We propose and study the asymptotic properties of two different estimators for this unknown component. The first estimator is a randomly weighted kernel estimator. We establish an upper bound for its pointwise quadratic risk, exhibiting the classical nonparametric rate of convergence over a class of Hölder densities. To our knowledge, this is the first result establishing convergence as well as the corresponding rate for the estimation of the unknown component in this nonparametric mixture. The second estimator is a maximum smoothed likelihood estimator. It is computed through an iterative algorithm, for which we establish a descent property. In addition, these estimators are used in a multiple testing procedure in order to estimate the local false discovery rate. Their respective performances are then compared on synthetic data.
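As a rough illustration of the setup described in the abstract, the sketch below estimates the alternative density f with a weighted kernel estimator and plugs it into a local false discovery rate. It is only a sketch under assumptions that the abstract does not state: the weights are taken to be 1 - theta_hat / g_hat(X_i) (an estimated probability that p-value X_i comes from the alternative), the null density is uniform on [0, 1], the kernel is Gaussian with no boundary correction, and the names `kde`, `weighted_kernel_alternative`, `local_fdr`, `theta_hat` and `h` are hypothetical. It should not be read as the authors' exact estimator.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumed choice, not taken from the abstract)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def kde(x, data, h, weights=None):
    """Weighted kernel density estimate of `data`, evaluated at the points `x`."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    if weights is None:
        weights = np.ones_like(data)
    total = weights.sum()
    if total == 0.0:
        # No observation carries any weight: return a zero estimate.
        return np.zeros_like(x)
    k = gaussian_kernel((x[:, None] - data[None, :]) / h) / h
    return k @ weights / total


def weighted_kernel_alternative(x, pvalues, theta_hat, h):
    """Sketch of a weighted kernel estimate of the alternative density f.

    Assumed weights: tau_i = 1 - theta_hat / g_hat(X_i), clipped to [0, 1],
    where g_hat is a plain kernel estimate of the mixture density.
    """
    g_at_data = kde(pvalues, pvalues, h)
    tau = np.clip(1.0 - theta_hat / g_at_data, 0.0, 1.0)
    return kde(x, pvalues, h, weights=tau)


def local_fdr(x, pvalues, theta_hat, h):
    """Plug-in local FDR: theta / (theta + (1 - theta) * f(x)), uniform null."""
    f_hat = weighted_kernel_alternative(x, pvalues, theta_hat, h)
    return theta_hat / (theta_hat + (1.0 - theta_hat) * f_hat)
```

A possible usage on synthetic p-values, with 70% uniform nulls and alternatives drawn from a Beta(1, 10) distribution chosen purely for illustration:

```python
rng = np.random.default_rng(0)
p = np.concatenate([rng.uniform(size=1400), rng.beta(1.0, 10.0, size=600)])
print(local_fdr([0.01, 0.8], p, theta_hat=0.7, h=0.05))
```

In practice theta_hat would come from a preliminary estimator of the proportion of true nulls, and a boundary-corrected kernel would likely be preferable on [0, 1].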
Pages: 584-612
Number of pages: 29
Related papers
27 in total
[1]   A mixture model approach for the analysis of microarray gene expression data [J].
Allison, DB ;
Gadbury, GL ;
Heo, MS ;
Fernández, JR ;
Lee, CK ;
Prolla, TA ;
Weindruch, R .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2002, 39 (01) :1-20
[2]  
[Anonymous], MONOGR STAT APPL PRO
[3]   Determination of the differentially expressed genes in microarray experiments using local FDR [J].
Aubert, J ;
Bar-Hen, A ;
Daudin, JJ ;
Robin, S .
BMC BIOINFORMATICS, 2004, 5 (1)
[4]   CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].
BENJAMINI, Y ;
HOCHBERG, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[5]   A cross-validation based estimation of the proportion of true null hypotheses [J].
Celisse, Alain ;
Robin, Stephane .
JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2010, 140 (11) :3132-3147
[6]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[7]   Empirical Bayes analysis of a microarray experiment [J].
Efron, B ;
Tibshirani, R ;
Storey, JD ;
Tusher, V .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2001, 96 (456) :1151-1160
[8]  
Eggermont P., 2001, SPRINGER SER STAT, V1
[9]   MAXIMUM SMOOTHED LIKELIHOOD DENSITY-ESTIMATION FOR INVERSE PROBLEMS [J].
EGGERMONT, PPB ;
LARICCIA, VN .
ANNALS OF STATISTICS, 1995, 23 (01) :199-220
[10]   Nonlinear smoothing and the EM algorithm for positive integral equations of the first kind [J].
Eggermont, PPB .
APPLIED MATHEMATICS AND OPTIMIZATION, 1999, 39 (01) :75-91