Approximate nonparametric maximum likelihood for mixture models: A convex optimization approach to fitting arbitrary multivariate mixing distributions

Cited by: 14
Authors
Feng, Long [1]
Dicker, Lee H. [1]
Affiliations
[1] Rutgers State Univ, Dept Stat & Biostat, New Brunswick, NJ 08901 USA
Keywords
Nonparametric maximum likelihood; Kiefer-Wolfowitz estimator; Multivariate mixture models; Convex optimization; HIGH-DIMENSIONAL CLASSIFICATION; EMPIRICAL BAYES ESTIMATION; ESTIMATOR;
DOI
10.1016/j.csda.2018.01.006
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification code
081203 ; 0835 ;
Abstract
Nonparametric maximum likelihood (NPML) for mixture models is a technique for estimating mixing distributions that has a long and rich history in statistics, going back to the 1950s, and is closely related to empirical Bayes methods. Historically, NPML-based methods have been considered relatively impractical because of computational and theoretical obstacles. However, recent work focusing on approximate NPML methods suggests that these methods may have great promise for a variety of modern applications. Building on this recent work, a class of flexible, scalable, and easy-to-implement approximate NPML methods is studied for problems with multivariate mixing distributions. Concrete guidance on implementing these methods is provided, with theoretical and empirical support; topics covered include identifying the support set of the mixing distribution and comparing algorithms (across a variety of metrics) for solving the simple convex optimization problem at the core of the approximate NPML problem. Additionally, three diverse real data applications are studied to illustrate the methods' performance: (i) a baseball data analysis (a classical example for empirical Bayes methods), (ii) high-dimensional microarray classification, and (iii) online prediction of blood-glucose density for diabetes patients. Among other things, the empirical results demonstrate the relative effectiveness of using multivariate (as opposed to univariate) mixing distributions for NPML-based approaches. © 2018 Elsevier B.V. All rights reserved.
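The convex problem mentioned in the abstract can be illustrated with a minimal sketch. It assumes a fixed, user-chosen support grid, Gaussian noise with known standard deviation, and plain EM updates for the mixing weights. None of these choices is taken from the record: the abstract says several algorithms are compared but does not name them, and all function names below are hypothetical.

```python
import numpy as np


def gaussian_likelihood_matrix(x, grid, sigma=1.0):
    """Return L with L[i, j] = N(x_i | grid_j, sigma^2 I).

    Assumption: isotropic Gaussian noise with known sigma.
    x has shape (n, d); grid has shape (m, d).
    """
    d = x.shape[1]
    sq_dist = np.sum((x[:, None, :] - grid[None, :, :]) ** 2, axis=2)
    log_lik = -0.5 * sq_dist / sigma**2 - 0.5 * d * np.log(2 * np.pi * sigma**2)
    return np.exp(log_lik)


def mean_loglik(L, w):
    """Average log-likelihood of the data under mixing weights w."""
    return float(np.mean(np.log(L @ w)))


def em_weights(L, n_iter=500):
    """Maximize mean_loglik over the probability simplex with EM updates.

    With the support grid fixed, the objective is concave in w, so this is
    a convex optimization problem; EM is only one simple way to solve it.
    """
    n, m = L.shape
    w = np.full(m, 1.0 / m)  # start from uniform weights
    for _ in range(n_iter):
        mix = L @ w                        # marginal density at each x_i
        w = w * (L.T @ (1.0 / mix)) / n    # EM step; keeps w on the simplex
    return w
```

A typical use would build `grid` from a lattice covering the range of the observations, call `gaussian_likelihood_matrix`, and pass the result to `em_weights`; the fitted weights define a discrete estimate of the mixing distribution on that grid.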
Pages: 80-91
Page count: 12
Related papers
35 in total
[21]  
Jiang W., 2010, BORROWING STRENGTH T, V6, P263, DOI 10.1214/10-IMSCOLL618
[22]   GENERAL MAXIMUM LIKELIHOOD EMPIRICAL BAYES ESTIMATION OF NORMAL MEANS [J].
Jiang, Wenhua ;
Zhang, Cun-Hui .
ANNALS OF STATISTICS, 2009, 37 (04) :1647-1684
[23]   CONSISTENCY OF THE MAXIMUM-LIKELIHOOD ESTIMATOR IN THE PRESENCE OF INFINITELY MANY INCIDENTAL PARAMETERS [J].
KIEFER, J ;
WOLFOWITZ, J .
ANNALS OF MATHEMATICAL STATISTICS, 1956, 27 (04) :887-906
[24]  
Koenker R., 2016, REBayes: An R package for empirical Bayes mixture methods
[25]   Convex Optimization, Shape Constraints, Compound Decisions, and Empirical Bayes Rules [J].
Koenker, Roger ;
Mizera, Ivan .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2014, 109 (506) :674-685
[26]   NONPARAMETRIC MAXIMUM LIKELIHOOD ESTIMATION OF A MIXING DISTRIBUTION [J].
LAIRD, N .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1978, 73 (364) :805-811
[27]  
Lindsay B. G., 1981, STATISTICAL DISTRIBU, V5, P95
[28]  
Lindsay B. G., 1995, Mixture models: theory, geometry, and applications
[29]   A direct approach to sparse discriminant analysis in ultra-high dimensions [J].
Mai, Qing ;
Zou, Hui ;
Yuan, Ming .
BIOMETRIKA, 2012, 99 (01) :29-42
[30]   AN EMPIRICAL BAYES MIXTURE METHOD FOR EFFECT SIZE AND FALSE DISCOVERY RATE ESTIMATION [J].
Muralidharan, Omkar .
ANNALS OF APPLIED STATISTICS, 2010, 4 (01) :422-438