A comparison of new and old algorithms for a mixture estimation problem

被引:31
作者
Helmbold, DP [1 ]
Schapire, RE [1 ]
Singer, Y [1 ]
Warmuth, MK [1 ]
机构
[1] AT&T BELL LABS,MURRAY HILL,NJ 07974
基金
美国国家科学基金会;
关键词
mixture models; maximum likelihood; exponentiated gradient algorithms; EM;
D O I
10.1023/A:1007301011561
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We investigate the problem of estimating the proportion vector which maximizes the likelihood of a given sample for a mixture of given densities. We adapt a framework developed for supervised learning and give simple derivations for many of the standard iterative algorithms like gradient projection and EM. In this framework the distance between the new and old proportion vectors is used as a penalty term. The square distance leads to the gradient projection update, and the relative entropy to a new update which we call the exponentiated gradient update (EG(eta)). Curiously, when a second order Taylor expansion of the relative entropy is used, we arrive at an update EMeta which, for eta = 1, gives the usual EM update. Experimentally both the EMeta-update and the EG(eta)-update for eta > 1 outperform the EM algorithm and its variants. We also prove a polynomial bound on the rate of convergence of the EG(eta) algorithm.
引用
收藏
页码:97 / 119
页数:23
相关论文
共 17 条
[1]  
[Anonymous], NEUROCOMPUTING ALGOR
[2]  
Cover T. M., 1991, Mathematical Finance, V1, P1, DOI [https://doi.org/10.1111/j.1467-9965.1991.tb00002.x, DOI 10.1111/J.1467-9965.1991.TB00002.X, 10.1111/j.1467-9965.1991.tb00002.x]
[3]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[4]  
Duda R. O., 1973, PATTERN CLASSIFICATI, V3
[5]  
Golub GH, 1989, MATRIX COMPUTATIONS
[6]  
KIVINEN J, 1995, P 8 ANN WORKSH COMP
[7]  
KIVINEN J, 1995, P 27 ANN ACM S THEOR
[8]  
Littlestone N., 1988, Machine Learning, V2, P285, DOI 10.1023/A:1022869011914
[9]  
Luenberger D. G., 2015, Linear and nonlinear programming, V4th
[10]  
MENG XL, 1992, BAYESIAN STAT, V4