Learning mixture models with the regularized latent maximum entropy principle

Cited by: 3
Authors
Wang, SJ [1]
Schuurmans, D
Peng, FC
Zhao, YX
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
[2] Univ Massachusetts, Dept Comp Sci, Amherst, MA 01003 USA
[3] Univ Missouri, Dept Comp Engn & Comp Sci, Columbia, MO 65201 USA
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2004 / Vol. 15 / No. 04
Keywords
expectation maximization (EM); iterative scaling; latent variables; maximum entropy; mixture models; regularization;
DOI
10.1109/TNN.2004.828755
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a new approach to estimating mixture models based on a recent inference principle we have proposed: the latent maximum entropy principle (LME). LME is different from Jaynes' maximum entropy principle, standard maximum likelihood, and maximum a posteriori probability estimation. We demonstrate the LME principle by deriving new algorithms for mixture model estimation, and show how robust new variants of the expectation maximization (EM) algorithm can be developed. We show that a regularized version of LME (RLME) is effective at estimating mixture models. It generally yields better results than plain LME, which in turn is often better than maximum likelihood and maximum a posteriori estimation, particularly when inferring latent variable models from small amounts of data.
Pages: 903-916
Page count: 14