Maximum entropy models with inequality constraints: A case study on text categorization

Cited by: 26
Authors
Kazama, J
Tsujii, J
Affiliations
[1] Japan Adv Inst Sci & Technol JAIST, Sch Informat Sci, Noumi, Ishikawa 9231292, Japan
[2] Univ Tokyo, Dept Comp Sci, Fac Informat Sci & Technol, Bunkyo Ku, Tokyo 1130033, Japan
[3] JST Japan Sci & Technol Agcy, CREST, Kawaguchi, Saitama 3320012, Japan
Keywords
maximum entropy model; inequality constraint; regularization; feature selection; text categorization;
DOI
10.1007/s10994-005-0911-3
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Data sparseness or overfitting is a serious problem in natural language processing employing machine learning methods. This is still true even for the maximum entropy (ME) method, whose flexible modeling capability has alleviated data sparseness more successfully than the other probabilistic models in many NLP tasks. Although we usually estimate the model so that it completely satisfies the equality constraints on feature expectations with the ME method, complete satisfaction leads to undesirable overfitting, especially for sparse features, since the constraints derived from a limited amount of training data are always uncertain. To control overfitting in ME estimation, we propose the use of box-type inequality constraints, where equality can be violated up to certain predefined levels that reflect this uncertainty. The derived models, inequality ME models, in effect have regularized estimation with L1-norm penalties of bounded parameters. Most importantly, this regularized estimation enables the model parameters to become sparse. This can be thought of as automatic feature selection, which is expected to improve generalization performance further. We evaluate the inequality ME models on text categorization datasets, and demonstrate their advantages over standard ME estimation, similarly motivated Gaussian MAP estimation of ME models, and support vector machines (SVMs), which are one of the state-of-the-art methods for text categorization.
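The box-type constraints described in the abstract can be sketched as follows. The notation is an assumption made for illustration and does not appear in this record: $f_i$ are the features, $\tilde{p}$ is the empirical distribution, $A_i, B_i > 0$ are the predefined violation widths, and $\alpha_i, \beta_i \ge 0$ are the multipliers of the two inequalities.

```latex
% Primal: maximize conditional entropy subject to box-type constraints
\max_{p}\; H(p) = -\sum_{x} \tilde{p}(x) \sum_{y} p(y \mid x) \log p(y \mid x)
\quad \text{s.t.} \quad
-B_i \;\le\; E_{\tilde{p}}[f_i] - E_{p}[f_i] \;\le\; A_i \quad \text{for all } i .

% Dual: penalized log-likelihood over non-negative (bounded) parameters
p_{\alpha,\beta}(y \mid x) = \frac{1}{Z(x)}
  \exp\Big( \sum_i (\alpha_i - \beta_i)\, f_i(x, y) \Big),
\qquad
\max_{\alpha_i \ge 0,\; \beta_i \ge 0}\;
  \sum_{x,y} \tilde{p}(x, y) \log p_{\alpha,\beta}(y \mid x)
  \;-\; \sum_i A_i \alpha_i \;-\; \sum_i B_i \beta_i .
```

Under these assumptions the penalty is linear in the non-negative parameters, and with $\lambda_i = \alpha_i - \beta_i$ and $A_i = B_i = W$ it reduces to $W \sum_i |\lambda_i|$ at the optimum, which is the L1-norm form the abstract refers to. A feature whose equality constraint is violated by less than its width ends up with $\alpha_i = \beta_i = 0$, which is one way to read the sparsity claim.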
Pages: 159-194
Page count: 36