Constrained domain maximum likelihood estimation for naive Bayes text classification

Times cited: 10
Authors
Andres-Ferrer, Jesus [1]
Juan, Alfons [1]
Affiliation
[1] Univ Politecn Valencia, DSIC ITI, E-46071 Valencia, Spain
Keywords
Maximum likelihood estimation; Naive Bayes; Text classification; Parameter smoothing; Karush-Kuhn-Tucker conditions
DOI
10.1007/s10044-009-0149-y
CLC number (Chinese Library Classification)
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The naive Bayes assumption in text classification has the advantage of greatly simplifying maximum likelihood estimation of the unknown class-conditional word occurrence probabilities. However, these estimates are usually modified by applying a heuristic parameter smoothing technique to avoid (over-fitted) null estimates. In this work, we advocate reducing the parameter domain instead of smoothing the parameters. This leads to a constrained domain maximum likelihood estimation problem, for which we provide an iterative algorithm that finds the optimal solution.
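The abstract contrasts smoothed estimates with a constrained domain estimate but does not spell out the constraint. The following is a minimal Python sketch under an assumption of ours: the reduced domain is taken to be a lower bound `eps` on every class-conditional word probability. Under that assumption, the per-class problem is to maximize Σ_w n_w log p_w subject to Σ_w p_w = 1 and p_w ≥ eps. The iterative procedure below is derived here from the Karush-Kuhn-Tucker conditions: free words get p_w = n_w/λ, and clamped words get p_w = eps. It is not claimed to be the authors' exact algorithm. The function names `mle`, `laplace`, `constrained_mle` and `log_likelihood` are hypothetical. Laplace smoothing is included only as one example of the heuristic smoothing the abstract mentions.

```python
import math
from typing import List


def mle(counts: List[float]) -> List[float]:
    """Unconstrained ML estimate: relative frequencies (unseen words get 0)."""
    total = sum(counts)
    return [n / total for n in counts]


def laplace(counts: List[float], alpha: float = 1.0) -> List[float]:
    """Additive (Laplace) smoothing, one common heuristic against null estimates."""
    total = sum(counts) + alpha * len(counts)
    return [(n + alpha) / total for n in counts]


def constrained_mle(counts: List[float], eps: float) -> List[float]:
    """Maximize sum_w n_w*log(p_w) s.t. sum_w p_w = 1 and p_w >= eps.

    The lower bound eps is an ASSUMED form of the reduced domain. Words whose
    free estimate falls below eps are clamped to eps; the remaining words
    share the leftover mass in proportion to their counts. This is a
    KKT-based sketch, not the authors' published algorithm.
    """
    V = len(counts)
    if V == 0 or not 0 < eps < 1.0 / V:
        raise ValueError("need 0 < eps < 1/V")
    if sum(counts) <= 0:
        raise ValueError("need at least one positive count")
    bound = set()  # indices of words clamped at p_w = eps
    while True:
        free = [w for w in range(V) if w not in bound]
        mass = 1.0 - len(bound) * eps               # probability left for free words
        lam = sum(counts[w] for w in free) / mass   # multiplier of sum(p) = 1
        newly = {w for w in free if counts[w] / lam < eps}
        if not newly:  # every free word satisfies p_w >= eps: KKT conditions hold
            return [eps if w in bound else counts[w] / lam for w in range(V)]
        bound |= newly  # clamping raises lam, so clamped words never need release


def log_likelihood(counts: List[float], probs: List[float]) -> float:
    """Multinomial log-likelihood up to a constant; -inf if a seen word has p = 0."""
    total = 0.0
    for n, p in zip(counts, probs):
        if n > 0:
            if p <= 0:
                return -math.inf
            total += n * math.log(p)
    return total
```

Take counts [6, 2, 0, 0, 0] and eps = 0.15. The first pass clamps the three unseen words. The second pass also clamps the word with count 2, because its free estimate 2/(8/0.55) = 0.1375 is below 0.15. The result is approximately [0.4, 0.15, 0.15, 0.15, 0.15], whereas `mle` assigns probability 0 to the unseen words. Each clamping step increases λ, so the bound set only grows and the loop ends after at most V passes.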
Pages: 189-196
Number of pages: 8