Constrained domain maximum likelihood estimation for naive Bayes text classification

Cited by: 10

Authors:

Andres-Ferrer, Jesus [1]

Juan, Alfons [1]

Affiliations:

[1] Univ Politecn Valencia, DSIC ITI, E-46071 Valencia, Spain

Source:

PATTERN ANALYSIS AND APPLICATIONS | 2010 / Volume 13 / Issue 02

Keywords:

Maximum likelihood estimation; Naive Bayes; Text classification; Parameter smoothing; Karush-Kuhn-Tucker conditions;

DOI:

10.1007/s10044-009-0149-y

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The naive Bayes assumption in text classification has the advantage of greatly simplifying maximum likelihood estimation of unknown class-conditional word occurrence probabilities. However, these estimates are usually modified by application of a heuristic parameter smoothing technique to avoid (over-fitted) null estimates. In this work, we advocate the reduction of the parameter domain instead of parameter smoothing. This leads to a constrained domain maximum likelihood estimation problem for which we provide an iterative algorithm that solves it optimally.
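To make the idea of restricting the parameter domain concrete, here is a minimal sketch under stated assumptions. It assumes one multinomial word distribution per class and a single uniform lower bound `eps` on every word probability; the name `constrained_mle`, the parameter `eps`, and the dictionary-based interface are illustrative and do not come from the paper. The loop is the standard clamp-and-renormalize procedure that follows from the Karush-Kuhn-Tucker conditions for this simplified problem, and it may differ from the authors' algorithm.

```python
def constrained_mle(counts, vocab, eps):
    """Maximize sum_w counts[w] * log p[w]
    subject to sum_w p[w] = 1 and p[w] >= eps for every word w.

    Hypothetical helper for illustration: `counts` maps words to their
    occurrence counts in one class, `vocab` is the full vocabulary.
    """
    vocab = list(vocab)
    if not vocab or eps < 0 or eps * len(vocab) > 1.0:
        raise ValueError("need a non-empty vocab and 0 <= eps <= 1/len(vocab)")
    if sum(counts.get(w, 0) for w in vocab) <= 0:
        raise ValueError("need at least one observed word")

    clamped = set()  # words whose probability is fixed at the bound eps
    while True:
        free = [w for w in vocab if w not in clamped]
        # Probability mass left for the unclamped words.
        free_mass = 1.0 - eps * len(clamped)
        free_count = sum(counts.get(w, 0) for w in free)
        # Relative-frequency estimate, rescaled to the remaining mass.
        est = {w: free_mass * counts.get(w, 0) / free_count for w in free}
        below = [w for w in free if est[w] < eps]
        if not below:
            # All constraints hold: fill in the clamped words and stop.
            est.update({w: eps for w in clamped})
            return est
        # Clamp the violators and re-estimate the rest.
        clamped.update(below)


if __name__ == "__main__":
    p = constrained_mle({"a": 8, "b": 2, "c": 0}, ["a", "b", "c"], eps=0.1)
    print(p)
```

With counts 8, 2 and 0 and `eps = 0.1`, the unseen word is fixed at 0.1 and the remaining mass of 0.9 is shared in proportion to the counts, giving roughly 0.72 and 0.18. No estimate is null, and no separate smoothing step is applied.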


Pages: 189-196

Number of pages: 8
