Naive Bayes text classifiers: a locally weighted learning approach

Cited by: 74
Authors
Jiang, Liangxiao [1]
Cai, Zhihua [1]
Zhang, Harry [2]
Wang, Dianhong [3]
Affiliations
[1] China Univ Geosci, Dept Comp Sci, Wuhan 430074, Hubei, Peoples R China
[2] Univ New Brunswick, Fac Comp Sci, Fredericton, NB E3B 5A3, Canada
[3] China Univ Geosci, Dept Elect Engn, Wuhan 430074, Hubei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
text classification; naive Bayes; locally weighted learning; multinomial naive Bayes; complement naive Bayes; the one-versus-all-but-one model; OPTIMALITY;
DOI
10.1080/0952813X.2012.721010
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Because they are fast, easy to implement and relatively effective, several state-of-the-art naive Bayes text classifiers that make the strong assumption of conditional independence among attributes, such as multinomial naive Bayes, complement naive Bayes and the one-versus-all-but-one model, have received a great deal of attention from researchers in text classification. In this article, we revisit these naive Bayes text classifiers and empirically compare their classification performance on a large number of widely used text classification benchmark datasets. We then propose a locally weighted learning approach to these classifiers, which we call locally weighted naive Bayes text classifiers (LWNBTC). LWNBTC weakens the attribute conditional independence assumption made by these naive Bayes text classifiers by applying the locally weighted learning approach. The experimental results show that our locally weighted versions significantly outperform these state-of-the-art naive Bayes text classifiers in terms of classification accuracy.
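The general idea of locally weighted learning applied to multinomial naive Bayes can be illustrated with a minimal sketch. The abstract does not specify the neighbourhood or weighting scheme, so everything below is an assumption made for illustration only: the function names `cosine` and `lw_multinomial_nb`, the choice of the k most similar training documents by cosine similarity, the use of the similarity itself as the instance weight, and Laplace smoothing. It should not be read as the authors' LWNBTC algorithm.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count dictionaries."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lw_multinomial_nb(train_docs, train_labels, test_doc, k=3):
    """Classify test_doc with a multinomial naive Bayes model built only from
    its k most similar training documents, each weighted by its similarity.
    (Hypothetical helper; the weighting scheme is an assumption.)"""
    test_counts = Counter(test_doc.split())
    train_counts = [Counter(doc.split()) for doc in train_docs]
    vocab = set().union(*train_counts)
    classes = sorted(set(train_labels))

    # Assumed neighbourhood: top-k documents by cosine similarity,
    # with the similarity value used directly as the instance weight.
    neighbours = sorted(
        ((cosine(test_counts, c), i) for i, c in enumerate(train_counts)),
        reverse=True,
    )[:k]

    # Accumulate weighted class totals and weighted per-class word totals.
    class_weight = {c: 0.0 for c in classes}
    word_weight = {c: Counter() for c in classes}
    for w, i in neighbours:
        label = train_labels[i]
        class_weight[label] += w
        for word, count in train_counts[i].items():
            word_weight[label][word] += w * count

    total_weight = sum(class_weight.values())
    best_label, best_score = None, -math.inf
    for c in classes:
        # Laplace-smoothed weighted prior and word likelihoods, in log space.
        score = math.log((class_weight[c] + 1.0) / (total_weight + len(classes)))
        denom = sum(word_weight[c].values()) + len(vocab)
        for word, count in test_counts.items():
            if word in vocab:
                score += count * math.log((word_weight[c][word] + 1.0) / denom)
        if score > best_score:
            best_label, best_score = c, score
    return best_label


docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "project meeting notes monday",
    "cheap flights buy tickets",
]
labels = ["spam", "spam", "ham", "ham", "spam"]
print(lw_multinomial_nb(docs, labels, "monday project meeting"))
```

Because a separate model is fitted around each test document, the independence assumption only has to hold approximately within that neighbourhood rather than across the whole training set, which is the intuition behind weakening the assumption locally.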
Pages: 273-286
Number of pages: 14