DISCRIMINATIVELY WEIGHTED NAIVE BAYES AND ITS APPLICATION IN TEXT CLASSIFICATION

Cited by: 54
Authors
Jiang, Liangxiao [1]
Wang, Dianghong [2]
Cai, Zhihua [1]
Affiliations
[1] China Univ Geosci, Dept Comp Sci, Wuhan 430074, Hubei, Peoples R China
[2] China Univ Geosci, Dept Elect Engn, Wuhan 430074, Hubei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Naive Bayes; discriminatively weighted naive Bayes; instance weighting; discriminative instance weighting; discriminative learning; ROC CURVE; AREA;
DOI
10.1142/S0218213011004770
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many approaches have been proposed to improve naive Bayes by weakening its conditional independence assumption. In this paper, we focus on instance weighting and propose an improved naive Bayes algorithm based on discriminative instance weighting, which we call Discriminatively Weighted Naive Bayes. In each iteration, training instances are discriminatively assigned different weights according to the estimated conditional probability loss. Experimental results on a large number of UCI data sets validate its effectiveness in terms of classification accuracy and AUC. In addition, running-time experiments show that Discriminatively Weighted Naive Bayes performs almost as efficiently as the state-of-the-art Discriminative Frequency Estimate learning method and significantly more efficiently than Boosted Naive Bayes. Finally, we apply the idea of discriminatively weighted learning to several state-of-the-art naive Bayes text classifiers, such as multinomial naive Bayes, complement naive Bayes, and the one-versus-all-but-one model, and achieve remarkable improvements.
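The abstract does not state the exact update rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes categorical features, Laplace smoothing, a conditional probability loss of 1 - P(c | x) for the true class c under the current model, and that each iteration adds these weights to the frequency counts. The function name `train_dwnb` and its parameters are hypothetical.

```python
import math
from collections import defaultdict


def train_dwnb(X, y, n_iter=5):
    """Sketch of discriminative instance weighting for categorical naive Bayes.

    Assumption (not from the abstract): an instance's weight in each
    iteration is 1 - P(true class | instance) under the current model,
    and weighted frequencies accumulate across iterations.
    """
    classes = sorted(set(y))
    n_feat = len(X[0])
    # Distinct values seen per feature, used for Laplace smoothing.
    values = [sorted({row[j] for row in X}) for j in range(n_feat)]
    class_count = defaultdict(float)  # weighted class frequencies
    feat_count = defaultdict(float)   # weighted (feature, value, class) frequencies

    def posterior(row):
        """Return the smoothed posterior P(c | row) from the current counts."""
        total = sum(class_count[c] for c in classes)
        log_scores = {}
        for c in classes:
            s = math.log((class_count[c] + 1.0) / (total + len(classes)))
            for j, v in enumerate(row):
                s += math.log(
                    (feat_count[(j, v, c)] + 1.0)
                    / (class_count[c] + len(values[j]))
                )
            log_scores[c] = s
        # Normalise in a numerically stable way.
        m = max(log_scores.values())
        exp_scores = {c: math.exp(s - m) for c, s in log_scores.items()}
        z = sum(exp_scores.values())
        return {c: e / z for c, e in exp_scores.items()}

    for _ in range(n_iter):
        # Weight each instance by how poorly the current model fits it.
        weights = [1.0 - posterior(row)[label] for row, label in zip(X, y)]
        for row, label, w in zip(X, y, weights):
            class_count[label] += w
            for j, v in enumerate(row):
                feat_count[(j, v, label)] += w

    return posterior
```

Under these assumptions, instances the current model already classifies confidently contribute little to the counts in later iterations, while poorly fitted instances contribute more. The returned function maps a feature tuple to a dictionary of class posteriors.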
Pages: 19