The Optimization of Threshold-Based Naive Bayesian Algorithm

Cited by: 0
Authors
Wang Xin [1]
Jiang Hua [1]
Affiliations
[1] Guilin Univ Elect Technol, Sch Comp Sci & Control, Guilin 541004, Peoples R China
Source
THIRD INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTING | 2009
Keywords
Naive Bayesian classification; text classification; information filtering; overflow;
DOI
10.1109/WGEC.2009.161
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
For text classification and spam filtering, the Naive Bayesian algorithm estimates which class a text belongs to from statistical probability values derived from the features of the training samples. However, the algorithm is prone to the overflow problem. This article optimizes the algorithm by setting a threshold: the strategy compares the number of times the probability of each class exceeds the threshold and, when those counts are equal, the accumulated probability values. Compared with the existing method, experimental results show that the new method not only solves the overflow problem but also improves classification performance.
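The abstract does not give the update rule, so the following Python sketch is only one possible reading of the threshold strategy, not the authors' implementation. It assumes that the problem the abstract calls overflow is the running product of many small probabilities rounding to zero in floating point, that a "crossing" is counted each time the running product drops below the threshold and is rescaled, and that all probabilities are smoothed (non-zero). The names `scaled_product`, `classify`, and the default threshold `1e-50` are hypothetical.

```python
def scaled_product(probs, threshold=1e-50):
    """Multiply probabilities, rescaling whenever the product drops below threshold.

    Returns (crossings, residual); the true product equals
    residual * threshold ** crossings.
    Assumption: every probability is smoothed, i.e. strictly positive.
    """
    crossings = 0
    residual = 1.0
    for p in probs:
        residual *= p
        # Rescale before the running product can round to zero.
        while 0.0 < residual < threshold:
            residual /= threshold
            crossings += 1
    return crossings, residual


def classify(class_probs, threshold=1e-50):
    """Pick the most probable class.

    class_probs maps a label to its list of factors
    (class prior followed by per-term likelihoods).
    Fewer crossings means a larger product; with equal
    crossings, the larger residual wins.
    """
    def key(label):
        crossings, residual = scaled_product(class_probs[label], threshold)
        return (-crossings, residual)

    return max(class_probs, key=key)
```

With 200 factors of about `1e-3` per class, a plain floating-point product rounds to `0.0` for every class and cannot rank them, whereas the pair of crossing count and residual still can. Summing log-probabilities is the more common way to avoid the same problem; the counter-based form is shown here only because it matches the comparison described in the abstract.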
Pages: 762-764
Page count: 3
Related papers
7 in total
[1]  
BAOJUNPENG, 2003, J SOFTWARE, V14, P1753
[2]  
DAVIS J, 2006, ACM P 23 INT C MACH, P233
[3]  
Hasnah A. M., 2006, Journal of Computer Sciences, V2, P434, DOI 10.3844/jcssp.2006.434.440
[4]  
LIUJIN, 2005, COMPUTER APPL RES, P85
[5]   Machine learning in automated text categorization [J].
Sebastiani, F .
ACM COMPUTING SURVEYS, 2002, 34 (01) :1-47
[6]  
SHENG G, 2006, ACM T INFORM SYSTEMS, V24
[7]  
ZHANGHUAPING, 2002, CHINESE LEXICAL ANAL