Content-Based Spam Filtering

Cited by: 0
Authors
Almeida, Tiago A. [1]
Yamakami, Akebo [1]
Affiliations
[1] Univ Estadual Campinas, UNICAMP, Sch Elect & Comp Engn, BR-13081970 Campinas, SP, Brazil
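Source
2010 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS IJCNN 2010 | 2010
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract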
The growth in the number of email users has resulted in a dramatic increase in spam emails. Fortunately, there are different approaches able to automatically detect and remove most of these messages, and the best-known ones are based on Bayesian decision theory and Support Vector Machines. However, there are several forms of Naive Bayes filters, something the anti-spam literature does not always acknowledge. In this paper, we discuss seven different versions of Naive Bayes classifiers and compare them with the well-known Linear Support Vector Machine on six non-encoded datasets. Moreover, we propose a new measure for evaluating the quality of anti-spam classifiers: we investigate the benefits of using the Matthews correlation coefficient as the measure of performance.
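As an illustration of the performance measure named in the abstract, the sketch below computes the Matthews correlation coefficient from the four cells of a binary confusion matrix, using the standard textbook definition. This record does not reproduce the paper's own formulation, so the function name `matthews_corrcoef_from_counts`, the argument names, and the convention of returning 0.0 when the denominator vanishes are all assumptions made for this example.

```python
import math


def matthews_corrcoef_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Hypothetical helper for illustration only. Here "positive" is taken
    to mean spam:
      tp = spam correctly flagged, tn = legitimate mail correctly passed,
      fp = legitimate mail wrongly flagged, fn = spam that slipped through.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: report 0.0 when any marginal total is zero,
        # for example when the filter assigns every message to one class.
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Made-up counts, not results from the paper.
    print(matthews_corrcoef_from_counts(tp=90, tn=95, fp=5, fn=10))
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction right), and it uses all four confusion-matrix cells, which keeps it informative when spam and legitimate messages are unevenly represented.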
Pages: 7