An empirical study of three machine learning methods for spam filtering

Cited by: 43
Authors
Lai, Chih-Chin [1]
Affiliations
[1] Natl Univ Tainan, Dept Comp Sci & Informat Engn, Tainan 700, Taiwan
Keywords
spam filtering; machine learning
DOI
10.1016/j.knosys.2006.05.016
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The increasing volume of unsolicited bulk e-mail (also known as spam) is a growing annoyance for most Internet users. Using a classifier based on a specific machine-learning technique to automatically filter out spam e-mail has drawn many researchers' attention. This paper presents a comparative study of the performance of three commonly used machine learning methods in spam filtering. In addition, we try to integrate two spam filtering methods to obtain better performance. A set of systematic experiments has been conducted with these methods, applied to different parts of an e-mail. Experiments show that using the header alone can achieve satisfactory performance, and that integrating disparate methods is a promising way to fight spam. © 2006 Elsevier B.V. All rights reserved.
Pages: 249-254
Number of pages: 6