An empirical study of three machine learning methods for spam filtering

Cited by: 43

Authors:

Lai, Chih-Chin [1]

Affiliations:

[1] Natl Univ Tainan, Dept Comp Sci & Informat Engn, Tainan 700, Taiwan

Source:

KNOWLEDGE-BASED SYSTEMS | 2007 / Vol. 20 / Issue 03

Keywords:

spam filtering; machine learning

DOI:

10.1016/j.knosys.2006.05.016

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The increasing volumes of unsolicited bulk e-mail (also known as spam) are bringing more annoyance for most Internet users. Using a classifier based on a specific machine-learning technique to automatically filter out spam e-mail has drawn many researchers' attention. This paper is a comparative study of the performance of three commonly used machine learning methods in spam filtering. In addition, we try to integrate two spam filtering methods to obtain better performance. A set of systematic experiments has been conducted with these methods, which are applied to different parts of an e-mail. Experiments show that using the header only can achieve satisfactory performance, and that integrating disparate methods is a promising way to fight spam. (c) 2006 Elsevier B.V. All rights reserved.

Citation

Pages: 249-254

Page count: 6

References (15 in total):

[1] Androutsopoulos I, 2000, P WORKSH MACH LEARN, P9

[2] Androutsopoulos I, 2000, P 23 ANN INT ACM SIG, P160

[3] [Anonymous], P AAAI SPRING S MACH

[4] [Anonymous], P 4 INT C REC ADV NA

[5] [Anonymous], P 4 EUR C PRINC PRAC

[6] Drucker H, Wu DH, Vapnik VN, 1999, Support vector machines for spam categorization, IEEE TRANSACTIONS ON NEURAL NETWORKS, 10 (05): 1048-1054

[7] Kolcz A, 2001, P TEXTDM 01 WORKSH T

[8] Sahami Mehran, 1998, Learning for Text Categorization: Papers from the 1998 Workshop, V62, P98

[9] Sakkis G, 2001, PROCEEDINGS OF THE 2001 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, P44

[10] Segal R, 2004, P 1 C EM ANT SPAM
