Voting-based Classification for E-mail Spam Detection

Cited by: 11
Authors
Al-Shboul, Bashar [1]
Hakh, Heba [1]
Faris, Hossam [1]
Aljarah, Ibrahim [1]
Alsawalqah, Hamad [2]
Affiliations
[1] University of Jordan, Department of Business Information Technology, Queen Rania Al Abdallah Street, Amman 11942, Jordan
[2] University of Jordan, Department of Computer Information Systems, Queen Rania Al Abdallah Street, Amman 11942, Jordan
Keywords
e-mail spam detection; feature extraction; multi-classifier voting; voting-based classification
DOI
10.5614/itbj.ict.res.appl.2016.10.1.3
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Spam e-mail has attracted a great deal of attention. Although organizations commonly deploy spam-filtering applications, marketing companies continue to send unsolicited e-mail in bulk, and users still receive a considerable amount of spam despite these filters. This work proposes a new method for classifying e-mails as spam or non-spam. First, several content features are extracted from each e-mail; these features are then used to classify each e-mail individually. The outputs of three different classifiers (namely Decision Trees, Random Forests and k-Nearest Neighbor) are combined under several voting schemes (majority vote, average probability, product of probabilities, minimum probability and maximum probability) to reach the final decision. The method was validated on two different spam e-mail collections.
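The five voting schemes named in the abstract can be illustrated with a minimal sketch. It assumes each base classifier outputs an estimated probability that an e-mail is spam. The probability of non-spam is taken as one minus that value. For the probability-based rules, the supports for each class are aggregated across classifiers (mean, product, minimum or maximum), and the class with the larger support wins. The function names, the 0.5 hard-vote threshold and the choice to break ties in favour of non-spam are illustrative assumptions, not the authors' implementation.

```python
from math import prod
from statistics import mean
from typing import Callable, Dict, Sequence

# Assumption: every base classifier (e.g. a decision tree, a random forest
# and a k-NN model) returns its estimated probability that an e-mail is spam.
# Ties are resolved in favour of "not spam" throughout (also an assumption).

Rule = Callable[[Sequence[float]], float]


def _support_rule(p_spam: Sequence[float], aggregate: Rule) -> bool:
    """Aggregate each class's probabilities across classifiers; larger support wins."""
    spam_support = aggregate(p_spam)
    ham_support = aggregate([1.0 - p for p in p_spam])
    return spam_support > ham_support


def majority_vote(p_spam: Sequence[float]) -> bool:
    """Each classifier casts one hard vote (spam if p > 0.5); a strict majority wins."""
    spam_votes = sum(1 for p in p_spam if p > 0.5)
    return spam_votes > len(p_spam) / 2


# Hypothetical scheme names mapped to their combining functions.
SCHEMES: Dict[str, Callable[[Sequence[float]], bool]] = {
    "majority": majority_vote,
    "average": lambda p: _support_rule(p, mean),
    "product": lambda p: _support_rule(p, prod),
    "minimum": lambda p: _support_rule(p, min),
    "maximum": lambda p: _support_rule(p, max),
}


def is_spam(p_spam: Sequence[float], scheme: str = "majority") -> bool:
    """Return True if the combined decision under `scheme` is spam."""
    if not p_spam:
        raise ValueError("need at least one classifier output")
    return SCHEMES[scheme](p_spam)


if __name__ == "__main__":
    # Hypothetical outputs of three classifiers for a single e-mail.
    outputs = [0.9, 0.4, 0.45]
    for name in SCHEMES:
        print(f"{name:>8}: {'spam' if is_spam(outputs, name) else 'non-spam'}")
```

With the hypothetical outputs `[0.9, 0.4, 0.45]`, the schemes disagree. Majority vote labels the e-mail non-spam because only one of the three classifiers exceeds 0.5. The four probability-based rules label it spam because the single confident classifier outweighs the two borderline ones. This is why the choice of combining scheme can matter.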
Pages: 29-42
Page count: 14