Voting-based Classification for E-mail Spam Detection

Cited by: 11
Authors
Al-Shboul, Bashar [1]
Hakh, Heba [1]
Faris, Hossam [1]
Aljarah, Ibrahim [1]
Alsawalqah, Hamad [2]
Affiliations
[1] Univ Jordan, Dept Business Informat Technol, Queen Rania Al Abdallah St, Amman 11942, Jordan
[2] Univ Jordan, Dept Comp Informat Syst, Queen Rania Al Abdallah St, Amman 11942, Jordan
Keywords
e-mail spam detection; feature extraction; multi-classifier voting; voting-based classification
DOI
10.5614/itbj.ict.res.appl.2016.10.1.3
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The problem of spam e-mail has gained a tremendous amount of attention. Although entities tend to use e-mail spam filter applications to filter out received spam e-mails, marketing companies still tend to send unsolicited e-mails in bulk and users still receive a reasonable amount of spam e-mail despite those filtering applications. This work proposes a new method for classifying e-mails into spam and non-spam. First, several e-mail content features are extracted and then those features are used for classifying each e-mail individually. The classification results of three different classifiers (i.e. Decision Trees, Random Forests and k-Nearest Neighbor) are combined in various voting schemes (i.e. majority vote, average probability, product of probabilities, minimum probability and maximum probability) for making the final decision. To validate our method, two different spam e-mail collections were used.
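The five voting schemes named in the abstract can be illustrated with a minimal sketch. The abstract does not define the rules, so the code below assumes the common textbook combiners: each classifier supplies an estimate of P(spam), the rule is applied to the spam and non-spam probabilities separately, and the class with the larger combined score wins. The function name `combine_votes`, the 0.5 threshold for majority voting, the tie-breaking toward non-spam, and the sample probabilities are all illustrative assumptions, not details taken from the paper.

```python
from math import prod
from statistics import mean


def combine_votes(spam_probs, scheme="majority"):
    """Return True (spam) or False (non-spam) for one e-mail.

    spam_probs: one P(spam) estimate per classifier (hypothetical inputs,
    e.g. from a decision tree, a random forest and a k-NN model).
    scheme: "majority", "average", "product", "minimum" or "maximum".
    """
    if not spam_probs:
        raise ValueError("at least one classifier probability is required")

    if scheme == "majority":
        # Each classifier casts a hard vote; assumed threshold of 0.5.
        spam_votes = sum(p > 0.5 for p in spam_probs)
        return spam_votes > len(spam_probs) / 2

    rules = {"average": mean, "product": prod, "minimum": min, "maximum": max}
    if scheme not in rules:
        raise ValueError(f"unknown voting scheme: {scheme}")

    # Apply the same rule to both classes and pick the larger score.
    # Ties fall to non-spam (an assumption of this sketch).
    rule = rules[scheme]
    ham_probs = [1.0 - p for p in spam_probs]
    return rule(spam_probs) > rule(ham_probs)


if __name__ == "__main__":
    probs = [0.8, 0.7, 0.05]  # made-up outputs of three classifiers
    for name in ("majority", "average", "product", "minimum", "maximum"):
        print(name, combine_votes(probs, name))
```

With these made-up inputs, two classifiers lean toward spam and one is strongly confident of non-spam, so the schemes can disagree: rules that are sensitive to a single confident classifier (product, minimum, maximum) may overrule the simple vote count.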
Pages: 29-42
Page count: 14