Visual and textual features based email spam classification using S-Cuckoo search and hybrid kernel support vector machine

Times cited: 18
Authors
Kumaresan, T. [1]
Saravanakumar, S. [2]
Balamurugan, R. [3]
Affiliations
[1] Bannari Amman Inst Technol, Erode, Tamil Nadu, India
[2] Adithya Inst Technol, Coimbatore, Tamil Nadu, India
[3] Bharat Inst Engn & Technol, Hyderabad, Telangana, India
Source
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS | 2019 / Vol. 22 / Suppl. 1
Keywords
Support vector machine; Cuckoo search; Spam; Correlogram; S-Cuckoo search; RECOGNITION;
DOI
10.1007/s10586-017-1615-8
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Spam mail classification has become increasingly important in recent years because of the uncontrolled growth of electronic media. The literature presents several classification-based algorithms for email spam filtering. In this paper, we propose a spam classification framework using S-Cuckoo search and a hybrid kernel support vector machine (HKSVM). First, features are extracted from each e-mail based on both its text and its images. For the textual features, term frequency (TF) is used. For the image-based features, the correlogram and wavelet moments are computed. Because the combined feature set is high-dimensional, the optimal features are selected with a hybrid algorithm called S-Cuckoo search. Classification is then performed by the proposed HKSVM model, whose hybrid kernel blends three different kernel functions and is used within the SVM classifier. The additional image-based features and the modified SVM classifier provide a significant improvement over existing algorithms. Spam classification performance is measured on two datasets: db1 (combining the bare Ling-Spam and Spam Archive corpora) and db2 (combining the lemm Ling-Spam and Spam Archive corpora). Experimental results show that the proposed framework outperforms the existing approach, achieving an accuracy of 97.235% compared with 94.117%.
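The abstract does not say which three kernels HKSVM blends, how they are weighted, or exactly how TF is normalized. As a minimal sketch under those stated assumptions, the example below computes TF as word count divided by mail length and uses a convex blend of linear, polynomial and RBF kernels; the function names, weights, `degree`, `coef0` and `gamma` values are all hypothetical and are not taken from the paper. Because a non-negative weighted sum of valid kernels is itself a valid (positive semi-definite) kernel, the resulting Gram matrix could be handed to any SVM solver that accepts precomputed kernels.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def term_frequency(text: str, vocabulary: Sequence[str]) -> List[float]:
    """Hypothetical TF extractor: count of each vocabulary word / number of tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # guard against division by zero for empty mails
    return [counts[word] / total for word in vocabulary]


def linear_kernel(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x: Vector, y: Vector, degree: int = 2, coef0: float = 1.0) -> float:
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x: Vector, y: Vector, gamma: float = 0.5) -> float:
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def hybrid_kernel(x: Vector, y: Vector,
                  weights: Tuple[float, float, float] = (0.2, 0.3, 0.5)) -> float:
    """Assumed HKSVM-style kernel: weighted blend of linear, polynomial and RBF kernels."""
    # Non-negative weights keep the blend positive semi-definite;
    # requiring them to sum to 1 is an assumed normalization.
    if min(weights) < 0 or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    w_lin, w_poly, w_rbf = weights
    return (w_lin * linear_kernel(x, y)
            + w_poly * polynomial_kernel(x, y)
            + w_rbf * rbf_kernel(x, y))


def gram_matrix(samples: Sequence[Vector]) -> List[List[float]]:
    """Precomputed hybrid-kernel matrix over all pairs of feature vectors."""
    return [[hybrid_kernel(a, b) for b in samples] for a in samples]


# Toy usage with a hypothetical four-word vocabulary and two mails.
vocabulary = ["free", "money", "meeting", "report"]
mails = ["free money free offer", "project meeting report attached"]
features = [term_frequency(mail, vocabulary) for mail in mails]
kernel_matrix = gram_matrix(features)
print(features)
print(kernel_matrix)
```

In a full pipeline, the image features (correlogram, wavelet moments) would be appended to each TF vector and reduced by the feature selector before the Gram matrix is built; the blend weights would then be tuned on validation data rather than fixed as above.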
Pages: 33-46
Number of pages: 14