Visual and textual features based email spam classification using S-Cuckoo search and hybrid kernel support vector machine

Cited by: 18
Authors
Kumaresan, T. [1]
Saravanakumar, S. [2]
Balamurugan, R. [3]
Affiliations
[1] Bannari Amman Inst Technol, Erode, Tamil Nadu, India
[2] Adithya Inst Technol, Coimbatore, Tamil Nadu, India
[3] Bharat Inst Engn & Technol, Hyderabad, Telangana, India
Source
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS | 2019 / Vol. 22 / Issue Suppl 1
Keywords
Support vector machine; Cuckoo search; Spam; Correlogram; S-Cuckoo search; RECOGNITION;
DOI
10.1007/s10586-017-1615-8
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Spam mail classification has become increasingly important because of the rapid, uncontrolled growth of electronic media. The literature presents several algorithms for email spam classification. In this paper, we propose a spam classification framework using S-Cuckoo search and a hybrid kernel based support vector machine (HKSVM). First, features are extracted from both the text and the images of the e-mails. For the textual features, term frequency (TF) is used. For the image-dependent features, the correlogram and wavelet moments are used. Because the combined feature set is high-dimensional, the optimal features are selected with a hybrid algorithm called S-Cuckoo search. Classification is then performed with the proposed HKSVM model, in which a hybrid kernel built by blending three different kernel functions is used in the SVM classifier. The additional image-based features and the modified SVM classifier provide a significant improvement over existing algorithms. Spam classification performance is measured on db1 (combining the bare Ling-Spam and Spam Archive corpora) and db2 (combining the lemm Ling-Spam and Spam Archive corpora). Experimental results show that the proposed framework achieves an accuracy of 97.235%, compared with 94.117% for the existing approach.
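The abstract does not say which three kernels are blended or how they are weighted, so the following is only a minimal sketch of the general idea, not the authors' HKSVM. It assumes a weighted sum of linear, polynomial, and RBF kernels applied to term-frequency vectors; the function names, the equal weights, and the kernel parameters are all hypothetical choices made for illustration. A non-negative weighted sum of valid kernels is itself a valid kernel, which is what makes this kind of blend usable inside an SVM.

```python
import math
from collections import Counter


def term_frequency(text, vocabulary):
    """Relative frequency of each vocabulary term among the tokens of text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty text
    return [counts[term] / total for term in vocabulary]


def linear_kernel(x, y):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """(x . y + coef0) ** degree; degree and coef0 are assumed values."""
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||^2); gamma is an assumed value."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def hybrid_kernel(x, y, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Assumed blend: non-negative weighted sum of three base kernels."""
    w_lin, w_poly, w_rbf = weights
    return (
        w_lin * linear_kernel(x, y)
        + w_poly * polynomial_kernel(x, y)
        + w_rbf * rbf_kernel(x, y)
    )


# Hypothetical toy vocabulary and messages, for illustration only.
vocabulary = ["free", "money", "meeting"]
spam_vector = term_frequency("free money free", vocabulary)
ham_vector = term_frequency("meeting moved to monday", vocabulary)
similarity = hybrid_kernel(spam_vector, ham_vector)
```

A function like `hybrid_kernel` could be used to fill a Gram matrix for any SVM implementation that accepts precomputed kernels; the image features and the S-Cuckoo feature selection described in the abstract are outside the scope of this sketch.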
Pages: 33-46
Page count: 14