A Spam Filtering Method Based on Multi-Modal Fusion

Times cited: 26
Authors
Yang, Hong [1 ,2 ]
Liu, Qihe [1 ,2 ]
Zhou, Shijie [1 ,2 ]
Luo, Yang [1 ,2 ]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Informat & Software Engineering, Chengdu 610054, Sichuan, Peoples R China
[2] 4, Sect 2, Jianshe North Rd, Chengdu 610054, Sichuan, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2019 / Vol. 9 / Issue 06
Keywords
spam filtering system; multi-modal; MMA-MF; fusion model; LSTM; CNN; classification
DOI
10.3390/app9061152
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
In recent years, single-modal spam filtering systems have achieved high detection rates for either image spam or text spam. To evade these systems, spammers inject junk information into multiple modalities of an email, such as its text and its images, and combine them, which lowers the recognition rate of single-modal filters and lets the spam slip through. To address this, a new model called multi-modal architecture based on model fusion (MMA-MF) is proposed. It uses a multi-modal fusion method so that it can filter spam effectively whether the spam content is hidden in the text or in the image. The model fuses a Convolutional Neural Network (CNN) model and a Long Short-Term Memory (LSTM) model: the LSTM model processes the text part of an email and the CNN model processes the image part, each producing a classification probability, and the two probability values are then fed into a fusion model that decides whether the email is spam. The hyperparameters of the MMA-MF model are selected with a grid search, and the model's performance is evaluated with k-fold cross-validation. Experimental results show that the model outperforms traditional spam filtering systems, achieving accuracies of 92.64% to 98.48%.
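The abstract describes a late-fusion pipeline: an LSTM (text) and a CNN (image) each output a spam probability, a fusion model combines the two, and hyperparameters are chosen by grid search scored with k-fold cross-validation. The abstract does not say which fusion model or which hyperparameters are used, so the following is only a minimal sketch under stated assumptions. It treats the two branch probabilities as given inputs (the LSTM and CNN are not shown), assumes a logistic-regression fusion model over `(p_text, p_image)`, and searches a hypothetical grid over that model's learning rate `lr` and L2 penalty `l2` using stratified k-fold cross-validation. All function names and the data are illustrative, not taken from the paper.

```python
import math
from itertools import product

# ASSUMPTION: the fusion model is logistic regression over the two branch
# probabilities; the paper's actual fusion model is not stated in the abstract.


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_fusion(X, y, lr=0.5, l2=0.0, epochs=300):
    """Logistic-regression fusion over (p_text, p_image) pairs, trained by
    full-batch gradient descent. Returns (weights, bias)."""
    w, b, n = [0.0, 0.0], 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for (p_text, p_image), label in zip(X, y):
            err = sigmoid(w[0] * p_text + w[1] * p_image + b) - label
            gw[0] += err * p_text
            gw[1] += err * p_image
            gb += err
        # Mean gradient plus an L2 penalty on the weights (bias unpenalized).
        w = [w[j] - lr * (gw[j] / n + l2 * w[j]) for j in range(2)]
        b -= lr * gb / n
    return w, b


def fused_spam_probability(model, p_text, p_image):
    """Combine the two branch probabilities into a single spam probability."""
    w, b = model
    return sigmoid(w[0] * p_text + w[1] * p_image + b)


def accuracy(model, X, y):
    hits = sum((fused_spam_probability(model, *x) >= 0.5) == bool(t)
               for x, t in zip(X, y))
    return hits / len(y)


def stratified_folds(y, k):
    """Deterministically deal indices into k folds, round-robin within each class."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        members = [i for i, t in enumerate(y) if t == label]
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def cross_val_accuracy(X, y, k, **params):
    """Mean held-out accuracy over k folds for one hyperparameter setting."""
    scores = []
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = train_fusion([X[i] for i in train_idx],
                             [y[i] for i in train_idx], **params)
        scores.append(accuracy(model, [X[i] for i in test_idx],
                               [y[i] for i in test_idx]))
    return sum(scores) / k


def grid_search(X, y, grid, k=5):
    """Try every combination in `grid`; keep the one with the best CV accuracy."""
    best_params, best_score = None, -1.0
    keys = sorted(grid)
    for values in product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        score = cross_val_accuracy(X, y, k, **params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical branch outputs (not data from the paper): ham scores low on
# both branches; each spam email is flagged by only one branch.
X, y = [], []
for i in range(40):
    if i % 2 == 1:
        point, label = (0.1, 0.1), 0   # ham
    elif i % 4 == 0:
        point, label = (0.9, 0.1), 1   # spam visible only in the text
    else:
        point, label = (0.1, 0.9), 1   # spam visible only in the image
    X.append(point)
    y.append(label)

best_params, best_score = grid_search(X, y, {"lr": [0.1, 0.5], "l2": [0.0, 0.01]})
print(best_params, best_score)
```

In this toy data, a text-only threshold would miss the image-only spam, whose text score (0.1) is identical to ham, and an image-only threshold would miss the text-only spam in the same way. The fused model can learn a boundary on both inputs at once, which is the situation the abstract's multi-modal argument targets. Stratified folds keep the spam/ham ratio equal in every fold, so each held-out score is computed on a balanced split.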
Pages: 15