Spam Email Detection Using Deep Learning Techniques

Cited by: 54
Authors
AbdulNabi, Isra'a [1]
Yaseen, Qussai [1]
Affiliations
[1] Jordan Univ Sci & Technol, Dept Comp Informat Syst, Irbid 22110, Jordan
Source
12TH INTERNATIONAL CONFERENCE ON AMBIENT SYSTEMS, NETWORKS AND TECHNOLOGIES (ANT) / THE 4TH INTERNATIONAL CONFERENCE ON EMERGING DATA AND INDUSTRY 4.0 (EDI40) / AFFILIATED WORKSHOPS | 2021 / Vol. 184
Keywords
Cybersecurity; Spam; BERT Transformer; Word embedding; Deep learning;
DOI
10.1016/j.procs.2021.03.107
Chinese Library Classification number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
Unsolicited emails such as phishing and spam emails cost businesses and individuals millions of dollars annually. Several models and techniques for automatically detecting spam emails have been introduced and developed, yet none has shown 100% predictive accuracy. Among the proposed models, machine learning and deep learning algorithms have achieved the most success, and natural language processing (NLP) has enhanced the models' accuracy. In this work, the effectiveness of word embedding in classifying spam emails is examined. The pre-trained transformer model BERT (Bidirectional Encoder Representations from Transformers) is fine-tuned to distinguish spam emails from non-spam (ham) emails. BERT uses attention layers to take the context of the text into account. Results are compared to a baseline DNN (deep neural network) model that contains a BiLSTM (bidirectional Long Short-Term Memory) layer and two stacked Dense layers. In addition, results are compared to a set of classic classifiers: k-NN (k-nearest neighbors) and NB (Naive Bayes). Two open-source data sets are used, one to train the model and the other to test the persistence and robustness of the model against unseen data. The proposed approach attained the highest accuracy, 98.67%, and an F1 score of 98.66%. © 2021 The Authors. Published by Elsevier B.V.
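To make the evaluation described in the abstract concrete, the following is a minimal sketch of one of the classic baselines it names (Naive Bayes) together with the two reported metrics (accuracy and F1). Everything specific here is an assumption made for illustration: the toy messages, the whitespace tokenizer, the Laplace smoothing, and the names `NaiveBayesSketch` and `accuracy_and_f1`. It is not the authors' implementation, and it does not reproduce the BERT or BiLSTM models or the reported figures.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, chosen only for brevity.
    return text.lower().split()


class NaiveBayesSketch:
    """Hypothetical multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / total_docs)
            total_words = sum(self.word_counts[c].values())
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log(
                        (self.word_counts[c][w] + 1)
                        / (total_words + len(self.vocab))
                    )
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def accuracy_and_f1(y_true, y_pred, positive="spam"):
    # F1 is computed for the positive (spam) class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1


# Toy data (invented for this sketch, not from the paper's data sets).
train_texts = [
    "win free money now",
    "claim your free prize",
    "meeting agenda for monday",
    "lunch with the team tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]
test_texts = ["free prize money", "team meeting tomorrow"]
test_labels = ["spam", "ham"]

model = NaiveBayesSketch().fit(train_texts, train_labels)
predictions = [model.predict(t) for t in test_texts]
accuracy, f1 = accuracy_and_f1(test_labels, predictions)
print(predictions, accuracy, f1)
```

Training on one data set and scoring on a separate one, as in the toy split above, mirrors the abstract's use of a second data set to check robustness against unseen data.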
Pages: 853-858
Number of pages: 6