A Hybrid CNN-LSTM Model for SMS Spam Detection in Arabic and English Messages

Cited by: 53
Authors
Ghourabi, Abdallah [1, 2]
Mahmood, Mahmood A. [1, 3]
Alzubi, Qusay M. [1]
Affiliations
[1] Jouf Univ, Dept Comp Sci, Tabarjal 74728, Saudi Arabia
[2] Univ Sousse, Higher Sch Sci & Technol Hammam Sousse, Hammam Sousse 4011, Tunisia
[3] Cairo Univ, Dept Informat & Technol Syst, Giza 12613, Egypt
Keywords
SMS spam detection; deep learning; CNN; LSTM; SMS classification; smishing messages; language models; security model; recognition; machine
DOI
10.3390/fi12090156
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Despite the rapid evolution of Internet Protocol-based messaging services, SMS remains an indispensable communication service today. For example, several businesses consider text messages more effective than e-mails, because 82% of SMS messages are read within 5 minutes, whereas consumers open only one in four e-mails they receive. The importance of SMS to mobile phone users has also attracted the attention of spammers. Indeed, the volume of SMS spam has increased considerably in recent years, accompanied by new security threats such as SMiShing (SMS phishing). In this paper, we propose a hybrid deep learning model for detecting SMS spam. The model combines two deep learning methods, convolutional neural networks (CNN) and long short-term memory (LSTM) networks, and is designed to handle mixed text messages written in Arabic or English. For comparison, we also evaluated other well-known machine learning algorithms. The experimental results show that our CNN-LSTM model outperforms these algorithms, achieving an accuracy of 98.37%.
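The abstract names the architecture (CNN followed by LSTM) but not its layer sizes, preprocessing, or training setup. As a rough illustration only, the sketch below shows one way a CNN-LSTM forward pass for binary spam/ham classification can be wired in NumPy, assuming a word-level tokenizer, a 1D convolution with ReLU and max pooling whose output sequence feeds an LSTM, and a sigmoid output. All names (`tokenize`, `build_vocab`, `encode`, `CNNLSTM`) and all hyperparameters are hypothetical; the weights are random and untrained, and this is not the authors' implementation.

```python
import re
import numpy as np

# Hypothetical hyperparameters for illustration only; not the paper's settings.
EMBED_DIM, NUM_FILTERS, KERNEL_SIZE, POOL_SIZE, HIDDEN, MAX_LEN = 16, 8, 3, 2, 12, 20

def tokenize(text):
    """Split on Unicode word characters, so both Arabic and Latin-script words are kept."""
    return re.findall(r"\w+", text.lower())

def build_vocab(messages):
    """Map each token to an integer id; id 0 is reserved for padding and unknown tokens."""
    vocab = {}
    for msg in messages:
        for tok in tokenize(msg):
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab

def encode(text, vocab, max_len=MAX_LEN):
    """Convert a message to a fixed-length id array (truncated or zero-padded)."""
    ids = [vocab.get(tok, 0) for tok in tokenize(text)][:max_len]
    return np.array(ids + [0] * (max_len - len(ids)))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class CNNLSTM:
    """Forward pass only: embedding -> Conv1D + ReLU -> max pooling -> LSTM -> sigmoid."""

    def __init__(self, vocab_size, seed=0):
        rng = np.random.default_rng(seed)
        self.embed = rng.normal(0, 0.1, (vocab_size + 1, EMBED_DIM))
        self.conv_w = rng.normal(0, 0.1, (KERNEL_SIZE, EMBED_DIM, NUM_FILTERS))
        self.conv_b = np.zeros(NUM_FILTERS)
        # LSTM weights for the four gates (input, forget, candidate, output), stacked.
        self.W = rng.normal(0, 0.1, (NUM_FILTERS, 4 * HIDDEN))
        self.U = rng.normal(0, 0.1, (HIDDEN, 4 * HIDDEN))
        self.b = np.zeros(4 * HIDDEN)
        self.out_w = rng.normal(0, 0.1, HIDDEN)
        self.out_b = 0.0

    def conv1d(self, x):
        # "Valid" 1D convolution over the time axis, followed by ReLU.
        steps = x.shape[0] - KERNEL_SIZE + 1
        out = np.empty((steps, NUM_FILTERS))
        for t in range(steps):
            window = x[t:t + KERNEL_SIZE]  # shape (KERNEL_SIZE, EMBED_DIM)
            out[t] = np.einsum("ke,kef->f", window, self.conv_w) + self.conv_b
        return np.maximum(out, 0.0)

    def max_pool(self, x):
        # Non-overlapping max pooling along time; a trailing odd step is dropped.
        steps = x.shape[0] // POOL_SIZE
        return x[:steps * POOL_SIZE].reshape(steps, POOL_SIZE, -1).max(axis=1)

    def lstm(self, seq):
        h, c = np.zeros(HIDDEN), np.zeros(HIDDEN)
        for x_t in seq:
            z = x_t @ self.W + h @ self.U + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        return h  # final hidden state summarizes the pooled feature sequence

    def predict_proba(self, token_ids):
        """Return an (untrained) spam probability for one encoded message."""
        x = self.embed[token_ids]  # shape (MAX_LEN, EMBED_DIM)
        features = self.max_pool(self.conv1d(x))
        return float(sigmoid(self.lstm(features) @ self.out_w + self.out_b))

if __name__ == "__main__":
    msgs = ["Congratulations! You won a free prize, call now",
            "مبروك لقد ربحت جائزة",
            "See you at lunch"]
    vocab = build_vocab(msgs)
    model = CNNLSTM(len(vocab))
    for msg in msgs:
        print(f"{model.predict_proba(encode(msg, vocab)):.3f}  {msg}")
```

Because the weights are random, the printed probabilities carry no meaning until the parameters are fitted on labeled messages; in practice this stack would normally be built and trained with a deep learning framework rather than hand-written NumPy.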
Pages: 16