A deep learning method for automatic SMS spam classification: Performance of learning algorithms on indigenous dataset

Cited by: 17
Authors
Abayomi-Alli, Olusola [1]
Misra, Sanjay [2]
Abayomi-Alli, Adebayo [3]
Affiliations
[1] Kaunas Univ Technol, Dept Software Engn, Kaunas, Lithuania
[2] Ostfold Univ Coll, Dept Comp Sci & Commun, Halden, Norway
[3] Fed Univ Agr, Dept Comp Sci, Abeokuta, Nigeria
Keywords
algorithms; classification; deep learning; machine learning; short messages; MODEL;
DOI
10.1002/cpe.6989
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
SMS, one of the most popular and fastest-growing GSM value-added services worldwide, has attracted unwanted messages, also known as SMS spam. The effects of SMS spam are significant: it affects both users and service providers and causes a massive gap in trust between the two parties. This article presents a deep learning model based on BiLSTM and compares its results with some state-of-the-art machine learning (ML) algorithms on two datasets: our newly collected dataset and the popular UCI SMS dataset. The study evaluates the performance of diverse learning models and compares their results on the newly expanded dataset (ExAIS_SMS) using the following metrics: true positive (TP), false positive (FP), F-measure, recall, precision, and overall accuracy. On average accuracy, the BiLSTM model achieved moderately improved results compared to some of the ML classifiers. The experimental results improved significantly over the ground truth results after effective fine-tuning of some of the parameters. The BiLSTM model attained an accuracy of 93.4% on the ExAIS_SMS dataset and 98.6% on the UCI dataset. In a further comparison of the two datasets on the state-of-the-art ML classifiers, Naive Bayes, BayesNet, SOM, decision tree, C4.5, and J48 reached accuracies of 89.64%, 91.11%, 88.24%, 75.76%, 80.24%, and 79.2%, respectively, on the ExAIS_SMS dataset. In conclusion, our proposed BiLSTM model showed significant improvement over traditional ML classifiers. To further validate the robustness of our model, we applied it to the UCI dataset, and our results showed optimal performance in classifying SMS spam messages on the following metrics: accuracy, precision, recall, and F-measure.
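The metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the label strings "spam" and "ham", the function names, and the toy predictions are all hypothetical, and reading TP and FP as rates (TP rate equal to recall, FP rate as FP / (FP + TN)) is an assumption rather than something the abstract states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Raw outcome counts for a binary spam/ham classifier."""
    tp: int  # spam correctly flagged as spam
    fp: int  # ham wrongly flagged as spam
    tn: int  # ham correctly passed as ham
    fn: int  # spam wrongly passed as ham


def count_outcomes(y_true, y_pred, positive="spam"):
    """Tally TP/FP/TN/FN, treating `positive` as the spam class (assumed label)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def summarize(c):
    """Derive the abstract's metrics from the raw counts."""
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    return {
        "tp_rate": recall,  # assumption: TP reported as a rate
        "fp_rate": safe_div(c.fp, c.fp + c.tn),  # assumption: FP reported as a rate
        "precision": precision,
        "recall": recall,
        "f_measure": safe_div(2 * precision * recall, precision + recall),
        "accuracy": safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
    }


# Toy data, invented purely for illustration (not taken from either dataset).
y_true = ["spam", "spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "spam", "spam", "ham"]

counts = count_outcomes(y_true, y_pred)
metrics = summarize(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Keeping the raw counts separate from the derived ratios makes it easy to compare classifiers on the same test split, since every reported metric is recomputed from the same four numbers.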
Pages: 15