EGMA: Ensemble Learning-Based Hybrid Model Approach for Spam Detection

Cited by: 0
Authors
Bilgen, Yusuf [1]
Kaya, Mahmut [2]
Affiliations
[1] Siirt Univ, Fac Engn, Dept Comp Engn, TR-56100 Siirt, Turkiye
[2] Firat Univ, Fac Engn, Dept Artificial Intelligence & Data Engn, TR-23119 Elazig, Turkiye
Source
APPLIED SCIENCES-BASEL | 2024 / Volume 14 / Issue 21
Keywords
spam classification; ensemble learning; GRU; MLP; autoencoder; classification
DOI
10.3390/app14219669
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Spam messages have emerged as a significant issue in digital communication, adversely affecting users' mental health, personal safety, and network resources. Traditional spam detection methods often suffer from low detection rates and high false-positive rates, underscoring the need for more effective solutions. This paper proposes the EGMA model, an ensemble learning-based hybrid approach for spam detection in SMS messages, which combines gated recurrent unit (GRU), multilayer perceptron (MLP), and hybrid autoencoder models through a majority voting algorithm. The EGMA model improves performance by incorporating additional statistical features extracted from message content and by employing text vectorization techniques such as Term Frequency-Inverse Document Frequency (TF-IDF) and CountVectorizer. The proposed model achieved classification accuracies of 99.28% on the SMS Spam Collection dataset, 99.24% on the Email Spam dataset, 99.00% on the Enron-Spam dataset, 98.71% on the Super SMS dataset, and 95.09% on UtkMl's Twitter Spam dataset. These results indicate that the EGMA model outperforms its individual component models and existing methods in the literature, providing a robust approach to spam detection and to mitigating the threats that spam messages pose in digital communication.
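
The majority voting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `majority_vote` and the sample prediction lists are hypothetical, and the sketch assumes each of the three models has already produced a hard label per message (1 = spam, 0 = ham).

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label predictions by hard majority voting.

    `predictions` holds one sequence of labels per model; all sequences
    must cover the same messages in the same order.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_messages = len(predictions[0])
    if any(len(p) != n_messages for p in predictions):
        raise ValueError("all models must predict the same number of messages")

    combined = []
    for votes in zip(*predictions):
        # With three models and two classes a tie cannot occur; in general,
        # Counter.most_common breaks ties by first-seen order.
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of the three base models (assumed, not from the paper).
gru_pred = [1, 0, 1, 0]
mlp_pred = [1, 1, 0, 0]
autoencoder_pred = [0, 1, 1, 0]

ensemble_pred = majority_vote([gru_pred, mlp_pred, autoencoder_pred])
print(ensemble_pred)
```

Using an odd number of base models, as here, avoids ties in binary classification, so no tie-breaking rule is needed.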
Pages: 21