Hybrid ensemble framework with self-attention mechanism for social spam detection on imbalanced data

Cited by: 25

Authors:

Rao, Sanjeev [1]

Verma, Anil Kumar [1]

Bhatia, Tarunpreet [1]

Affiliations:

[1] Thapar Inst Engn & Technol, Comp Sci & Engn Dept, Patiala, Punjab, India

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2023 / Volume 217

Keywords:

Data resampling; Machine Learning; Natural Language Processing; Online Social Network; Self-Attention; Spam

DOI:

10.1016/j.eswa.2023.119594

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Cybercriminals use social media platforms to disseminate spam, misleading facts, fake news, and malicious links. Blocking such deceptive social media spam is essential. However, extracting relevant features from social networks is challenging due to privacy and time constraints. Traditional frequency-based word representation techniques are time-consuming and inefficient at producing contextual word vectors. Word embeddings and deep learning models have recently shown good results in text classification. Moreover, most existing approaches assume a balanced class distribution, which does not hold for most real-world datasets. This paper attempts to improve the performance of social spam detection by combining dataset balancing, advanced word embedding techniques, machine learning, and deep learning approaches with a self-attention mechanism. In the proposed framework, the datasets are balanced using the NearMiss and SMOTETomek techniques and fed to several machine learning models. The baseline ML models and the proposed voting-based ensemble models are then evaluated on both imbalanced and balanced datasets. For the proposed deep learning-based hybrid approaches, embeddings are generated with GloVe and FastText on the balanced combined dataset and passed into a deep neural network composed of Conv1D and bi-directional recurrent neural network layers with a self-attention mechanism for improved context understanding and effective results. This study examines hybrid approaches for detecting social spam using imbalanced social network data and selects the optimum combination. In addition, machine learning ensembles, word embeddings, deep learning with hyper-parameter optimization, and a self-attention method are compared thoroughly. Experiments and comparisons with other techniques show that the proposed hybrid framework with deep learning-based approaches achieves better performance.
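The abstract names a self-attention mechanism but gives no formula, so the following is only a minimal sketch of generic scaled dot-product self-attention, not the authors' implementation. It assumes queries, keys, and values are all the raw input vectors (no learned projections), and it is applied directly to made-up toy embeddings, whereas the paper places attention after Conv1D and bi-directional recurrent layers. The names `softmax`, `self_attention`, `tokens`, and `toy_embeddings` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention with Q = K = V = embeddings.

    Assumption: no learned projection matrices; purely illustrative.
    Returns (context vectors, attention weight matrix).
    """
    dim = len(embeddings[0])
    scale = math.sqrt(dim)
    weights, outputs = [], []
    for query in embeddings:
        # Similarity of this token to every token in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in embeddings
        ]
        row = softmax(scores)
        weights.append(row)
        # Context vector: attention-weighted average of all token vectors.
        outputs.append([
            sum(w * value[j] for w, value in zip(row, embeddings))
            for j in range(dim)
        ])
    return outputs, weights


# Hypothetical 3-token message with made-up 4-dimensional embeddings.
tokens = ["win", "free", "prize"]
toy_embeddings = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
]
context, attn = self_attention(toy_embeddings)
```

Each row of `attn` is a probability distribution over the tokens of the message, and each vector in `context` mixes information from the whole sequence. That mixing is what is meant, in general terms, by attention improving context understanding.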


Pages: 21

