Hybrid ensemble framework with self-attention mechanism for social spam detection on imbalanced data

Cited by: 25

Authors:

Rao, Sanjeev [1]

Verma, Anil Kumar [1]

Bhatia, Tarunpreet [1]

Affiliations:

[1] Thapar Inst Engn & Technol, Comp Sci & Engn Dept, Patiala, Punjab, India

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2023 / Vol. 217

Keywords:

Data resampling; Machine Learning; Natural Language Processing; Online Social Network; Self-Attention; Spam

DOI:

10.1016/j.eswa.2023.119594

Chinese Library Classification code:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Cybercriminals use social media platforms to disseminate spam, misleading facts, fake news, and malicious links. Blocking such deceptive social media spam is essential. However, extracting relevant features from social networks is challenging due to privacy and time constraints. Traditional frequency-based word representation techniques are time-consuming and inefficient in producing contextual word vectors. Word embeddings and deep learning models have recently shown good results in text classification. Also, most existing approaches assume a balanced class distribution, which does not hold for most real-world datasets. In this paper, an attempt is made to advance the performance of the social spam detection system by leveraging dataset balancing, advanced word embedding techniques, machine learning, and deep learning approaches with the self-attention mechanism. In the proposed framework, the datasets are balanced using the NearMiss and SMOTETomek techniques and then fed to several machine learning models. The baseline ML models and the proposed voting-based ensemble models are then evaluated on both imbalanced and balanced datasets. For the proposed deep learning-based hybrid approaches, embeddings are generated using GloVe and FastText word embeddings on the balanced combined dataset and passed into a deep neural network composed of Conv1D and bi-directional recurrent neural network layers with the self-attention mechanism for improved context understanding and effective results. This study examines hybrid approaches for detecting social spam using imbalanced social network data and picks the optimum combination. In addition, machine learning ensembles, word embeddings, deep learning with hyper-parameter optimization, and a self-attention method are compared thoroughly. Experiments and comparisons with other techniques show that the proposed hybrid framework with deep learning-based approaches achieves better performance.
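
The abstract names a self-attention mechanism but does not say which form is used. As a rough illustration only, the sketch below shows plain scaled dot-product self-attention over a short sequence of token vectors, using nothing but the Python standard library. It assumes queries, keys, and values are all the raw input vectors (no learned projections), which is a simplification and not necessarily the layer used in the paper. The function names `softmax` and `self_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens.

    Assumption: no learned projection matrices, purely for illustration.
    tokens: list of equal-length float vectors, one per position.
    Returns (outputs, weights); weights[i][j] is how much position i
    attends to position j.
    """
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this position to every position, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        row = softmax(scores)
        weights.append(row)
        # Each output is the attention-weighted average of all token vectors.
        outputs.append([
            sum(row[j] * tokens[j][c] for j in range(len(tokens)))
            for c in range(dim)
        ])
    return outputs, weights


if __name__ == "__main__":
    # Two toy "token embeddings"; real inputs would be Conv1D/BiRNN features.
    out, attn = self_attention([[1.0, 0.0], [0.0, 1.0]])
    print(attn)
    print(out)
```

Each position attends most strongly to the vectors it is most similar to, so every output vector becomes a context-aware mixture of the whole sequence. In a trained network the inputs would first pass through learned query, key, and value projections.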


Pages: 21
