Hybrid ensemble framework with self-attention mechanism for social spam detection on imbalanced data

Cited by: 25

Authors:

Rao, Sanjeev [1]

Verma, Anil Kumar [1]

Bhatia, Tarunpreet [1]

Affiliation:

[1] Thapar Institute of Engineering and Technology, Computer Science and Engineering Department, Patiala, Punjab, India

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2023, Vol. 217

Keywords:

Data resampling; Machine Learning; Natural Language Processing; Online Social Network; Self-Attention; Spam

DOI:

10.1016/j.eswa.2023.119594

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Cybercriminals use social media platforms to disseminate spam, misleading information, fake news, and malicious links, so blocking such deceptive social media spam is essential. However, extracting relevant features from social networks is challenging due to privacy and time constraints. Traditional frequency-based word representation techniques are time-consuming and inefficient at producing contextual word vectors, whereas word embeddings and deep learning models have recently shown good results in text classification. Moreover, most existing approaches assume a balanced class distribution, which does not hold for most real-world datasets. This paper aims to improve the performance of social spam detection by leveraging dataset balancing, advanced word embedding techniques, machine learning, and deep learning with a self-attention mechanism. In the proposed framework, the datasets are balanced using the NearMiss and SMOTETomek techniques and then fed to several machine learning models. The baseline ML models and the proposed voting-based ensemble models are evaluated on both the imbalanced and the balanced datasets. For the proposed deep learning-based hybrid approaches, GloVe and FastText embeddings are generated on the balanced combined dataset and passed to a deep neural network composed of Conv1D and bidirectional recurrent neural network layers with a self-attention mechanism, which improves context understanding. The study examines hybrid approaches for detecting social spam in imbalanced social network data and identifies the best-performing combination. In addition, machine learning ensembles, word embeddings, deep learning with hyper-parameter optimization, and the self-attention method are compared thoroughly. Experiments and comparisons with other techniques show that the proposed hybrid framework with deep learning-based approaches achieves better performance.
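The abstract places a self-attention mechanism after the Conv1D and bidirectional RNN layers but does not state which attention variant is used. The following is a minimal sketch assuming standard scaled dot-product self-attention over a sequence of hidden states, followed by mean pooling into one vector per post. The helper names (`self_attention`, `mean_pool`, `matmul`) and the identity projection matrices in the toy usage are hypothetical; plain Python lists are used only for readability, whereas a real model would use a deep learning framework and learned weights.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(h: Matrix, w_q: Matrix, w_k: Matrix,
                   w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product self-attention (assumed variant).

    h: (seq_len x d_model) hidden states, e.g. Bi-RNN outputs.
    Returns (context, weights): context is (seq_len x d_v) and
    weights is (seq_len x seq_len), each row summing to 1.
    """
    q, k, v = matmul(h, w_q), matmul(h, w_k), matmul(h, w_v)
    d_k = len(w_k[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]
    return matmul(weights, v), weights


def mean_pool(x: Matrix) -> List[float]:
    """Average over time steps to get one fixed-size vector per post."""
    n = len(x)
    return [sum(row[j] for row in x) / n for j in range(len(x[0]))]


# Toy usage: 3 time steps, d_model = 2, hypothetical identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = self_attention(states, identity, identity, identity)
pooled = mean_pool(context)  # would feed a dense spam/ham classifier
```

Each row of `weights` shows how much every time step attends to the others; in the toy input, step 0 attends equally to steps 0 and 2 (both share its first feature) and less to step 1. Pooling the attended context gives a fixed-length representation regardless of post length, which is one common way to connect an attention layer to a final classifier.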


Pages: 21
