Hybrid ensemble framework with self-attention mechanism for social spam detection on imbalanced data

Times cited: 25

Authors:

Rao, Sanjeev [1]

Verma, Anil Kumar [1]

Bhatia, Tarunpreet [1]

Affiliation:

[1] Thapar Inst Engn & Technol, Comp Sci & Engn Dept, Patiala, Punjab, India

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2023 / Vol. 217

Keywords:

Data resampling; Machine Learning; Natural Language Processing; Online Social Network; Self-Attention; Spam;

DOI:

10.1016/j.eswa.2023.119594

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Cybercriminals use social media platforms to disseminate spam, misleading facts, fake news, and malicious links, so blocking such deceptive social media spam is essential. However, extracting relevant features from social networks is challenging due to privacy and time constraints. Traditional frequency-based word representation techniques are time-consuming and inefficient at producing contextual word vectors, whereas word embeddings and deep learning models have recently shown good results in text classification. In addition, most existing approaches assume a balanced class distribution, which does not hold for most real-world datasets. This paper aims to improve the performance of social spam detection by combining dataset balancing, advanced word embedding techniques, machine learning, and deep learning approaches with a self-attention mechanism. In the proposed framework, the datasets are balanced with the NearMiss and SMOTETomek techniques before being fed to several machine learning models. The baseline ML models and the proposed voting-based ensemble models are then evaluated on both the imbalanced and the balanced datasets. For the proposed deep learning-based hybrid approaches, GloVe and FastText word embeddings are generated for the balanced combined dataset and passed to a deep neural network composed of Conv1D and bidirectional recurrent neural network layers with a self-attention mechanism, which improves context understanding and yields more effective results. This study examines hybrid approaches for detecting social spam in imbalanced social network data and selects the best-performing combination. In addition, machine learning ensembles, word embeddings, deep learning with hyperparameter optimization, and the self-attention method are compared thoroughly. Experiments and comparisons with other techniques show that the proposed hybrid framework with deep learning-based approaches achieves better performance.
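To make the self-attention step mentioned in the abstract more concrete, here is a minimal sketch. It assumes generic scaled dot-product self-attention and omits the learned query, key, and value projections, so Q = K = V = X. It is not the authors' implementation, and the abstract does not state which attention variant the framework uses. The function names `softmax` and `self_attention` are illustrative, and the toy 2-dimensional vectors are hypothetical stand-ins for the GloVe or FastText embeddings (or the Conv1D/Bi-RNN outputs) that such a layer would receive.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)  # subtract the row maximum so exp() cannot overflow
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product self-attention over a sequence of token vectors.

    Simplifying assumption: queries, keys and values are the inputs themselves
    (Q = K = V = x); a trained layer would first apply learned projections.
    Returns (outputs, attention_weights).
    """
    d = len(x[0])
    scale = math.sqrt(d)
    # scores[i][j] = <x_i, x_j> / sqrt(d): how strongly token i attends to token j
    scores = [[sum(a * b for a, b in zip(xi, xj)) / scale for xj in x] for xi in x]
    weights = [softmax(row) for row in scores]
    # each output vector is the attention-weighted average of all input vectors
    outputs = [
        [sum(w * xj[k] for w, xj in zip(w_row, x)) for k in range(d)]
        for w_row in weights
    ]
    return outputs, weights


if __name__ == "__main__":
    # Hypothetical 2-d "embeddings" for a three-token message (toy values only).
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    outputs, weights = self_attention(tokens)
    for row in weights:
        print([round(w, 3) for w in row])
```

Each row of the weight matrix sums to 1, so every output vector is a context-weighted mixture of all token vectors in the message. This mixing is what lets an attention layer emphasize informative tokens. Where such a layer sits relative to the Conv1D and recurrent layers is not specified in the abstract.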


Pages: 21
