Hybrid ensemble framework with self-attention mechanism for social spam detection on imbalanced data

Cited by: 25
Authors
Rao, Sanjeev [1]
Verma, Anil Kumar [1]
Bhatia, Tarunpreet [1]
Affiliations
[1] Thapar Inst Engn & Technol, Comp Sci & Engn Dept, Patiala, Punjab, India
Keywords
Data resampling; Machine Learning; Natural Language Processing; Online Social Network; Self-Attention; Spam
DOI
10.1016/j.eswa.2023.119594
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cybercriminals use social media platforms to disseminate spam, misleading facts, fake news, and malicious links. Blocking such deceptive social media spam is essential. However, extracting relevant features from social networks is challenging due to privacy and time constraints. Traditional frequency-based word representation techniques are time-consuming and do not produce contextual word vectors efficiently. Word embeddings and deep learning models have recently shown good results in text classification. Moreover, most existing approaches assume a balanced class distribution, which does not hold for most real-world datasets. This paper attempts to improve the performance of social spam detection by combining dataset balancing, advanced word embedding techniques, machine learning, and deep learning approaches with a self-attention mechanism. In the proposed framework, the datasets are balanced using the NearMiss and SmoteTomek techniques and fed to several machine learning models. The baseline ML models and the proposed voting-based ensemble models are then evaluated on both imbalanced and balanced datasets. For the proposed deep learning-based hybrid approaches, embeddings are generated with GloVe and FastText on the balanced combined dataset and passed into a deep neural network composed of Conv1D and bidirectional recurrent neural network layers with a self-attention mechanism for improved context understanding. The study examines hybrid approaches for detecting social spam on imbalanced social network data and selects the best-performing combination. In addition, machine learning ensembles, word embeddings, deep learning with hyper-parameter optimization, and the self-attention method are compared thoroughly. Experiments and comparisons with other techniques show that the proposed hybrid framework with deep learning-based approaches achieves better performance.
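The abstract names a self-attention mechanism applied on top of recurrent-layer outputs but does not state which formulation is used. As a rough illustration only, the sketch below assumes single-head scaled dot-product self-attention, written in NumPy. The function names (`softmax`, `self_attention`), the projection matrices `Wq`, `Wk`, `Wv`, and all dimensions are hypothetical and not taken from the paper; the random matrix `H` merely stands in for the per-token outputs of a bidirectional recurrent layer.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(H, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (assumed formulation).

    H  : (seq_len, d_model) token representations, e.g. recurrent-layer outputs
    Wq, Wk, Wv : (d_model, d_k) projection matrices (hypothetical parameters)
    Returns the context vectors (seq_len, d_k) and the attention weights
    (seq_len, seq_len), where row i holds token i's weights over all tokens.
    """
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise token similarities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V, weights


# Illustrative sizes only; not the paper's hyper-parameters.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 6, 8, 4
H = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))

context, weights = self_attention(H, Wq, Wk, Wv)
```

Each row of `weights` shows how strongly one token attends to every other token in the message, which is the sense in which self-attention can improve context understanding. In a trained model the projection matrices would be learned rather than sampled at random.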
Pages: 21