Hybrid ensemble framework with self-attention mechanism for social spam detection on imbalanced data

Cited by: 25
Authors
Rao, Sanjeev [1 ]
Verma, Anil Kumar [1 ]
Bhatia, Tarunpreet [1 ]
Affiliation
[1] Thapar Inst Engn & Technol, Comp Sci & Engn Dept, Patiala, Punjab, India
Keywords
Data resampling; Machine Learning; Natural Language Processing; Online Social Network; Self-Attention; Spam
DOI
10.1016/j.eswa.2023.119594
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cybercriminals use social media platforms to disseminate spam, misleading facts, fake news, and malicious links. Blocking such deceptive social media spam is essential. However, extracting relevant features from social networks is challenging due to privacy and time constraints. Traditional frequency-based word representation techniques are time-consuming and inefficient at producing contextual word vectors. Word embeddings and deep learning models have recently shown good results in text classification. In addition, most existing approaches assume a balanced class distribution, which does not hold for most real-world datasets. This paper attempts to improve the performance of social spam detection by combining dataset balancing, advanced word embedding techniques, machine learning, and deep learning approaches with a self-attention mechanism. In the proposed framework, the datasets are balanced using the NearMiss and SmoteTomek techniques and then fed to several machine learning models. The baseline ML models and the proposed voting-based ensemble models are then evaluated on both the imbalanced and the balanced datasets. For the proposed deep learning-based hybrid approaches, embeddings are generated using GloVe and FastText word embeddings on the balanced combined dataset and passed into a deep neural network composed of Conv1D and bidirectional recurrent neural network layers with the self-attention mechanism for improved context understanding. The study examines hybrid approaches for detecting social spam in imbalanced social network data and selects the best-performing combination. Machine learning ensembles, word embeddings, deep learning with hyper-parameter optimization, and a self-attention method are also compared thoroughly. Experiments and comparisons with other techniques show that the proposed hybrid framework with deep learning-based approaches achieves better performance.
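The abstract does not state which self-attention formulation the authors use. As a rough illustration of the general idea only, the sketch below implements standard scaled dot-product self-attention in plain Python, assuming the queries, keys, and values are all the input sequence itself (no learned projections). The function names and the three-token example are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    Assumption: no learned projection matrices, to keep the sketch short.
    x is a list of token vectors (lists of floats of equal length).
    Returns (outputs, weights); weights[i][j] is how strongly
    token i attends to token j.
    """
    d = len(x[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in x:
        # Similarity of this token to every token in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in x]
        w = softmax(scores)
        weights.append(w)
        # Context vector: attention-weighted sum of all token vectors.
        outputs.append(
            [sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)]
        )
    return outputs, weights


# Hypothetical 3-token sequence with 2-dimensional features.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

Each row of `weights` is a probability distribution over the sequence, so every output vector is a convex combination of the input token vectors. In a network like the one described, a layer of this kind would typically sit on top of the convolutional and recurrent features; that placement is an assumption here.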
Pages: 21
Related papers
50 in total
  • [21] Hybrid ensemble approaches to online harassment detection in highly imbalanced data
    Tolba, Marwa
    Ouadfel, Salima
    Meshoul, Souham
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 175
  • [22] TextSpamDetector: textual content based deep learning framework for social spam detection using conjoint attention mechanism
    Elakkiya, E.
    Selvakumar, S.
    Leela Velusamy, R.
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2021, 12 (10) : 9287 - 9302
  • [24] Anomaly Detection in QAR Data Using VAE-LSTM with Multihead Self-Attention Mechanism
    Rong, Chuitian
    OuYang, Shuxin
    Sun, Huabo
    MOBILE INFORMATION SYSTEMS, 2022, 2022
  • [25] iSRD: Spam Review Detection with Imbalanced Data Distributions
    Al Najada, Hamzah
    Zhu, Xingquan
    2014 IEEE 15TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IRI), 2014, : 553 - 560
  • [26] Hybrid semantics-based vulnerability detection incorporating a Temporal Convolutional Network and Self-attention Mechanism
    Chen, Jinfu
    Wang, Weijia
    Liu, Bo
    Cai, Saihua
    Towey, Dave
    Wang, Shengran
    INFORMATION AND SOFTWARE TECHNOLOGY, 2024, 171
  • [27] Dynamic Ensemble Framework for Imbalanced Data Classification
    Zhu, Tuanfei
    Hu, Xingchen
    Liu, Xinwang
    Zhu, En
    Zhu, Xinzhong
    Xu, Huiying
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2025, 37 (05) : 2456 - 2471
  • [28] Combined self-attention mechanism for named entity recognition in social media
    Li M.
    Kong F.
    Qinghua Daxue Xuebao/Journal of Tsinghua University, 2019, 59 (06) : 461 - 467
  • [29] Phishing websites detection via CNN and multi-head self-attention on imbalanced datasets
    Xiao, Xi
    Xiao, Wentao
    Zhang, Dianyan
    Zhang, Bin
    Hu, Guangwu
    Li, Qing
    Xia, Shutao
    COMPUTERS & SECURITY, 2021, 108 (108)
  • [30] Conditional self-attention generative adversarial network with differential evolution algorithm for imbalanced data classification
    Niu, Jiawei
    Liu, Zhunga
    Pan, Quan
    Yang, Yanbo
    Li, Yang
    CHINESE JOURNAL OF AERONAUTICS, 2023, 36 (03) : 303 - 315