Hybrid ensemble framework with self-attention mechanism for social spam detection on imbalanced data

Cited by: 25
Authors
Rao, Sanjeev [1]
Verma, Anil Kumar [1]
Bhatia, Tarunpreet [1]
Affiliations
[1] Thapar Inst Engn & Technol, Comp Sci & Engn Dept, Patiala, Punjab, India
Keywords
Data resampling; Machine Learning; Natural Language Processing; Online Social Network; Self-Attention; Spam
DOI
10.1016/j.eswa.2023.119594
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cybercriminals use social media platforms to disseminate spam, misleading facts, fake news, and malicious links, so blocking such deceptive content is essential. However, extracting relevant features from social networks is challenging because of privacy and time constraints. Traditional frequency-based word representation techniques are time-consuming and do not produce contextual word vectors efficiently, whereas word embeddings and deep learning models have recently shown good results in text classification. Moreover, most existing approaches assume a balanced class distribution, an assumption that does not hold for most real-world datasets. This paper attempts to improve the performance of social spam detection by combining dataset balancing, advanced word embedding techniques, machine learning (ML), and deep learning approaches with a self-attention mechanism. In the proposed framework, the datasets are balanced using the NearMiss and SmoteTomek techniques and fed to several ML models. The baseline ML models and the proposed voting-based ensemble models are then evaluated on both the imbalanced and the balanced datasets. For the proposed deep learning-based hybrid approaches, embeddings are generated with GloVe and FastText on the balanced combined dataset and passed into a deep neural network composed of Conv1D and bidirectional recurrent neural network layers with a self-attention mechanism for improved context understanding. The study examines hybrid approaches for detecting social spam in imbalanced social network data and selects the best-performing combination. ML ensembles, word embeddings, deep learning with hyper-parameter optimization, and self-attention are compared thoroughly. Experiments and comparisons with other techniques show that the proposed hybrid framework with deep learning-based approaches achieves better performance.
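To make the self-attention step mentioned in the abstract more concrete, the following is a minimal sketch of scaled dot-product self-attention over a short sequence of token embeddings. It is not the authors' implementation: the abstract does not say which attention variant is used, and the names `softmax` and `self_attention`, the identity projections (queries, keys, and values are the raw embeddings), and the toy 2-dimensional vectors are all assumptions chosen for illustration. In the framework described above, the inputs would instead be GloVe or FastText vectors after the Conv1D and bidirectional recurrent layers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention with identity projections.

    Assumption: queries, keys, and values are the embeddings themselves;
    a trained layer would first apply learned projection matrices.
    Returns (outputs, weights), where weights[i][j] is how strongly
    token i attends to token j.
    """
    dim = len(embeddings[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in embeddings:
        # Similarity of this token to every token in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in embeddings
        ]
        row = softmax(scores)
        weights.append(row)
        # Context vector: attention-weighted average of all token vectors.
        outputs.append([
            sum(w * value[d] for w, value in zip(row, embeddings))
            for d in range(dim)
        ])
    return outputs, weights


# Toy embeddings for a three-token message (made-up values).
tokens = ["free", "prize", "hello"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 3) for w in row])
```

Each row of `weights` sums to 1, and tokens with similar embeddings (here "free" and "prize") attend to each other more strongly than to the unrelated token, which is the sense in which self-attention supplies context to each position.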
Pages: 21
Related papers
50 items in total
  • [41] Magnetotelluric Data Inversion Based on Deep Learning With the Self-Attention Mechanism
    Xu, Kaijun
    Liang, Shuyuan
    Lu, Yan
    Hu, Zuzhi
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [42] Progressive Hybrid Classifier Ensemble for Imbalanced Data
    Yang, Kaixiang
    Yu, Zhiwen
    Chen, C. L. Philip
    Cao, Wenming
    Wong, Hau-San
    You, Jane
    Han, Guoqiang
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (04) : 2464 - 2478
  • [43] A self-attention network for smoke detection
    Jiang, Minghua
    Zhao, Yaxin
    Yu, Feng
    Zhou, Changlong
    Peng, Tao
    FIRE SAFETY JOURNAL, 2022, 129
  • [44] EFFECT OF SELF-ATTENTION ON HEARTBEAT DETECTION
    HODAPP, V
    JOURNAL OF PSYCHOPHYSIOLOGY, 1995, 9 (03) : 280 - 280
  • [45] TFHSVul: A Fine-Grained Hybrid Semantic Vulnerability Detection Method Based on Self-Attention Mechanism in IoT
    Xu, Lijuan
    An, Baolong
    Li, Xin
    Zhao, Dawei
    Peng, Haipeng
    Song, Weizhao
    Tong, Fenghua
    Han, Xiaohui
    IEEE INTERNET OF THINGS JOURNAL, 2025, 12 (01) : 30 - 44
  • [46] A novel hybrid neural network approach incorporating convolution and LSTM with a self-attention mechanism for web attack detection
    Luo, Kangqiang
    Chen, Yindong
    APPLIED INTELLIGENCE, 2025, 55 (02)
  • [47] A rolling bearing fault diagnosis method for imbalanced data based on multi-scale self-attention mechanism and novel loss function
    Qiang Ruiru
    Zhao Xiaoqiang
    INSIGHT, 2024, 66 (11) : 690 - 701
  • [48] Self-Attention based Automated Vulnerability Detection with Effective Data Representation
    Wu, Tongshuai
    Chen, Liwei
    Du, Gewangzi
    Zhu, Chenguang
    Shi, Gang
    19TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2021), 2021, : 892 - 899
  • [49] Deep Hierarchical Ensemble Model for Suicide Detection on Imbalanced Social Media Data
    Li, Zepeng
    Zhou, Jiawei
    An, Zhengyi
    Cheng, Wenchuan
    Hu, Bin
    ENTROPY, 2022, 24 (04)
  • [50] Automatic Generation of News Commentary on Social Media Based on Self-Attention Mechanism
    Qu, Miao
    WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2022, 2022