Hybrid ensemble framework with self-attention mechanism for social spam detection on imbalanced data

Cited by: 25
Authors
Rao, Sanjeev [1]
Verma, Anil Kumar [1]
Bhatia, Tarunpreet [1]
Affiliations
[1] Thapar Inst Engn & Technol, Comp Sci & Engn Dept, Patiala, Punjab, India
Keywords
Data resampling; Machine Learning; Natural Language Processing; Online Social Network; Self-Attention; Spam
DOI
10.1016/j.eswa.2023.119594
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cybercriminals use social media platforms to disseminate spam, misleading facts, fake news, and malicious links, so blocking such deceptive content is essential. However, extracting relevant features from social networks is challenging due to privacy and time constraints. Traditional frequency-based word representation techniques are time-consuming and do not produce contextual word vectors efficiently. Word embeddings and deep learning models have recently shown good results in text classification. Moreover, most existing approaches assume a balanced class distribution, which does not hold for most real-world datasets. This paper attempts to improve social spam detection by combining dataset balancing, advanced word embedding techniques, machine learning, and deep learning with a self-attention mechanism. In the proposed framework, the datasets are balanced using the NearMiss and SmoteTomek techniques and fed to several machine learning models. The baseline ML models and the proposed voting-based ensemble models are then evaluated on both the imbalanced and the balanced datasets. For the proposed deep learning-based hybrid approaches, embeddings are generated with GloVe and FastText on the balanced combined dataset and passed into a deep neural network composed of Conv1D and bidirectional recurrent neural network layers with a self-attention mechanism for improved context understanding and more effective results. The study examines hybrid approaches for detecting social spam in imbalanced social network data and selects the best-performing combination. Machine learning ensembles, word embeddings, deep learning with hyper-parameter optimization, and a self-attention method are also compared thoroughly. Experiments and comparisons with other techniques show that the proposed hybrid framework with deep learning-based approaches achieves better performance.
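The abstract names a self-attention mechanism but does not specify its exact form. As a rough illustration only, the sketch below computes plain scaled dot-product self-attention over a few token vectors in pure Python. It assumes identity query/key/value projections (no learned weights) and uses made-up two-dimensional vectors in place of GloVe or FastText embeddings. The function names `softmax` and `self_attention` are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with identity projections.

    Assumption: queries, keys, and values are all the raw input vectors;
    a trained model would apply learned projection matrices first.

    x: list of token vectors (lists of floats, all the same length).
    Returns (outputs, weights); weights[i][j] is how strongly token i
    attends to token j.
    """
    d = len(x[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in x:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in x]
        w = softmax(scores)
        weights.append(w)
        # Each output is the attention-weighted average of all token vectors.
        outputs.append(
            [sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)]
        )
    return outputs, weights


# Toy vectors for three tokens (illustrative numbers, not real embeddings).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

Each row of `weights` sums to 1, so every output vector is a convex combination of the input vectors. In a network like the one the abstract describes, the inputs to such a layer would be the feature sequences produced by the Conv1D and bidirectional recurrent layers rather than raw embeddings.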
Pages: 21
Related papers
50 in total
  • [31] Ensemble based spam detection in social IoT using probabilistic data structures
    Singh, Amritpal
    Batra, Shalini
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 81 : 359 - 371
  • [32] A hybrid self-attention deep learning framework for multivariate sleep stage classification
    Yuan, Ye
    Jia, Kebin
    Ma, Fenglong
    Xun, Guangxu
    Wang, Yaqing
    Su, Lu
    Zhang, Aidong
    BMC BIOINFORMATICS, 2019, 20 (Suppl 16)
  • [33] Epilepsy detection based on multi-head self-attention mechanism
    Ru, Yandong
    An, Gaoyang
    Wei, Zheng
    Chen, Hongming
    PLOS ONE, 2024, 19 (06):
  • [34] Adverse drug reaction detection via a multihop self-attention mechanism
    Zhang, Tongxuan
    Lin, Hongfei
    Ren, Yuqi
    Yang, Liang
    Xu, Bo
    Yang, Zhihao
    Wang, Jian
    Zhang, Yijia
    BMC BIOINFORMATICS, 2019, 20 (01)
  • [35] Conditional self-attention generative adversarial network with differential evolution algorithm for imbalanced data classification
    Jiawei NIU
    Zhunga LIU
    Quan PAN
    Yanbo YANG
    Yang LI
    Chinese Journal of Aeronautics, 2023, 36 (03) : 303 - 315
  • [36] Conditional self-attention generative adversarial network with differential evolution algorithm for imbalanced data classification
    Jiawei NIU
    Zhunga LIU
    Quan PAN
    Yanbo YANG
    Yang LI
    Chinese Journal of Aeronautics, 2023, (03) : 303 - 315
  • [37] A hybrid self-attention deep learning framework for multivariate sleep stage classification
    Ye Yuan
    Kebin Jia
    Fenglong Ma
    Guangxu Xun
    Yaqing Wang
    Lu Su
    Aidong Zhang
    BMC Bioinformatics, 20
  • [38] Object Detection Algorithm Based on Context Information and Self-Attention Mechanism
    Liang, Hong
    Zhou, Hui
    Zhang, Qian
    Wu, Ting
    SYMMETRY-BASEL, 2022, 14 (05):
  • [39] Adverse drug reaction detection via a multihop self-attention mechanism
    Tongxuan Zhang
    Hongfei Lin
    Yuqi Ren
    Liang Yang
    Bo Xu
    Zhihao Yang
    Jian Wang
    Yijia Zhang
    BMC Bioinformatics, 20
  • [40] EPILEPTIC SPIKE DETECTION BY RECURRENT NEURAL NETWORKS WITH SELF-ATTENTION MECHANISM
    Fukumori, Kosuke
    Yoshida, Noboru
    Sugano, Hidenori
    Nakajima, Madoka
    Tanaka, Toshihisa
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 1406 - 1410