Hybrid ensemble framework with self-attention mechanism for social spam detection on imbalanced data

Cited by: 25
Authors
Rao, Sanjeev [1]
Verma, Anil Kumar [1]
Bhatia, Tarunpreet [1]
Affiliations
[1] Thapar Inst Engn & Technol, Comp Sci & Engn Dept, Patiala, Punjab, India
Keywords
Data resampling; Machine Learning; Natural Language Processing; Online Social Network; Self-Attention; Spam
DOI
10.1016/j.eswa.2023.119594
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cybercriminals use social media platforms to disseminate spam, misleading facts, fake news, and malicious links. Blocking such deceptive social media spam is essential. However, extracting relevant features from social networks is challenging due to privacy and time constraints. Traditional frequency-based word representation techniques are time-consuming and inefficient at producing contextual word vectors. Word embeddings and deep learning models have recently shown good results in text classification. In addition, most existing approaches assume a balanced class distribution, an assumption that does not hold for most real-world datasets. This paper attempts to improve the performance of social spam detection by combining dataset balancing, advanced word embedding techniques, machine learning, and deep learning approaches with the self-attention mechanism. In the proposed framework, the datasets are balanced using the NearMiss and SMOTETomek techniques and then fed to several machine learning models. The baseline ML models and the proposed voting-based ensemble models are then evaluated on both the imbalanced and the balanced datasets. For the proposed deep learning-based hybrid approaches, embeddings are generated with GloVe and FastText on the balanced combined dataset and passed into a deep neural network composed of Conv1D and bidirectional recurrent neural network layers with the self-attention mechanism, for improved context understanding and more effective results. This study examines hybrid approaches for detecting social spam in imbalanced social network data and identifies the best-performing combination. Machine learning ensembles, word embeddings, deep learning with hyper-parameter optimization, and a self-attention method are also compared thoroughly. Experiments and comparisons with other techniques show that the proposed hybrid framework with deep learning-based approaches achieves better performance.
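The abstract names the self-attention mechanism without describing it. As a rough illustration only, the following is a minimal sketch of generic scaled dot-product self-attention in plain Python. It is not the authors' network: the function names, the toy token vectors, and the identity projection matrices are all assumptions made for this example, and a real model would learn the projections and apply attention on top of the Conv1D and bidirectional recurrent layers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x, w_q, w_k, w_v):
    """Generic scaled dot-product self-attention (illustrative, not the paper's code).

    x has shape (seq_len, d_model); w_q, w_k, w_v have shape (d_model, d_k).
    Returns (context, weights) with shapes (seq_len, d_k) and (seq_len, seq_len).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))            # sqrt(d_k) keeps the scores well-scaled
    k_t = [list(col) for col in zip(*k)]    # transpose of K
    scores = matmul(q, k_t)                 # pairwise query-key similarities
    weights = [softmax([s / scale for s in row]) for row in scores]
    context = matmul(weights, v)            # each output is a weighted mix of values
    return context, weights


# Hypothetical toy input: three 2-dimensional "token" vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Identity projections are an assumption for readability; real ones are learned.
identity = [[1.0, 0.0], [0.0, 1.0]]

context, weights = self_attention(tokens, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and shows how strongly one token attends to every token in the sequence, which is the property that lets a classifier emphasize the words most indicative of spam.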
Pages: 21