Hybrid ensemble framework with self-attention mechanism for social spam detection on imbalanced data

Cited by: 25
Authors
Rao, Sanjeev [1]
Verma, Anil Kumar [1]
Bhatia, Tarunpreet [1]
Affiliations
[1] Thapar Inst Engn & Technol, Comp Sci & Engn Dept, Patiala, Punjab, India
Keywords
Data resampling; Machine Learning; Natural Language Processing; Online Social Network; Self-Attention; Spam
DOI
10.1016/j.eswa.2023.119594
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Cybercriminals use social media platforms to disseminate spam, misleading facts, fake news, and malicious links, so blocking such deceptive content is essential. However, extracting relevant features from social networks is challenging because of privacy and time constraints. Traditional frequency-based word representation techniques are time-consuming and do not produce contextual word vectors efficiently. Word embeddings and deep learning models have recently shown good results in text classification. In addition, most existing approaches assume a balanced class distribution, which does not hold for most real-world datasets. This paper attempts to improve the performance of social spam detection by combining dataset balancing, advanced word embedding techniques, machine learning, and deep learning with a self-attention mechanism. In the proposed framework, the datasets are balanced using the NearMiss and SmoteTomek techniques and fed to several machine learning (ML) models. The baseline ML models and the proposed voting-based ensemble models are then evaluated on both the imbalanced and the balanced datasets. For the proposed deep learning-based hybrid approaches, GloVe and FastText word embeddings are generated on the balanced combined dataset and passed into a deep neural network composed of Conv1D and bidirectional recurrent neural network layers with a self-attention mechanism for improved context understanding. The study examines hybrid approaches for detecting social spam in imbalanced social network data and selects the best-performing combination. It also thoroughly compares machine learning ensembles, word embeddings, deep learning with hyperparameter optimization, and a self-attention method. Experiments and comparisons with other techniques show that the proposed hybrid framework with deep learning-based approaches achieves better performance.
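The abstract names a self-attention mechanism applied on top of the recurrent layers but does not state its exact form. As a rough illustration only, the sketch below assumes plain scaled dot-product self-attention with identity query/key/value projections; the function names and the toy token features are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Scaled dot-product self-attention with identity projections.

    This is an assumed, simplified form; the paper may use learned
    projections or a different scoring function.

    sequence: list of equal-length feature vectors, one per token.
    Returns (context, weights), where weights[i][j] is how strongly
    token i attends to token j.
    """
    dim = len(sequence[0])
    scale = math.sqrt(dim)
    context, weights = [], []
    for query in sequence:
        # Similarity of this token to every token in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in sequence
        ]
        row = softmax(scores)
        weights.append(row)
        # Context vector: attention-weighted sum of all token vectors.
        context.append([
            sum(w * value[d] for w, value in zip(row, sequence))
            for d in range(dim)
        ])
    return context, weights


# Hypothetical per-token features for a three-token message.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = self_attention(tokens)
```

Each row of `weights` sums to 1, so every context vector is a convex combination of the token vectors; tokens with similar features receive larger weights, which is the sense in which attention can improve context understanding.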
Pages: 21
Related papers
50 items in total
  • [1] A hybrid spam detection framework for social networks
    Citlak, Oguzhan
    Dorterler, Murat
    Dogru, Ibrahim Alper
    JOURNAL OF POLYTECHNIC-POLITEKNIK DERGISI, 2023, 26 (02): 823-837
  • [2] Machine Learning Based Anomaly Detection of Log Files Using Ensemble Learning and Self-Attention
    Falt, Markus
    Forsstrom, Stefan
    Zhang, Tingting
    2021 5TH INTERNATIONAL CONFERENCE ON SYSTEM RELIABILITY AND SAFETY (ICSRS 2021), 2021: 209-215
  • [3] A novel ensemble system for short-term wind speed forecasting based on hybrid decomposition approach and artificial intelligence models optimized by self-attention mechanism
    Pang, Junheng
    Dong, Sheng
    ENERGY CONVERSION AND MANAGEMENT, 2024, 307
  • [4] Unsupervised Pansharpening Based on Self-Attention Mechanism
    Qu, Ying
    Baghbaderani, Razieh Kaviani
    Qi, Hairong
    Kwan, Chiman
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2021, 59 (04): 3192-3208
  • [5] A self-attention TCN-based model for suicidal ideation detection from social media posts
    Mirtaheri, Seyedeh Leili
    Greco, Sergio
    Shahbazian, Reza
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 255
  • [6] Rethinking Self-Attention for Multispectral Object Detection
    Hu, Sijie
    Bonardi, Fabien
    Bouchafa, Samia
    Prendinger, Helmut
    Sidibe, Desire
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (11): 16300-16311
  • [7] Joint Selfattention-SVM DDoS Attack Detection and Defense Mechanism Based on Self-Attention Mechanism and SVM Classification for SDN Networks
    Man, Wanying
    Yang, Guiqin
    Feng, Shurui
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2024, E107A (06): 881-889
  • [8] Underwater image imbalance attenuation compensation based on attention and self-attention mechanism
    Wang, Danxu
    Wei, Yanhui
    Liu, Junnan
    Ouyang, Wenjia
    Zhou, Xilin
    2022 OCEANS HAMPTON ROADS, 2022
  • [9] Mix-tower: Light visual question answering framework based on exclusive self-attention mechanism
    Chen, Deguang
    Chen, Jianrui
    Yang, Luheng
    Shang, Fanhua
    NEUROCOMPUTING, 2024, 587
  • [10] Pedestrian Attribute Recognition Based on Dual Self-attention Mechanism
    Fan, Zhongkui
    Guan, Ye-peng
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2023, 20 (02): 793-812