Hybrid ensemble framework with self-attention mechanism for social spam detection on imbalanced data

Cited by: 25
Authors
Rao, Sanjeev [1]
Verma, Anil Kumar [1]
Bhatia, Tarunpreet [1]
Affiliation
[1] Thapar Inst Engn & Technol, Comp Sci & Engn Dept, Patiala, Punjab, India
Keywords
Data resampling; Machine Learning; Natural Language Processing; Online Social Network; Self-Attention; Spam
DOI
10.1016/j.eswa.2023.119594
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cybercriminals use social media platforms to disseminate spam, misleading facts, fake news, and malicious links. Blocking such deceptive social media spam is essential. However, extracting relevant features from social networks is challenging due to privacy and time constraints. Traditional frequency-based word representation techniques are time-consuming and inefficient at producing contextual word vectors. Word embeddings and deep learning models have recently shown good results in text classification. Moreover, most existing approaches assume a balanced class distribution, which does not hold for most real-world datasets. In this paper, an attempt is made to improve the performance of social spam detection by combining dataset balancing, advanced word embedding techniques, machine learning, and deep learning approaches with a self-attention mechanism. In the proposed framework, the datasets are balanced using the NearMiss and SmoteTomek techniques and fed to several machine learning models. The baseline ML models and the proposed voting-based ensemble models are then evaluated on both the imbalanced and the balanced datasets. For the proposed deep learning-based hybrid approaches, embeddings are generated with GloVe and FastText on the balanced combined dataset and passed into a deep neural network composed of Conv1D and bi-directional recurrent neural network layers with a self-attention mechanism for improved context understanding and more effective results. This study examines hybrid approaches for detecting social spam using imbalanced social network data and selects the best-performing combination. In addition, machine learning ensembles, word embeddings, deep learning with hyper-parameter optimization, and a self-attention method are compared thoroughly. Experiments and comparisons with other techniques show that the proposed hybrid framework with deep learning-based approaches achieves better performance.
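The abstract names a self-attention mechanism but does not say which variant the authors use. As a rough illustration only, the following is a minimal sketch of one common form, scaled dot-product self-attention, written with the Python standard library. It assumes no learned projections (queries, keys, and values are all the input vectors themselves), and the names `softmax` and `self_attention` are hypothetical helpers, not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens.

    Assumption: no learned weight matrices are applied, which a real
    model (including, presumably, the one in the paper) would have.
    `tokens` is a list of equal-length float vectors, one per token.
    Returns (outputs, weights): one output vector and one row of
    attention weights per token.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this token to every token in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in tokens
        ]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of all token vectors.
        outputs.append(
            [sum(w * value[i] for w, value in zip(row, tokens))
             for i in range(dim)]
        )
    return outputs, weights


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings" for a three-token message.
    embedded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    contextual, attn = self_attention(embedded)
    for row in attn:
        print([round(w, 3) for w in row])
```

Each row of the returned weights sums to 1, so every output vector is a convex combination of the input vectors. In a pipeline like the one the abstract describes, the inputs would be the outputs of the Conv1D and bi-directional recurrent layers rather than raw toy vectors.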
Pages: 21
Related papers
50 in total
  • [21] Human Activity Recognition Based on Self-Attention Mechanism in WiFi Environment
    Ge, Fei
    Yang, Zhimin
    Dai, Zhenyang
    Tan, Liansheng
    Hu, Jianyuan
    Li, Jiayuan
    Qiu, Han
    IEEE ACCESS, 2024, 12 : 85231 - 85243
  • [22] Understanding Multimodal Popularity Prediction of Social Media Videos With Self-Attention
    Bielski, Adam
    Trzcinski, Tomasz
    IEEE ACCESS, 2018, 6 : 74277 - 74287
  • [23] Feature learning framework based on EEG graph self-attention networks for motor imagery BCI systems
    Sun, Hao
    Jin, Jing
    Daly, Ian
    Huang, Yitao
    Zhao, Xueqing
    Wang, Xingyu
    Cichocki, Andrzej
    JOURNAL OF NEUROSCIENCE METHODS, 2023, 399
  • [24] Spam Detection using KNN and Decision Tree Mechanism in Social Network
    Goyal, Saumya
    Chauhan, R. K.
    Parveen, Shabnam
    2016 FOURTH INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND GRID COMPUTING (PDGC), 2016 : 522 - 526
  • [25] X-Transformer: A Machine Translation Model Enhanced by the Self-Attention Mechanism
    Liu, Huey-Ing
    Chen, Wei-Lin
    APPLIED SCIENCES-BASEL, 2022, 12 (09)
  • [26] DTITD: An Intelligent Insider Threat Detection Framework Based on Digital Twin and Self-Attention Based Deep Learning Models
    Wang, Zhi Qiang
    El Saddik, Abdulmotaleb
    IEEE ACCESS, 2023, 11 : 114013 - 114030
  • [27] An Ego Network Embedding Model via Neighbors Sampling and Self-attention Mechanism
    Guo, Ziyu
    Liu, Shijun
    Pan, Li
    He, Qiang
    2020 IEEE INTL SYMP ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, INTL CONF ON BIG DATA & CLOUD COMPUTING, INTL SYMP SOCIAL COMPUTING & NETWORKING, INTL CONF ON SUSTAINABLE COMPUTING & COMMUNICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2020), 2020 : 425 - 432
  • [28] EmSM: Ensemble Mixed Sampling Method for Classifying Imbalanced Intrusion Detection Data
    Jung, Ilok
    Ji, Jaewon
    Cho, Changseob
    ELECTRONICS, 2022, 11 (09)
  • [29] Deep Learning-Based Identification of Maize Leaf Diseases Is Improved by an Attention Mechanism: Self-Attention
    Qian, Xiufeng
    Zhang, Chengqi
    Chen, Li
    Li, Ke
    FRONTIERS IN PLANT SCIENCE, 2022, 13
  • [30] Topic-aware neural attention network for malicious social media spam detection
    Nasser, Maged
    Saeed, Faisal
    Da'u, Aminu
    Alblwi, Abdulaziz
    Al-Sarem, Mohammed
    ALEXANDRIA ENGINEERING JOURNAL, 2025, 111 : 540 - 554