Handling data scarcity through data augmentation for detecting offensive speech

Authors
Sekkate, Sara [1]
Chebbi, Safa [2]
Adib, Abdellah [1]
Ben Jebara, Sofia [2]
Affiliations
[1] Hassan II Univ Casablanca, Fac Sci & Technol, LIM Lab, Mohammadia, Morocco
[2] Univ Carthage, Higher Sch Commun, COSIM Lab, Tunis 2088, Tunisia
Keywords
Offensive speech; MFCC; SWT; Feature selection; Deep learning; Data augmentation; Neural networks; Hate speech; Identification
DOI
10.1007/s12243-025-01072-6
Chinese Library Classification (CLC) number
TN [Electronic technology, communication technology]
Subject classification code
0809
Abstract
Detecting offensive speech is challenging because there is no universally accepted definition of where its boundaries lie. Moreover, the scarcity of labeled data often poses a significant obstacle to training robust offensive speech detection models. In this paper, we propose an approach that handles data scarcity through data augmentation techniques tailored to offensive speech detection. By augmenting the existing labeled data with speech samples generated through noise injection, our method expands the training dataset and enables more comprehensive model training. We evaluate our approach on the Vera Am Mittag (VAM) corpus and demonstrate significant improvements in offensive speech detection performance compared with training without data augmentation. Our findings highlight the efficacy of data augmentation in mitigating data scarcity and in enhancing the reliability of offensive speech detection systems in real-world scenarios.
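The noise-injection idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the noise type or noise levels, so additive white Gaussian noise and the SNR values of 20, 10 and 5 dB are assumptions, and `add_noise_at_snr` and `augment_dataset` are hypothetical helper names. Each noisy copy keeps the label of the clip it was derived from, which is what lets augmentation enlarge a labeled training set.

```python
import math
import random
from typing import List, Sequence


def add_noise_at_snr(signal: Sequence[float], snr_db: float,
                     rng: random.Random) -> List[float]:
    """Return a copy of `signal` with additive white Gaussian noise.

    Assumption: white Gaussian noise. The noise power is chosen so that
    10 * log10(P_signal / P_noise) == snr_db, where power is the mean of
    the squared samples.
    """
    if not signal:
        raise ValueError("signal must be non-empty")
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = p_signal / (10 ** (snr_db / 10))
    sigma = math.sqrt(p_noise)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def augment_dataset(samples, labels, snrs_db=(20.0, 10.0, 5.0), seed=0):
    """Keep every original clip and append one noisy copy per SNR level.

    Hypothetical helper: the SNR levels are illustrative assumptions.
    Each noisy copy inherits the label of its source clip.
    """
    rng = random.Random(seed)
    out_x, out_y = list(samples), list(labels)
    for x, y in zip(samples, labels):
        for snr in snrs_db:
            out_x.append(add_noise_at_snr(x, snr, rng))
            out_y.append(y)
    return out_x, out_y


if __name__ == "__main__":
    # One second of a 440 Hz tone at 16 kHz stands in for a speech clip.
    clip = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
    xs, ys = augment_dataset([clip], ["offensive"])
    print(len(xs), ys)  # 4 ['offensive', 'offensive', 'offensive', 'offensive']
```

With three SNR levels, the training set grows to four times its original size; lower SNRs produce harder, noisier examples. In a full pipeline, features such as MFCCs would then be extracted from both the original and the augmented clips.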
Pages: 10