Handling data scarcity through data augmentation for detecting offensive speech

Authors
Sekkate, Sara [1]
Chebbi, Safa [2]
Adib, Abdellah [1]
Ben Jebara, Sofia [2]
Affiliations
[1] Hassan II Univ Casablanca, Fac Sci & Technol, LIM Lab, Mohammadia, Morocco
[2] Univ Carthage, Higher Sch Commun, COSIM Lab, Tunis 2088, Tunisia
Keywords
Offensive speech; MFCC; SWT; Feature selection; Deep learning; Data augmentation; Neural networks; Hate speech; Identification
DOI
10.1007/s12243-025-01072-6
Chinese Library Classification (CLC) number
TN [Electronic technology, communication technology]
Subject classification code
0809
Abstract
Detecting offensive speech is challenging because there is no universally accepted definition delineating its boundaries. In addition, the scarcity of labeled data often poses a significant obstacle to training robust offensive speech detection models. In this paper, we propose an approach that handles data scarcity through data augmentation techniques tailored to offensive speech detection. By augmenting the existing labeled data with speech samples generated through noise injection, our method expands the training dataset and enables more comprehensive model training. We evaluate our approach on the Vera Am Mittag (VAM) corpus and demonstrate significant improvements in offensive speech detection performance compared with training without data augmentation. Our findings highlight the efficacy of data augmentation in mitigating data scarcity and in enhancing the reliability of offensive speech detection systems in real-world scenarios.
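The abstract states only that new training samples are generated through noise injection; it does not specify the noise type or the signal-to-noise ratio (SNR) levels. The following is a minimal sketch, assuming additive white Gaussian noise scaled to a target SNR, where the noise power follows from SNR_dB = 10 log10(P_s / P_n). The helper names `add_noise_at_snr` and `augment_dataset` and the SNR values are hypothetical illustrations, not the authors' code.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng=None):
    """Return a noisy copy of `signal` at the target SNR in dB.

    Hypothetical helper: white Gaussian noise is an assumption, since the
    abstract does not state which noise the authors injected.
    """
    if rng is None:
        rng = random.Random()
    # Mean signal power P_s = (1/N) * sum(x[n]^2)
    signal_power = sum(x * x for x in signal) / len(signal)
    # Solve SNR_dB = 10 * log10(P_s / P_n) for the noise power P_n
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise_std = math.sqrt(noise_power)
    # Add zero-mean Gaussian noise sample by sample; the input is not modified
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def augment_dataset(samples, labels, snr_levels_db, seed=0):
    """Keep the originals and append one noisy copy per sample per SNR level.

    Labels are copied unchanged, assuming that additive noise does not change
    whether an utterance is offensive.
    """
    rng = random.Random(seed)
    aug_samples, aug_labels = list(samples), list(labels)
    for signal, label in zip(samples, labels):
        for snr_db in snr_levels_db:
            aug_samples.append(add_noise_at_snr(signal, snr_db, rng))
            aug_labels.append(label)
    return aug_samples, aug_labels


if __name__ == "__main__":
    # Toy "utterances": short sine waves standing in for real speech clips
    sr = 16000
    tone = [math.sin(2 * math.pi * 220 * n / sr) for n in range(sr // 10)]
    samples, labels = [tone, [0.5 * x for x in tone]], [1, 0]
    X, y = augment_dataset(samples, labels, snr_levels_db=[20, 10, 5])
    print(len(X), y)  # prints: 8 [1, 0, 1, 1, 1, 0, 0, 0]
```

Each labeled utterance contributes one noisy copy per SNR level, so two utterances and three SNR levels yield eight training samples. Only the training split should be augmented this way; placing noisy copies of evaluation utterances in the training set would inflate the measured performance.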
Pages: 10