Handling data scarcity through data augmentation for detecting offensive speech

Cited by: 0
Authors
Sekkate, Sara [1 ]
Chebbi, Safa [2 ]
Adib, Abdellah [1 ]
Ben Jebara, Sofia [2 ]
Affiliations
[1] Hassan II Univ Casablanca, Fac Sci & Technol, LIM Lab, Mohammadia, Morocco
[2] Univ Carthage, Higher Sch Commun, COSIM Lab, Tunis 2088, Tunisia
Keywords
Offensive speech; MFCC; SWT; Feature selection; Deep learning; Data augmentation; Neural networks; Hate speech; Identification
DOI
10.1007/s12243-025-01072-6
Chinese Library Classification (CLC)
TN [Electronic technology, communication technology]
Subject classification code
0809
Abstract
Detecting offensive speech is challenging because there is no universally accepted definition delineating its boundaries. Moreover, the scarcity of labeled data often hinders the training of robust offensive speech detection models. In this paper, we propose an approach that handles data scarcity through data augmentation techniques tailored to offensive speech detection. By augmenting the existing labeled data with speech samples generated through noise injection, our method effectively expands the training dataset, enabling more comprehensive model training. We evaluate our approach on the Vera am Mittag (VAM) corpus and demonstrate significant improvements in offensive speech detection performance over a baseline trained without data augmentation. Our findings highlight the efficacy of data augmentation in mitigating data scarcity and enhancing the reliability of offensive speech detection systems in real-world scenarios.
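The noise-injection augmentation mentioned in the abstract can be sketched as follows. This is a minimal illustration, assuming additive Gaussian noise scaled to a target signal-to-noise ratio; the specific SNR levels and noise model are assumptions for demonstration, not the paper's exact configuration.

```python
import numpy as np

def augment_with_noise(signal, snr_db=20.0, rng=None):
    """Return a copy of `signal` with Gaussian noise added at the given SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(signal ** 2)
    # Solve SNR_dB = 10 * log10(signal_power / noise_power) for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Each clean utterance yields several extra training samples at different noise levels.
clean = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000))  # stand-in for a speech clip
rng = np.random.default_rng(0)
augmented = [augment_with_noise(clean, snr, rng) for snr in (20.0, 10.0, 5.0)]
```

Feeding both the clean clips and their noisy variants to the classifier multiplies the effective training-set size without any new annotation effort.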
Pages: 10