Handling data scarcity through data augmentation for detecting offensive speech

Authors
Sekkate, Sara [1]
Chebbi, Safa [2]
Adib, Abdellah [1]
Ben Jebara, Sofia [2]
Affiliations
[1] Hassan II Univ Casablanca, Fac Sci & Technol, LIM Lab, Mohammadia, Morocco
[2] Univ Carthage, Higher Sch Commun, COSIM Lab, Tunis 2088, Tunisia
Keywords
Offensive speech; MFCC; SWT; Feature selection; Deep learning; Data augmentation; Neural networks; Hate speech; Identification
DOI
10.1007/s12243-025-01072-6
Chinese Library Classification
TN [Electronic technology, communication technology]
Discipline classification code
0809
Abstract
Detecting offensive speech is challenging because no universally accepted definition delineates its boundaries. In addition, the scarcity of labeled data makes it difficult to train robust offensive speech detection models. In this paper, we propose an approach that handles data scarcity through data augmentation techniques tailored to offensive speech detection. By augmenting the existing labeled data with speech samples generated through noise injection, our method expands the training dataset and enables more comprehensive model training. We evaluate our approach on the Vera am Mittag (VAM) corpus and demonstrate significant improvements in offensive speech detection performance compared with training without data augmentation. Our findings highlight the efficacy of data augmentation in mitigating data scarcity and enhancing the reliability of offensive speech detection systems in real-world scenarios.
Pages: 10
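
The noise-injection augmentation mentioned in the abstract could look roughly like the sketch below. The abstract does not state the noise type, the signal-to-noise ratios, or the number of copies generated, so additive white Gaussian noise, the SNR levels of 20 dB and 10 dB, and the function names `add_noise_at_snr` and `augment_dataset` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def add_noise_at_snr(signal, snr_db, rng=None):
    """Return a copy of `signal` with white Gaussian noise added.

    The noise power is chosen so that the expected signal-to-noise
    ratio equals `snr_db` (in decibels). Gaussian noise is an
    assumption; the source only says "noise injection".
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=np.float64)
    signal_power = np.mean(signal ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  solve for P_noise
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def augment_dataset(signals, labels, snr_levels_db=(20.0, 10.0), seed=0):
    """Append one noisy copy per SNR level for every labeled signal.

    The original samples are kept, and each noisy copy inherits the
    label of the clean signal it was generated from.
    """
    rng = np.random.default_rng(seed)
    aug_signals = list(signals)
    aug_labels = list(labels)
    for x, y in zip(signals, labels):
        for snr_db in snr_levels_db:
            aug_signals.append(add_noise_at_snr(x, snr_db, rng))
            aug_labels.append(y)
    return aug_signals, aug_labels
```

With two SNR levels, a training set of N utterances grows to 3N. In a setup like this, augmentation would normally be applied to the training split only, so that evaluation still reflects the original recordings.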