Handling data scarcity through data augmentation for detecting offensive speech

Cited by: 0
Authors
Sekkate, Sara [1 ]
Chebbi, Safa [2 ]
Adib, Abdellah [1 ]
Ben Jebara, Sofia [2 ]
Affiliations
[1] Hassan II Univ Casablanca, Fac Sci & Technol, LIM Lab, Mohammadia, Morocco
[2] Univ Carthage, Higher Sch Commun, COSIM Lab, Tunis 2088, Tunisia
Keywords
Offensive speech; MFCC; SWT; Feature selection; Deep learning; Data augmentation; Neural networks; Hate speech; Identification
DOI
10.1007/s12243-025-01072-6
Chinese Library Classification (CLC)
TN [Electronic technology, communication technology]
Subject classification code
0809
Abstract
Detecting offensive speech is challenging because there is no universally accepted definition delineating its boundaries. Moreover, the scarcity of labeled data often hinders the training of robust offensive speech detection models. In this paper, we propose an approach that handles data scarcity through data augmentation techniques tailored to offensive speech detection. By augmenting the existing labeled data with speech samples generated through noise injection, our method effectively expands the training dataset, enabling more comprehensive model training. We evaluate our approach on the Vera am Mittag (VAM) corpus and demonstrate significant improvements in offensive speech detection performance over a baseline trained without data augmentation. Our findings highlight the efficacy of data augmentation in mitigating data scarcity and enhancing the reliability of offensive speech detection systems in real-world scenarios.
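The noise-injection augmentation mentioned in the abstract can be sketched as follows. This is a minimal illustration, assuming additive Gaussian noise scaled to a target signal-to-noise ratio; the specific SNR levels and noise model are assumptions for demonstration, not the paper's exact configuration.

```python
import numpy as np

def augment_with_noise(signal, snr_db=20.0, rng=None):
    """Return a copy of `signal` with Gaussian noise added at the given SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(signal ** 2)
    # Solve SNR_dB = 10 * log10(signal_power / noise_power) for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Each clean utterance yields several extra training samples at different noise levels.
clean = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000))  # stand-in for a speech clip
rng = np.random.default_rng(0)
augmented = [augment_with_noise(clean, snr, rng) for snr in (20.0, 10.0, 5.0)]
```

Feeding both the clean clips and their noisy variants to the classifier multiplies the effective training-set size without any new annotation effort.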
Pages: 10