Handling data scarcity through data augmentation for detecting offensive speech

Cited by: 0

Authors:

Sekkate, Sara [1]

Chebbi, Safa [2]

Adib, Abdellah [1]

Ben Jebara, Sofia [2]

Affiliations:

[1] Hassan II Univ Casablanca, Fac Sci & Technol, LIM Lab, Mohammadia, Morocco

[2] Univ Carthage, Higher Sch Commun, COSIM Lab, Tunis 2088, Tunisia

Source:

ANNALS OF TELECOMMUNICATIONS | 2025

Keywords:

Offensive speech; MFCC; SWT; Feature selection; Deep learning; Data augmentation; Neural networks; Hate speech; Identification

DOI:

10.1007/s12243-025-01072-6

Chinese Library Classification (CLC):

TN [Electronic technology, communication technology]

Discipline classification code:

0809

Abstract:

Detecting offensive speech is challenging because there is no universally accepted definition delineating its boundaries. In addition, the scarcity of labeled data makes it difficult to train robust offensive speech detection models. In this paper, we propose an approach that handles data scarcity through data augmentation techniques tailored to offensive speech detection. By augmenting the existing labeled data with speech samples generated through noise injection, our method expands the training dataset and enables more comprehensive model training. We evaluate our approach on the Vera am Mittag (VAM) corpus and demonstrate significant improvements in offensive speech detection performance compared with training without data augmentation. Our findings highlight the efficacy of data augmentation in mitigating data scarcity and enhancing the reliability of offensive speech detection systems in real-world scenarios.
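The abstract does not specify the noise type or noise levels used, so the following is only a minimal sketch of noise-injection augmentation under stated assumptions: additive white Gaussian noise, a hypothetical set of signal-to-noise ratios (20, 10, and 5 dB), and a hypothetical label string. The function names `add_noise_at_snr` and `augment` are illustrative and do not come from the paper. It uses only the Python standard library to stay dependency-free.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng=None):
    """Return a copy of `signal` with white Gaussian noise added at `snr_db` dB.

    Assumption: Gaussian noise; the paper's actual noise source is not stated.
    """
    rng = rng or random.Random()
    # Mean power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # Noise power needed for the requested SNR: SNR_dB = 10 * log10(Ps / Pn).
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def augment(samples, snr_levels_db, seed=0):
    """Keep each (signal, label) pair and add one noisy copy per SNR level."""
    rng = random.Random(seed)
    augmented = []
    for signal, label in samples:
        augmented.append((signal, label))
        for snr_db in snr_levels_db:
            # The label is unchanged: noise should not alter what was said.
            augmented.append((add_noise_at_snr(signal, snr_db, rng), label))
    return augmented


# Toy input: a 1-second 440 Hz tone at 8 kHz standing in for a speech clip.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
# "offensive" is a hypothetical label, not one taken from the VAM corpus.
dataset = augment([(tone, "offensive")], snr_levels_db=[20, 10, 5])
print(len(dataset))  # 1 original + 3 noisy copies = 4
```

In this sketch each labeled clip yields one extra training example per SNR level, so a scarce class grows by a fixed factor while its labels stay unchanged. Lower SNR values produce harder examples, and which range is appropriate would depend on the recording conditions of the target data.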

Pages: 10
