Exploring data augmentation for Amazigh speech recognition with convolutional neural networks

Cited by: 0

Authors
Hossam Boulal [1]
Farida Bouroumane [1]
Mohamed Hamidi [2]
Jamal Barkani [1]
Mustapha Abarkan [1]
Affiliations
[1] LSI Laboratory, FP Taza, USMBA University
[2] Team of Modeling and Scientific Computing, FPN, UMP
Keywords
Speech recognition; Data augmentation; Deep learning; Feature extraction; Amazigh digits
DOI
10.1007/s10772-024-10164-y
Abstract
In speech recognition, improving accuracy is paramount for diverse linguistic communities. Our study addresses this need, focusing on improving Amazigh speech recognition through three data augmentation methods: Audio Augmentation, FilterBank Augmentation, and SpecAugment. Using convolutional neural networks (CNNs) as the recognizer, we take mel spectrograms extracted from the audio files as input. The study targets recognition of the first ten Amazigh digits. We ran 27 experiments under a speaker-independent protocol involving 42 participants, using both the original and the augmented data. Among the CNN models employed, VGG19 showed the most promise, reaching a maximum accuracy of 95.66%; the largest improvement achieved through data augmentation was 4.67%. These results represent a substantial gain in speech recognition accuracy and demonstrate the efficacy of the proposed methods.
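The paper publishes no code, but the feature pipeline the abstract describes (mel spectrograms as CNN input, with waveform-level audio augmentation and SpecAugment-style masking) can be sketched as follows. This is a minimal illustration using torchaudio; the sample rate, FFT size, mel-bin count, noise level, and mask widths are assumptions chosen for the example, not values reported in the paper.

import torch
import torchaudio

SAMPLE_RATE = 16_000  # assumed; not stated in the abstract

# Mel-spectrogram front end (hyperparameters are illustrative).
to_mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE, n_fft=400, hop_length=160, n_mels=64
)
# SpecAugment-style masks applied to the mel spectrogram.
freq_mask = torchaudio.transforms.FrequencyMasking(freq_mask_param=8)
time_mask = torchaudio.transforms.TimeMasking(time_mask_param=20)

def augmented_features(waveform: torch.Tensor) -> torch.Tensor:
    """Mel spectrogram of one utterance after simple augmentation."""
    # Waveform-level audio augmentation: additive Gaussian noise.
    noisy = waveform + 0.005 * torch.randn_like(waveform)
    mel = to_mel(noisy)               # shape: (channels, n_mels, frames)
    mel = torch.log(mel + 1e-6)       # log compression, common for CNN input
    return time_mask(freq_mask(mel))  # mask one frequency band and one time span

# Usage (hypothetical file name):
# waveform, sr = torchaudio.load("amazigh_digit.wav")
# features = augmented_features(waveform)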
Pages: 53-65 (12 pages)