Exploring data augmentation for Amazigh speech recognition with convolutional neural networks

Cited by: 0

Authors:

Hossam Boulal [1]

Farida Bouroumane [1]

Mohamed Hamidi [2]

Jamal Barkani [1]

Mustapha Abarkan [1]

Affiliations:

[1] FP Taza, LSI Laboratory

[2] USMBA University, Team of Modeling and Scientific Computing

[3] FPN

[4] UMP

Source:

International Journal of Speech Technology | 2025 / Vol. 28 / No. 1

Keywords:

Speech recognition; Data augmentation; Deep learning; Feature extraction; Amazigh digits

DOI:

10.1007/s10772-024-10164-y

Abstract:

In speech recognition, improving accuracy is important for diverse linguistic communities. Our study addresses this need by improving Amazigh speech recognition with three data augmentation methods: Audio Augmentation, FilterBank Augmentation, and SpecAugment. We use Convolutional Neural Networks (CNNs) for speech recognition, with Mel spectrograms extracted from audio files as input. The study targets recognition of the first ten Amazigh digits. Experiments follow a speaker-independent approach involving 42 participants. In total, 27 experiments were conducted on both original and augmented data. Among the CNN models employed, VGG19 showed significant promise. Our results show a maximum accuracy of 95.66%, and the largest improvement achieved through data augmentation was 4.67%. These findings indicate that the proposed methods substantially improve speech recognition accuracy.
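To make the SpecAugment idea named in the abstract concrete, here is a minimal sketch of its masking step: one band of Mel bins and one span of time frames are blanked out in a spectrogram. This is not the authors' implementation. The function name `spec_augment`, the mask widths, the fill value, and the toy 40 x 50 spectrogram are all assumptions chosen for illustration, and the time-warping step of the original SpecAugment method is omitted. Plain Python lists are used so the example has no dependencies; in practice an array library would be the more natural choice.

```python
import random


def spec_augment(spec, max_freq_mask=8, max_time_mask=10, fill=0.0, rng=None):
    """Return a masked copy of `spec`, a list of rows (Mel bins x time frames).

    Hypothetical helper: parameter names and defaults are illustrative only.
    """
    rng = rng or random.Random()
    n_mels, n_frames = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # copy so the input stays untouched

    # Frequency mask: overwrite f consecutive Mel bins starting at f0.
    f = rng.randint(0, min(max_freq_mask, n_mels))
    f0 = rng.randint(0, n_mels - f)
    for i in range(f0, f0 + f):
        out[i] = [fill] * n_frames

    # Time mask: overwrite t consecutive frames starting at t0 in every row.
    t = rng.randint(0, min(max_time_mask, n_frames))
    t0 = rng.randint(0, n_frames - t)
    for row in out:
        row[t0:t0 + t] = [fill] * t

    return out


# Toy "spectrogram": 40 Mel bins, 50 frames, every cell set to 1.0.
spec = [[1.0] * 50 for _ in range(40)]
augmented = spec_augment(spec, rng=random.Random(0))
```

Each call with a different random state yields a differently masked copy of the same utterance, which is how this kind of augmentation enlarges a small training set without recording new audio.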

Pages: 53-65

Page count: 12
