Exploring data augmentation for Amazigh speech recognition with convolutional neural networks

Cited by: 0

Authors:

Hossam Boulal [1]

Farida Bouroumane [1]

Mohamed Hamidi [2]

Jamal Barkani [1]

Mustapha Abarkan [1]

Affiliations:

[1] FP Taza, LSI Laboratory

[2] USMBA University, Team of Modeling and Scientific Computing

[3] FPN

[4] UMP

Source:

International Journal of Speech Technology | 2025 / Vol. 28 / Issue 1

Keywords:

Speech recognition; Data augmentation; Deep learning; Feature extraction; Amazigh digits

DOI:

10.1007/s10772-024-10164-y

Abstract:

In the field of speech recognition, enhancing accuracy is paramount for diverse linguistic communities. Our study addresses this necessity, focusing on improving Amazigh speech recognition through the implementation of three distinct data augmentation methods: Audio Augmentation, FilterBank Augmentation, and SpecAugment. Leveraging Convolutional Neural Networks (CNNs) for speech recognition, we utilize Mel Spectrograms extracted from audio files. The study specifically targets the recognition of the initial ten Amazigh digits. We conducted experiments with a speaker-independent approach involving 42 participants. A total of 27 experiments were conducted, utilizing both original and augmented data. Among the different CNN models employed, the VGG19 model showcased significant promise. Our results demonstrate a maximum accuracy of 95.66%. Furthermore, the most notable improvement achieved through data augmentation was 4.67%. These findings signify a substantial enhancement in speech recognition accuracy, indicating the efficacy of the proposed methods.
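
As a rough illustration of the masking idea behind SpecAugment, one of the three augmentation methods named in the abstract, the following is a minimal sketch in plain Python. It assumes the Mel spectrogram is already available as a list of rows (Mel bins) of frame values. The function name `spec_augment`, its parameters, and the toy input are hypothetical and are not taken from the paper, which does not state its mask widths in this record. The sketch applies one frequency mask and one time mask only and omits the time-warping step of the original SpecAugment method.

```python
import random


def spec_augment(spec, max_freq_width=2, max_time_width=3,
                 mask_value=0.0, seed=None):
    """Return a masked copy of `spec`.

    `spec` is a list of Mel-bin rows, each a list of time-frame values.
    One band of consecutive Mel bins and one band of consecutive frames
    are overwritten with `mask_value`. All parameter defaults here are
    illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    n_mels = len(spec)
    n_frames = len(spec[0])
    out = [row[:] for row in spec]  # copy so the input stays untouched

    # Frequency mask: width f in [0, max_freq_width], start f0 chosen so
    # the band fits inside the spectrogram.
    f = rng.randint(0, min(max_freq_width, n_mels))
    f0 = rng.randint(0, n_mels - f)
    for m in range(f0, f0 + f):
        out[m] = [mask_value] * n_frames

    # Time mask: width t in [0, max_time_width], start t0 chosen likewise.
    t = rng.randint(0, min(max_time_width, n_frames))
    t0 = rng.randint(0, n_frames - t)
    for row in out:
        for k in range(t0, t0 + t):
            row[k] = mask_value

    return out


# Toy 4-bin x 6-frame "spectrogram" filled with ones.
toy = [[1.0] * 6 for _ in range(4)]
augmented = spec_augment(toy, seed=0)
```

Each call with a different seed yields a differently masked copy of the same utterance, which is how masking can enlarge a small training set such as a spoken-digit corpus without recording new audio.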

Pages: 53-65

Page count: 12

50 items in total

[1] Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Recognition
Takahashi, Naoya
Gygli, Michael
Pfister, Beat
Van Gool, Luc
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2982 - 2986
[2] Data Augmentation for Drum Transcription with Convolutional Neural Networks
Jacques, Celine
Roebel, Axel
2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019,
[3] Human Activity Recognition Based on Multichannel Convolutional Neural Network With Data Augmentation
Shi, Wenbing
Fang, Xianjin
Yang, Gaoming
Huang, Ji
IEEE ACCESS, 2022, 10 : 76596 - 76606
[4] Convolutional Neural Network With Data Augmentation for SAR Target Recognition
Ding, Jun
Chen, Bo
Liu, Hongwei
Huang, Mengyuan
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2016, 13 (03) : 364 - 368
[5] Speech Recognition Based on Convolutional Neural Networks
Du Guiming
Wang Xia
Wang Guangyan
Zhang Yan
Li Dan
2016 IEEE INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP), 2016, : 708 - 711
[6] Neuromorphic Speech Recognition With Photonic Convolutional Spiking Neural Networks
Xiang, Shuiying
Zhang, Tianrui
Han, Yanan
Guo, Xingxing
Zhang, Yahui
Shi, Yuechun
Hao, Yue
IEEE JOURNAL OF SELECTED TOPICS IN QUANTUM ELECTRONICS, 2023, 29 (06)
[7] Speech Recognition of Punjabi Numerals Using Convolutional Neural Networks
Aditi, Thakur
Karun, Verma
ADVANCES IN COMPUTER COMMUNICATION AND COMPUTATIONAL SCIENCES, VOL 1, 2019, 759 : 61 - 69
[8] Data Augmentation for EEG-Based Emotion Recognition with Deep Convolutional Neural Networks
Wang, Fang
Zhong, Sheng-hua
Peng, Jianfeng
Jiang, Jianmin
Liu, Yan
MULTIMEDIA MODELING, MMM 2018, PT II, 2018, 10705 : 82 - 93
[9] Deep Convolutional Neural Networks Based on Image Data Augmentation for Visual Object Recognition
Jayech, Khaoula
INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2019, PT I, 2019, 11871 : 476 - 485
[10] DATA AUGMENTATION WITH GABOR FILTER IN DEEP CONVOLUTIONAL NEURAL NETWORKS FOR SAR TARGET RECOGNITION
Jiang, Ting
Cui, Zongyong
Zhou, Zhi
Cao, Zongjie
IGARSS 2018 - 2018 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2018, : 689 - 692
