Exploring data augmentation for Amazigh speech recognition with convolutional neural networks

Authors
Hossam Boulal [1]
Farida Bouroumane [1]
Mohamed Hamidi [2]
Jamal Barkani [1]
Mustapha Abarkan [1]
Affiliations
[1] FP Taza, LSI Laboratory
[2] USMBA University, Team of Modeling and Scientific Computing
[3] FPN
[4] UMP
Keywords
Speech recognition; Data augmentation; Deep learning; Feature extraction; Amazigh digits
DOI
10.1007/s10772-024-10164-y
Abstract
In the field of speech recognition, enhancing accuracy is paramount for diverse linguistic communities. Our study addresses this necessity, focusing on improving Amazigh speech recognition through the implementation of three distinct data augmentation methods: Audio Augmentation, FilterBank Augmentation, and SpecAugment. Leveraging Convolutional Neural Networks (CNNs) for speech recognition, we utilize Mel Spectrograms extracted from audio files. The study specifically targets the recognition of the initial ten Amazigh digits. We conducted experiments with a speaker-independent approach involving 42 participants. A total of 27 experiments were conducted, utilizing both original and augmented data. Among the different CNN models employed, the VGG19 model showcased significant promise. Our results demonstrate a maximum accuracy of 95.66%. Furthermore, the most notable improvement achieved through data augmentation was 4.67%. These findings signify a substantial enhancement in speech recognition accuracy, indicating the efficacy of the proposed methods.
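The abstract names SpecAugment as one of the three augmentation methods applied to Mel spectrograms. As a rough illustration of the idea only, and not the authors' implementation, the sketch below applies frequency and time masking to a 2-D spectrogram array with NumPy. The function name `spec_augment`, its parameters, the mask widths, and the choice to fill masked regions with the spectrogram mean are all assumptions made for this example; the time-warping step of SpecAugment is omitted, and a random array stands in for a real Mel spectrogram.

```python
import numpy as np


def spec_augment(mel, num_freq_masks=1, num_time_masks=1,
                 max_freq_width=8, max_time_width=10, rng=None):
    """Return a copy of `mel` (shape: n_mels x n_frames) with random
    frequency bands and time spans replaced by the spectrogram mean.

    Hypothetical helper for illustration; all defaults are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = mel.copy()                 # leave the caller's array untouched
    n_mels, n_frames = out.shape
    fill = mel.mean()                # assumed fill value for masked cells

    # Frequency masking: blank out `width` consecutive Mel bands.
    for _ in range(num_freq_masks):
        width = min(int(rng.integers(0, max_freq_width + 1)), n_mels)
        start = int(rng.integers(0, n_mels - width + 1))
        out[start:start + width, :] = fill

    # Time masking: blank out `width` consecutive frames.
    for _ in range(num_time_masks):
        width = min(int(rng.integers(0, max_time_width + 1)), n_frames)
        start = int(rng.integers(0, n_frames - width + 1))
        out[:, start:start + width] = fill

    return out


# Stand-in for a Mel spectrogram: 64 Mel bands by 100 frames.
demo_mel = np.random.default_rng(0).random((64, 100))
demo_aug = spec_augment(demo_mel, rng=np.random.default_rng(1))
```

Each call draws new mask positions, so applying the function repeatedly to the same spectrogram yields different training examples while the label stays the same.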
Pages: 53-65
Number of pages: 12