Exploring data augmentation for Amazigh speech recognition with convolutional neural networks

Cited by: 0

Authors
Hossam Boulal [1]
Farida Bouroumane [1]
Mohamed Hamidi [2]
Jamal Barkani [1]
Mustapha Abarkan [1]
Affiliations
[1] LSI Laboratory, FP Taza, USMBA University
[2] Team of Modeling and Scientific Computing, FPN, UMP
Keywords
Speech recognition; Data augmentation; Deep learning; Feature extraction; Amazigh digits
DOI
10.1007/s10772-024-10164-y
Abstract
In speech recognition, improving accuracy is paramount for diverse linguistic communities. Our study addresses this need, focusing on improving Amazigh speech recognition through three data augmentation methods: Audio Augmentation, FilterBank Augmentation, and SpecAugment. Using convolutional neural networks (CNNs) as the recognizer, we take mel spectrograms extracted from the audio files as input. The study targets recognition of the first ten Amazigh digits. We ran 27 experiments under a speaker-independent protocol involving 42 participants, using both the original and the augmented data. Among the CNN models employed, VGG19 showed the most promise, reaching a maximum accuracy of 95.66%; the largest improvement achieved through data augmentation was 4.67%. These results represent a substantial gain in speech recognition accuracy and demonstrate the efficacy of the proposed methods.
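The paper publishes no code, but the feature pipeline the abstract describes (mel spectrograms as CNN input, with waveform-level audio augmentation and SpecAugment-style masking) can be sketched as follows. This is a minimal illustration using torchaudio; the sample rate, FFT size, mel-bin count, noise level, and mask widths are assumptions chosen for the example, not values reported in the paper.

import torch
import torchaudio

SAMPLE_RATE = 16_000  # assumed; not stated in the abstract

# Mel-spectrogram front end (hyperparameters are illustrative).
to_mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE, n_fft=400, hop_length=160, n_mels=64
)
# SpecAugment-style masks applied to the mel spectrogram.
freq_mask = torchaudio.transforms.FrequencyMasking(freq_mask_param=8)
time_mask = torchaudio.transforms.TimeMasking(time_mask_param=20)

def augmented_features(waveform: torch.Tensor) -> torch.Tensor:
    """Mel spectrogram of one utterance after simple augmentation."""
    # Waveform-level audio augmentation: additive Gaussian noise.
    noisy = waveform + 0.005 * torch.randn_like(waveform)
    mel = to_mel(noisy)               # shape: (channels, n_mels, frames)
    mel = torch.log(mel + 1e-6)       # log compression, common for CNN input
    return time_mask(freq_mask(mel))  # mask one frequency band and one time span

# Usage (hypothetical file name):
# waveform, sr = torchaudio.load("amazigh_digit.wav")
# features = augmented_features(waveform)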
Pages: 53-65 (12 pages)