Comparative study of CNN, LSTM and hybrid CNN-LSTM model in amazigh speech recognition using spectrogram feature extraction and different gender and age dataset

Cited by: 0
Authors
Telmem, Meryam [1]
Laaidi, Naouar [2]
Ghanou, Youssef [1]
Hamiane, Sanae [1]
Satori, Hassan [2]
Affiliations
[1] Moulay Ismail University, Meknes
[2] Sidi Mohamed Ben Abdellah University, Fes
Keywords
Amazigh language; CNN; LSTM; MFCC; RNN; Spectrogram
DOI
10.1007/s10772-024-10154-0
Abstract
The field of artificial intelligence has seen remarkable advances in speech recognition technology. Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN) are among the leading approaches in this domain, but which performs best at recognizing the Amazigh language? This article presents a comparative study of a Convolutional Neural Network (CNN), a Long Short-Term Memory (LSTM) network, and a hybrid CNN-LSTM model in the context of speech recognition systems. The main objective is to identify which network architecture delivers the best performance for recognizing the Amazigh language. The study is among the first to develop and compare three distinct deep models specifically for the Amazigh language, addressing the challenges posed by a low-resource language. In experiments and evaluations conducted on the Tifdigit dataset, the CNN performed best, reaching 88% accuracy when trained on the female-speaker subset. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
Pages: 1121-1133
Page count: 12