The impact of MFCC, spectrogram, and Mel-Spectrogram on deep learning models for Amazigh speech recognition system

Cited by: 1
Authors
Meryam Telmem [1]
Naouar Laaidi [2]
Hassan Satori [2]
Affiliations
[1] Université Moulay Ismail de Meknes
[2] Sidi Mohamed Ben Abdellah University
Keywords
MFCC; Spectrogram; Mel-Spectrogram; CNN; LSTM; bi-LSTM; Amazigh language
DOI
10.1007/s10772-025-10183-3
Abstract
Feature extraction is an essential phase in the development of Automatic Speech Recognition (ASR) systems. This study examines how several feature extraction techniques, specifically Mel-Frequency Cepstral Coefficients (MFCC), spectrograms, and Mel-spectrograms, affect the performance of different deep neural network architectures, including Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), and bidirectional LSTM (bi-LSTM) models, in an Amazigh speech recognition system. The results show that the bi-LSTM with spectrograms achieved a maximum accuracy of 85%, giving the best performance in our Amazigh ASR study. We also show that each feature type offers specific advantages, influenced by the particular neural network architecture employed.
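To make the three feature types named in the abstract concrete, the following is a minimal NumPy sketch of how a power spectrogram, a Mel-spectrogram, and MFCCs relate to one another. It is not the authors' pipeline: the sample rate, frame length, hop size, number of Mel bands, number of coefficients, and all function names are illustrative assumptions, since the abstract does not state them.

```python
import numpy as np

# All parameter defaults below are assumptions chosen for illustration;
# the abstract does not specify the settings used in the study.

def stft_power(signal, n_fft=512, hop=160):
    """Power spectrogram |STFT|^2 with a Hann window, shape (n_fft//2 + 1, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T ** 2

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=40):
    """Triangular filters equally spaced on the Mel scale, shape (n_mels, n_fft//2 + 1)."""
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_ii(x, n_out):
    """Unnormalised DCT-II along axis 0, keeping the first n_out coefficients."""
    n = x.shape[0]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * i + 1) / (2 * n)) @ x

def extract_features(signal, sr=16000, n_fft=512, hop=160, n_mels=40, n_mfcc=13):
    spec = stft_power(signal, n_fft, hop)               # spectrogram
    mel_spec = mel_filterbank(sr, n_fft, n_mels) @ spec  # Mel-spectrogram
    mfcc = dct_ii(np.log(mel_spec + 1e-10), n_mfcc)      # MFCC = DCT of log-Mel energies
    return spec, mel_spec, mfcc

# Synthetic one-second 440 Hz tone standing in for a speech recording.
sr = 16000
tone = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
spec, mel_spec, mfcc = extract_features(tone, sr)
print(spec.shape, mel_spec.shape, mfcc.shape)
```

Each step discards some detail: the Mel filterbank compresses 257 frequency bins into 40 perceptually spaced bands, and the DCT keeps only 13 decorrelated coefficients per frame. That progressive compression is one plausible reason a model might behave differently on each representation, though the abstract itself does not give an explanation.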
Pages: 299-312
Page count: 13