The impact of MFCC, spectrogram, and Mel-Spectrogram on deep learning models for Amazigh speech recognition system

Cited by: 1

Authors:

Meryam Telmem [1]

Naouar Laaidi [2]

Hassan Satori [2]

Affiliations:

[1] Université Moulay Ismail de Meknes

[2] Sidi Mohamed Ben Abdellah University

Source:

International Journal of Speech Technology | 2025 / Vol. 28 / Issue 1

Keywords:

MFCC; Spectrogram; Mel-Spectrogram; CNN; LSTM; bi-LSTM; Amazigh language

DOI:

10.1007/s10772-025-10183-3

Abstract:

Feature extraction is an essential phase in the development of Automatic Speech Recognition (ASR) systems. This study examines the performance of different deep neural network architectures, including Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), and bidirectional LSTM (Bi-LSTM) models, for an Amazigh speech recognition system when they are applied to several feature extraction techniques, specifically Mel-Frequency Cepstral Coefficients (MFCC), Spectrograms, and Mel-Spectrograms. The results show that the Bi-LSTM with Spectrograms achieved a maximum accuracy of 85%, the best performance in our Amazigh ASR study. We also show that each feature type offers specific advantages, influenced by the particular neural network architecture employed.

Pages: 299-312

Page count: 13