Hybrid deep learning based automatic speech recognition model for recognizing non-Indian languages

Cited: 0
Authors
Gupta, Astha [1 ]
Kumar, Rakesh [1 ]
Kumar, Yogesh [2 ]
Affiliations
[1] Chandigarh Univ, Dept Comp Sci & Engn, Mohali, Punjab, India
[2] Indus Univ, Indus Inst Technol & Engn, Ahmadabad, Gujarat, India
Keywords
Automatic Speech Recognition; Spectrogram; Short-Time Fourier Transform; MFCC; ResNet10; Inception V3; VGG16; DenseNet201; EfficientNetB0;
DOI
10.1007/s11042-023-16748-1
Chinese Library Classification
TP [Automation technology, computer technology];
Subject Classification Code
0812
Abstract
Speech is a natural phenomenon and a primary mode of human communication, which can be divided into two categories: human-to-human and human-to-machine. Human-to-human communication depends on the language the speaker uses. In contrast, human-to-machine communication is a technique in which machines recognize human speech and act accordingly, often termed Automatic Speech Recognition (ASR). Recognizing non-Indian languages is challenging due to pitch variation and other factors such as accent and pronunciation. This paper proposes a novel hybrid model based on DenseNet201 and EfficientNetB0 for speech recognition. Initially, 76,263 speech samples are taken from 11 non-Indian languages, namely Chinese, Dutch, Finnish, French, German, Greek, Hungarian, Japanese, Russian, Spanish, and Persian. Once collected, the speech samples are pre-processed to remove noise. Then, spectrograms, the Short-Time Fourier Transform (STFT), spectral rolloff and bandwidth, Mel-frequency Cepstral Coefficients (MFCC), and chroma features are extracted from each sample. Further, the proposed approach is compared with other Deep Learning (DL) models, namely ResNet10, Inception V3, VGG16, DenseNet201, and EfficientNetB0. Standard metrics such as precision, recall, F1-score, confusion matrix, accuracy, and loss curves are used to evaluate each model on speech samples from all the languages mentioned above. The experimental results show that the hybrid model outperforms all the other models, achieving the highest recognition accuracy of 99.84% with a loss of 0.004%.
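The abstract outlines a pipeline of noise removal, spectral feature extraction, and a DenseNet201 + EfficientNetB0 hybrid classifier over 11 languages. The sketch below is a minimal, non-authoritative illustration of such a pipeline: the librosa feature calls correspond to the feature types named in the abstract, while the input image size, the concatenation-based fusion of the two backbones, and all hyperparameters are assumptions, since the abstract does not specify them.

```python
# Minimal sketch of the pipeline described in the abstract.
# Assumptions (not from the source): 224x224x3 "feature image" inputs,
# fusion by concatenating the pooled embeddings of the two backbones,
# Adam optimizer and categorical cross-entropy.
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, Model

N_CLASSES = 11                 # 11 non-Indian languages (from the abstract)
INPUT_SHAPE = (224, 224, 3)    # assumed input size for the CNN backbones


def extract_features(wav_path, sr=16000):
    """Compute the feature types named in the abstract for one utterance."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr)            # spectrogram
    stft = np.abs(librosa.stft(y))                              # Short-Time Fourier Transform
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)          # MFCCs
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)            # chroma features
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)      # spectral rolloff
    bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)  # spectral bandwidth
    return mel, stft, mfcc, chroma, rolloff, bandwidth


def build_hybrid_model():
    """DenseNet201 + EfficientNetB0 fused by concatenating pooled embeddings (assumed)."""
    inputs = layers.Input(shape=INPUT_SHAPE)
    dense_branch = tf.keras.applications.DenseNet201(
        include_top=False, weights=None, pooling="avg")(inputs)
    effnet_branch = tf.keras.applications.EfficientNetB0(
        include_top=False, weights=None, pooling="avg")(inputs)
    fused = layers.Concatenate()([dense_branch, effnet_branch])
    outputs = layers.Dense(N_CLASSES, activation="softmax")(fused)
    model = Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

In this sketch, each utterance would be converted into an image-like feature representation before being fed to both backbones through a shared input; the paper may well use a different fusion or training scheme.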
Pages: 30145-30166
Page count: 22