Taylor-DBN: A new framework for speech recognition systems

Cited: 0
Authors
Haridas, Arul Valiyavalappil [1 ]
Marimuthu, Ramalatha [2 ]
Sivakumar, V. G. [3 ]
Chakraborty, Basabi [4 ]
Affiliations
[1] Sathyabama Inst Sci & Technol, Dept Elect & Commun Engn, Chennai, Tamil Nadu, India
[2] Kumaraguru Coll Technol, Dept Elect & Commun Engn, Coimbatore 641049, Tamil Nadu, India
[3] Vidya Jyothi Inst Technol, Dept Elect & Commun Engn, Hyderabad, Telangana, India
[4] Iwate Prefectural Univ, Fac Software & Informat Sci, Takizawa, Japan
Keywords
Speech recognition; Taylor series; deep belief network; gradient descent algorithm; spectral kurtosis; spectral skewness; neural networks
DOI
10.1142/S021969132050071X
CLC Number
TP31 [Computer Software];
Discipline Codes
081202; 0835;
Abstract
Speech recognition is a rapidly emerging research area, as the speech signal carries linguistic and speaker information that can be used in applications such as surveillance, authentication, and forensics. The performance of speech recognition systems degrades rapidly under channel degradation, mismatch, and noise. To improve recognition performance, the Taylor-Deep Belief Network (Taylor-DBN) classifier is proposed, which modifies the Gradient Descent (GD) algorithm with a Taylor series in the existing DBN classifier. Initially, the noise present in the speech signal is removed through speech signal enhancement. Features such as Holoentropy with the eXtended Linear Prediction using autocorrelation Snapshot (HXLPS), spectral kurtosis, and spectral skewness are extracted from the enhanced speech signal and fed to the Taylor-DBN classifier, which identifies the speech of impaired persons. Experiments are conducted on the TensorFlow speech recognition database, a real database, and the ESC-50 dataset. The accuracy, False Acceptance Rate (FAR), False Rejection Rate (FRR), and Mean Square Error (MSE) of the Taylor-DBN on the TensorFlow speech recognition database are 96.95%, 3.04%, 3.04%, and 0.045, respectively; on the real database, they are 96.67%, 3.32%, 3.32%, and 0.0499, respectively; and on the ESC-50 dataset, they are 96.81%, 3.18%, 3.18%, and 0.047, respectively. The results indicate that the Taylor-DBN outperforms existing conventional methods.
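Two of the features named in the abstract, spectral kurtosis and spectral skewness, can be illustrated with a short sketch using their standard moment-based definitions over a frame's magnitude spectrum. This is a minimal, hypothetical illustration: the frame length, hop size, windowing, and normalization below are assumptions, not the paper's exact formulation (the paper's HXLPS feature is not reproduced here).

```python
import numpy as np

def spectral_kurtosis_skewness(signal, sr=16000, frame_len=400, hop=160):
    """Per-frame spectral skewness and kurtosis of a speech signal.

    Hypothetical sketch: frame/hop sizes and moment definitions are
    standard textbook choices, not taken from the Taylor-DBN paper.
    Returns an array of shape (num_frames, 2): [skewness, kurtosis].
    """
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, 1.0 / sr)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))
        # Treat the normalized magnitude spectrum as a distribution over frequency.
        p = mag / (mag.sum() + 1e-12)
        centroid = np.sum(freqs * p)                      # first moment
        spread = np.sqrt(np.sum((freqs - centroid) ** 2 * p)) + 1e-12
        skew = np.sum((freqs - centroid) ** 3 * p) / spread ** 3
        kurt = np.sum((freqs - centroid) ** 4 * p) / spread ** 4
        feats.append((skew, kurt))
    return np.array(feats)
```

Frame-level features like these would then be stacked with the other descriptors before being passed to a classifier; the per-frame distribution view (centroid, spread, skewness, kurtosis) is what makes them robust summaries of spectral shape rather than raw spectra.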
Pages: 25
Related Papers
50 records in total
  • [21] Estimation of Speech Intelligibility Using Speech Recognition Systems
    Takano, Yusuke
    Kondo, Kazuhiro
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (12): : 3368 - 3376
  • [22] Support software for Automatic Speech Recognition systems targeted for non-native speech
    Radzikowski, Kacper
    Yoshie, Osamu
    Nowak, Robert
    22ND INTERNATIONAL CONFERENCE ON INFORMATION INTEGRATION AND WEB-BASED APPLICATIONS & SERVICES (IIWAS2020), 2020, : 55 - 61
  • [23] Speech recognition using Taylor-gradient Descent political optimization based Deep residual network
    Arul, V. H.
    Marimuthu, Ramalatha
    COMPUTER SPEECH AND LANGUAGE, 2023, 78
  • [24] USING A* FOR THE PARALLELIZATION OF SPEECH RECOGNITION SYSTEMS
    Cardinal, Patrick
    Boulianne, Gilles
    Dumouchel, Pierre
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4433 - 4436
  • [25] EmoNet: A Transfer Learning Framework for Multi-Corpus Speech Emotion Recognition
    Gerczuk, Maurice
    Amiriparian, Shahin
    Ottl, Sandra
    Schuller, Bjorn W. W.
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (02) : 1472 - 1487
  • [26] Robust Speech Recognition using Generalized Distillation Framework
    Markov, Konstantin
    Matsui, Tomoko
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2364 - 2368
  • [27] Exploration of an Independent Training Framework for Speech Emotion Recognition
    Zhong, Shunming
    Yu, Baoxian
    Zhang, Han
    IEEE ACCESS, 2020, 8 : 222533 - 222543
  • [28] Lightweight Deep Learning Framework for Speech Emotion Recognition
    Akinpelu, Samson
    Viriri, Serestina
    Adegun, Adekanmi
    IEEE ACCESS, 2023, 11 : 77086 - 77098
  • [29] Investigation of Combined use of MFCC and LPC Features in Speech Recognition Systems
    Aida-Zade, K. R.
    Ardil, C.
    Rustamov, S. S.
    PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY, VOL 13, 2006, 13 : 275 - +
  • [30] A survey on automatic speech recognition systems for Portuguese language and its variations
    de Lima, Thales Aguiar
    Da Costa-Abreu, Marjory
    COMPUTER SPEECH AND LANGUAGE, 2020, 62 (62)