Taylor-DBN: A new framework for speech recognition systems

Cited by: 0
Authors
Haridas, Arul Valiyavalappil [1]
Marimuthu, Ramalatha [2]
Sivakumar, V. G. [3]
Chakraborty, Basabi [4]
Affiliations
[1] Sathyabama Inst Sci & Technol, Dept Elect & Commun Engn, Chennai, Tamil Nadu, India
[2] Kumaraguru Coll Technol, Dept Elect & Commun Engn, Coimbatore 641049, Tamil Nadu, India
[3] Vidya Jyothi Inst Technol, Dept Elect & Commun Engn, Hyderabad, Telangana, India
[4] Iwate Prefectural Univ, Fac Software & Informat Sci, Takizawa, Japan
Keywords
Speech recognition; Taylor series; deep belief network; gradient descent algorithm; spectral kurtosis; spectral skewness; neural networks
DOI
10.1142/S021969132050071X
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification code
081202; 0835
Abstract
Speech recognition is a rapidly emerging research area because the speech signal carries both linguistic and speaker information that can be used in applications such as surveillance, authentication, and forensics. The performance of speech recognition systems degrades rapidly under channel degradation, mismatch, and noise. To improve recognition performance, the Taylor-Deep Belief Network (Taylor-DBN) classifier is proposed, which modifies the Gradient Descent (GD) algorithm of the existing DBN classifier using the Taylor series. First, noise in the speech signal is removed through speech signal enhancement. Features such as Holoentropy with the eXtended Linear Prediction using autocorrelation Snapshot (HXLPS), spectral kurtosis, and spectral skewness are then extracted from the enhanced signal and fed to the Taylor-DBN classifier, which identifies the speech of impaired persons. Experiments use the TensorFlow speech recognition database, a real database, and the ESC-50 dataset. On the TensorFlow speech recognition database, the Taylor-DBN achieves an accuracy, False Acceptance Rate (FAR), False Rejection Rate (FRR), and Mean Square Error (MSE) of 96.95%, 3.04%, 3.04%, and 0.045, respectively. On the real database, the accuracy, FAR, FRR, and MSE are 96.67%, 3.32%, 3.32%, and 0.0499, respectively. On the ESC-50 dataset, they are 96.81%, 3.18%, 3.18%, and 0.047, respectively. The results indicate that the Taylor-DBN outperforms the existing conventional methods.
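As an illustration of two of the features named in the abstract, the sketch below computes spectral skewness and spectral kurtosis from a magnitude spectrum. The abstract does not give the paper's exact formulas, so this assumes the common moment-based definitions, in which the normalized magnitudes are treated as a distribution over frequency. The function name `spectral_moments` and its arguments are hypothetical and are not taken from the paper.

```python
import math


def spectral_moments(freqs, mags):
    """Return (centroid, spread, skewness, kurtosis) of a magnitude spectrum.

    Assumption: standard moment-based definitions, not necessarily the
    exact formulation used in the Taylor-DBN paper.
    """
    if len(freqs) != len(mags):
        raise ValueError("freqs and mags must have the same length")
    total = sum(mags)
    if total <= 0:
        raise ValueError("spectrum has no energy")

    # Normalize magnitudes so they act as weights that sum to 1.
    weights = [m / total for m in mags]

    # First moment: spectral centroid (weighted mean frequency).
    centroid = sum(f * w for f, w in zip(freqs, weights))

    # Second central moment: variance; its square root is the spread.
    variance = sum((f - centroid) ** 2 * w for f, w in zip(freqs, weights))
    if variance == 0:
        raise ValueError("all energy is in one bin; higher moments undefined")
    spread = math.sqrt(variance)

    # Third and fourth standardized moments: skewness and kurtosis.
    skewness = sum((f - centroid) ** 3 * w for f, w in zip(freqs, weights)) / spread ** 3
    kurtosis = sum((f - centroid) ** 4 * w for f, w in zip(freqs, weights)) / spread ** 4
    return centroid, spread, skewness, kurtosis
```

In practice, `mags` would come from the magnitude of a short-time Fourier transform of each enhanced speech frame. A spectrum that is symmetric about its centroid gives zero skewness, and kurtosis grows as energy concentrates in a narrow peak with long tails.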
Pages: 25
Related papers
50 items in total
  • [41] A MODEL STRUCTURE INTEGRATION BASED ON A BAYESIAN FRAMEWORK FOR SPEECH RECOGNITION
    Shiota, Sayaka
    Hashimoto, Kei
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4813 - 4816
  • [42] Continuous Automatic Speech Recognition System using MapReduce Framework
    Vikram, M.
    Reddy, N. Sudhakar
    Madhavi, K.
    2017 7TH IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE (IACC), 2017, : 80 - 83
  • [43] Associative memory framework for speech recognition: Adaptation of Hopfield network
    Vaishnavi, Y.
    Shreyas, R.
    Suhas, S.
    Surya, U. N.
    Ladwani, Vandana M.
    Ramasubramanian, V.
    2016 IEEE ANNUAL INDIA CONFERENCE (INDICON), 2016,
  • [44] A Bayesian Framework Using Multiple Model Structures for Speech Recognition
    Shiota, Sayaka
    Hashimoto, Kei
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2013, E96D (04): : 939 - 948
  • [45] Framework for choosing a set of syllables and phonemes for Lithuanian speech recognition
    Laurinciukaite, Sigita
    Lipeika, Antanas
    INFORMATICA, 2007, 18 (03) : 395 - 406
  • [46] Modular End-to-End Automatic Speech Recognition Framework for Acoustic-to-Word Model
    Liu, Qi
    Chen, Zhehuai
    Li, Hao
    Huang, Mingkun
    Lu, Yizhou
    Yu, Kai
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 2174 - 2183
  • [47] A New Method for Model Selection in Speech Recognition
    Wu, Yahui
    Liu, Gang
    Guo, Jun
    IEEE NLP-KE 2008: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, 2008, : 165 - 168
  • [48] A New Algorithm for Speech and Gender Recognition on the Basis of Voiced Parts of Speech
    Karwan, Jakub
    Saeed, Khalid
    COMPUTER INFORMATION SYSTEMS - ANALYSIS AND TECHNOLOGIES, 2011, 245 : 113 - 120
  • [49] A methodology for speech data analysis and a framework for adaptive speech recognition using fuzzy neural networks
    Kasabov, N
    Kozma, R
    Kilgour, R
    Laws, M
    Taylor, J
    Watts, M
    Gray, A
    PROGRESS IN CONNECTIONIST-BASED INFORMATION SYSTEMS, VOLS 1 AND 2, 1998, : 1055 - 1060
  • [50] Automatic Speech Recognition Used for Intelligibility Assessment of Text-to-Speech Systems
    Vich, Robert
    Nouza, Jan
    Vondra, Martin
    VERBAL AND NONVERBAL FEATURES OF HUMAN-HUMAN AND HUMAN-MACHINE INTERACTIONS, 2008, 5042 : 136 - +