Taylor-DBN: A new framework for speech recognition systems

Cited by: 0
Authors
Haridas, Arul Valiyavalappil [1]
Marimuthu, Ramalatha [2]
Sivakumar, V. G. [3]
Chakraborty, Basabi [4]
Affiliations
[1] Sathyabama Inst Sci & Technol, Dept Elect & Commun Engn, Chennai, Tamil Nadu, India
[2] Kumaraguru Coll Technol, Dept Elect & Commun Engn, Coimbatore 641049, Tamil Nadu, India
[3] Vidya Jyothi Inst Technol, Dept Elect & Commun Engn, Hyderabad, Telangana, India
[4] Iwate Prefectural Univ, Fac Software & Informat Sci, Takizawa, Japan
Keywords
Speech recognition; Taylor series; deep belief network; gradient descent algorithm; spectral kurtosis; spectral skewness; neural networks
DOI
10.1142/S021969132050071X
Chinese Library Classification number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Speech recognition is a rapidly growing research area because the speech signal carries both linguistic and speaker information that can be used in applications such as surveillance, authentication, and forensics. The performance of speech recognition systems degrades rapidly under channel degradations, mismatches, and noise. To improve recognition performance, the Taylor-Deep Belief Network (Taylor-DBN) classifier is proposed, which modifies the Gradient Descent (GD) algorithm of the existing DBN classifier using the Taylor series. First, the noise present in the speech signal is removed through speech signal enhancement. Features, namely Holoentropy with the eXtended Linear Prediction using autocorrelation Snapshot (HXLPS), spectral kurtosis, and spectral skewness, are then extracted from the enhanced speech signal and fed to the Taylor-DBN classifier, which identifies the speech of impaired persons. Experiments are carried out on the TensorFlow speech recognition database, the real database, and the ESC-50 dataset. For the TensorFlow speech recognition database, the accuracy, False Acceptance Rate (FAR), False Rejection Rate (FRR), and Mean Square Error (MSE) of the Taylor-DBN are 96.95%, 3.04%, 3.04%, and 0.045, respectively; for the real database, they are 96.67%, 3.32%, 3.32%, and 0.0499, respectively; and for the ESC-50 dataset, they are 96.81%, 3.18%, 3.18%, and 0.047, respectively. The results indicate that the Taylor-DBN outperforms the existing conventional methods.
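As an illustration of two of the features named in the abstract, the following minimal sketch computes spectral skewness and spectral kurtosis as the third and fourth standardized moments of a magnitude spectrum. This is the common audio-feature definition and is an assumption here: the abstract does not state which definition the authors use, and the function name `spectral_moments` and its inputs are hypothetical.

```python
import math


def spectral_moments(freqs, mags):
    """Return (centroid, spread, skewness, kurtosis) of a magnitude spectrum.

    Assumption: the normalized magnitudes are treated as a probability
    distribution over frequency. The paper may define these features differently.
    """
    total = sum(mags)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    weights = [m / total for m in mags]

    # First moment: spectral centroid (mean frequency).
    centroid = sum(f * w for f, w in zip(freqs, weights))
    # Second central moment: variance; its square root is the spectral spread.
    variance = sum((f - centroid) ** 2 * w for f, w in zip(freqs, weights))
    spread = math.sqrt(variance)
    if spread == 0:
        raise ValueError("spectrum is concentrated in a single bin")

    # Third and fourth standardized moments.
    skewness = sum((f - centroid) ** 3 * w for f, w in zip(freqs, weights)) / spread ** 3
    kurtosis = sum((f - centroid) ** 4 * w for f, w in zip(freqs, weights)) / spread ** 4
    return centroid, spread, skewness, kurtosis
```

In this convention the kurtosis is not excess kurtosis (no 3 is subtracted), and a spectrum that is symmetric about its centroid has zero skewness.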
Pages: 25
Related papers
50 items in total
  • [31] Challenges and Limitations in Speech Recognition Technology: A Critical Review of Speech Signal Processing Algorithms, Tools and Systems
    Basak, Sneha
    Agrawal, Himanshi
    Jena, Shreya
    Gite, Shilpa
    Bachute, Mrinal
    Pradhan, Biswajeet
    Assiri, Mazen
    CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2023, 135 (02): 1053-1089
  • [32] A new noisy speech recognition method
    Zhao, XQ
    Wang, J
    International Symposium on Communications and Information Technologies 2005, Vols 1 and 2, Proceedings, 2005: 282-286
  • [33] Towards mixed language speech recognition systems
    Imseng, David
    Bourlard, Herve
    Magimai-Doss, Mathew
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010: 278-281
  • [34] DEVELOPMENT OF NEW SPEECH CORPUS FOR ELDERLY JAPANESE SPEECH RECOGNITION
    Iribe, Yurie
    Kitaoka, Norihide
    Segawa, Shuhei
    2015 INTERNATIONAL CONFERENCE ORIENTAL COCOSDA HELD JOINTLY WITH 2015 CONFERENCE ON ASIAN SPOKEN LANGUAGE RESEARCH AND EVALUATION (O-COCOSDA/CASLRE), 2015: 27-31
  • [35] Universal Adversarial Perturbations for Speech Recognition Systems
    Neekhara, Paarth
    Hussain, Shehzeen
    Pandey, Prakhar
    Dubnov, Shlomo
    McAuley, Julian
    Koushanfar, Farinaz
    INTERSPEECH 2019, 2019: 481-485
  • [36] MODERN TRENDS IN THE DEVELOPMENT OF SPEECH RECOGNITION SYSTEMS
    Mamyrbayev, O.
    Oralbekova, D.
    NEWS OF THE NATIONAL ACADEMY OF SCIENCES OF THE REPUBLIC OF KAZAKHSTAN-SERIES PHYSICO-MATHEMATICAL, 2020, 4 (332): 42-51
  • [37] Comparing Humans and Automatic Speech Recognition Systems in Recognizing Dysarthric Speech
    Mengistu, Kinfe Tadesse
    Rudzicz, Frank
    ADVANCES IN ARTIFICIAL INTELLIGENCE, 2011, 6657: 291-300
  • [38] Phoneme fuzzy characterization in speech recognition systems
    Beritelli, F
    Borrometi, L
    Cuce, A
    APPLICATIONS OF SOFT COMPUTING, 1997, 3165: 305-306
  • [39] A new hybrid framework based on Hidden Markov models and K-nearest neighbors for speech recognition
    Hazmoune, Samira
    Bougamouza, Fateh
    Mazouzi, Smaine
    Benmohammed, Mohamed
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2018, 21 (03): 689-704
  • [40] A hierarchical duration model for speech recognition based on the ANGIE framework
    Chung, GY
    Seneff, S
    SPEECH COMMUNICATION, 1999, 27 (02): 113-134