Taylor-DBN: A new framework for speech recognition systems

Authors
Haridas, Arul Valiyavalappil [1]
Marimuthu, Ramalatha [2]
Sivakumar, V. G. [3]
Chakraborty, Basabi [4]
Affiliations
[1] Sathyabama Inst Sci & Technol, Dept Elect & Commun Engn, Chennai, Tamil Nadu, India
[2] Kumaraguru Coll Technol, Dept Elect & Commun Engn, Coimbatore 641049, Tamil Nadu, India
[3] Vidya Jyothi Inst Technol, Dept Elect & Commun Engn, Hyderabad, Telangana, India
[4] Iwate Prefectural Univ, Fac Software & Informat Sci, Takizawa, Japan
Keywords
Speech recognition; Taylor series; deep belief network; gradient descent algorithm; spectral kurtosis; spectral skewness; neural networks
DOI
10.1142/S021969132050071X
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Speech recognition is a rapidly growing research area because the speech signal carries both linguistic and speaker information, which can be used in applications such as surveillance, authentication, and forensics. The performance of speech recognition systems degrades rapidly under channel degradation, mismatch, and noise. To improve recognition performance, the Taylor-Deep Belief Network (Taylor-DBN) classifier is proposed, which modifies the Gradient Descent (GD) algorithm of the existing DBN classifier with the Taylor series. First, the noise present in the speech signal is removed through speech signal enhancement. Features such as Holoentropy with the eXtended Linear Prediction using autocorrelation Snapshot (HXLPS), spectral kurtosis, and spectral skewness are then extracted from the enhanced speech signal and fed to the Taylor-DBN classifier, which identifies the speech of impaired persons. The experiments use the TensorFlow speech recognition database, a real database, and the ESC-50 dataset. On the TensorFlow speech recognition database, the accuracy, False Acceptance Rate (FAR), False Rejection Rate (FRR), and Mean Square Error (MSE) of the Taylor-DBN are 96.95%, 3.04%, 3.04%, and 0.045, respectively. On the real database, they are 96.67%, 3.32%, 3.32%, and 0.0499, and on the ESC-50 dataset they are 96.81%, 3.18%, 3.18%, and 0.047. The results indicate that the Taylor-DBN performs better than the existing conventional methods.
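The abstract names spectral skewness and spectral kurtosis as features but does not define them. The sketch below uses one common textbook definition, assumed here rather than taken from the paper: the magnitude spectrum of a frame is normalized to a distribution over frequency, and skewness and kurtosis are its third and fourth standardized moments. The function name `spectral_moments`, the Hann window, and the frame length are illustrative choices, not the authors' settings.

```python
import numpy as np


def spectral_moments(frame, sample_rate):
    """Return (centroid, spread, skewness, kurtosis) of one frame.

    Assumption: moments are taken over the normalized magnitude
    spectrum; the paper may use a different definition.
    """
    frame = np.asarray(frame, dtype=float)
    # Window the frame to limit spectral leakage, then take the
    # one-sided magnitude spectrum.
    magnitude = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    total = magnitude.sum()
    if total == 0:
        raise ValueError("frame has no spectral energy")
    p = magnitude / total  # spectrum treated as a distribution

    centroid = np.sum(freqs * p)  # first moment (mean frequency)
    spread = np.sqrt(np.sum((freqs - centroid) ** 2 * p))
    if spread == 0:
        raise ValueError("spectrum is concentrated in a single bin")

    # Third and fourth standardized moments of the spectrum.
    skewness = np.sum((freqs - centroid) ** 3 * p) / spread ** 3
    kurtosis = np.sum((freqs - centroid) ** 4 * p) / spread ** 4
    return centroid, spread, skewness, kurtosis


if __name__ == "__main__":
    # Hypothetical input: a 1 kHz tone sampled at 16 kHz.
    t = np.arange(512) / 16000.0
    print(spectral_moments(np.sin(2 * np.pi * 1000 * t), 16000))
```

In a pipeline like the one described, these values would be computed per enhanced frame and concatenated with the other features before classification; how the paper combines them is not stated in the abstract.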
Pages: 25