Taylor-DBN: A new framework for speech recognition systems

Cited by: 0

Authors:

Haridas, Arul Valiyavalappil [1]

Marimuthu, Ramalatha [2]

Sivakumar, V. G. [3]

Chakraborty, Basabi [4]

Affiliations:

[1] Sathyabama Inst Sci & Technol, Dept Elect & Commun Engn, Chennai, Tamil Nadu, India

[2] Kumaraguru Coll Technol, Dept Elect & Commun Engn, Coimbatore 641049, Tamil Nadu, India

[3] Vidya Jyothi Inst Technol, Dept Elect & Commun Engn, Hyderabad, Telangana, India

[4] Iwate Prefectural Univ, Fac Software & Informat Sci, Takizawa, Japan

Source:

INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING | 2021, Vol. 19, No. 02

Keywords:

Speech recognition; Taylor series; deep belief network; gradient descent algorithm; spectral kurtosis; spectral skewness; NEURAL-NETWORKS;

DOI:

10.1142/S021969132050071X

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Speech recognition is a rapidly emerging research area because the speech signal carries linguistic and speaker information that can be used in applications such as surveillance, authentication, and forensics. The performance of speech recognition systems degrades rapidly under channel degradations, mismatches, and noise. To improve recognition performance, the Taylor-Deep Belief Network (Taylor-DBN) classifier is proposed, which modifies the Gradient Descent (GD) algorithm of the existing DBN classifier with the Taylor series. First, the noise present in the speech signal is removed through speech signal enhancement. Features such as Holoentropy with the eXtended Linear Prediction using autocorrelation Snapshot (HXLPS), spectral kurtosis, and spectral skewness are then extracted from the enhanced speech signal and fed to the Taylor-DBN classifier, which identifies the speech of impaired persons. The experiments use the TensorFlow speech recognition database, a real database, and the ESC-50 dataset. For the TensorFlow speech recognition database, the accuracy, False Acceptance Rate (FAR), False Rejection Rate (FRR), and Mean Square Error (MSE) of the Taylor-DBN are 96.95%, 3.04%, 3.04%, and 0.045, respectively; for the real database, they are 96.67%, 3.32%, 3.32%, and 0.0499, respectively. Similarly, for the ESC-50 dataset, the accuracy, FAR, FRR, and MSE are 96.81%, 3.18%, 3.18%, and 0.047, respectively. The results imply that the Taylor-DBN performs better than the existing conventional methods.
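
The abstract names spectral skewness and spectral kurtosis as features but does not define them. As a rough illustration only, the sketch below uses one common convention: the normalized magnitude spectrum is treated as a distribution over frequency, and skewness and kurtosis are its third and fourth standardized moments. This convention, the function names `magnitude_spectrum` and `spectral_shape`, and the toy frame are all assumptions; the paper's own definitions may differ (for example, "spectral kurtosis" can also mean the kurtosis of each frequency bin across time).

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive O(N^2) DFT magnitudes for bins 0..N//2 (fine for a short frame)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_shape(magnitudes, freqs):
    """Return (centroid, spread, skewness, kurtosis) of a magnitude spectrum.

    Assumption: normalized magnitudes act as weights over frequency;
    this is one common convention, not necessarily the paper's.
    """
    total = sum(magnitudes)
    if total == 0:
        return 0.0, 0.0, 0.0, 0.0
    p = [m / total for m in magnitudes]
    centroid = sum(f * w for f, w in zip(freqs, p))
    spread = math.sqrt(sum((f - centroid) ** 2 * w for f, w in zip(freqs, p)))
    if spread < 1e-12:  # all energy in one bin: higher moments are undefined
        return centroid, spread, 0.0, 0.0
    skewness = sum((f - centroid) ** 3 * w for f, w in zip(freqs, p)) / spread ** 3
    kurtosis = sum((f - centroid) ** 4 * w for f, w in zip(freqs, p)) / spread ** 4
    return centroid, spread, skewness, kurtosis


# Hypothetical 16-sample frame: cosines at DFT bins 2 and 5.
frame = [math.cos(2 * math.pi * 2 * i / 16) + 0.5 * math.cos(2 * math.pi * 5 * i / 16)
         for i in range(16)]
mags = magnitude_spectrum(frame)
freqs = [float(k) for k in range(len(mags))]  # bin index used as the frequency axis
centroid, spread, skewness, kurtosis = spectral_shape(mags, freqs)
print(centroid, spread, skewness, kurtosis)
```

In a pipeline like the one the abstract describes, values of this kind would be computed per frame on the enhanced signal and combined with the other features before classification; the naive DFT here only keeps the sketch dependency-free and would normally be replaced by an FFT routine.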


Pages: 25

