Taylor-DBN: A new framework for speech recognition systems

Cited by: 0

Authors:

Haridas, Arul Valiyavalappil [1]

Marimuthu, Ramalatha [2]

Sivakumar, V. G. [3]

Chakraborty, Basabi [4]

Affiliations:

[1] Sathyabama Inst Sci & Technol, Dept Elect & Commun Engn, Chennai, Tamil Nadu, India

[2] Kumaraguru Coll Technol, Dept Elect & Commun Engn, Coimbatore 641049, Tamil Nadu, India

[3] Vidya Jyothi Inst Technol, Dept Elect & Commun Engn, Hyderabad, Telangana, India

[4] Iwate Prefectural Univ, Fac Software & Informat Sci, Takizawa, Japan

Source:

INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING | 2021 / Vol. 19 / Issue 02

Keywords:

Speech recognition; Taylor series; deep belief network; gradient descent algorithm; spectral kurtosis; spectral skewness; neural networks

DOI:

10.1142/S021969132050071X

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Speech recognition is a rapidly emerging research area because the speech signal carries linguistic and speaker information that can be used in applications such as surveillance, authentication, and forensics. The performance of speech recognition systems degrades rapidly in the presence of channel degradations, mismatches, and noise. To improve recognition performance, the Taylor-Deep Belief Network (Taylor-DBN) classifier is proposed, which modifies the Gradient Descent (GD) algorithm of the existing DBN classifier with the Taylor series. First, the noise present in the speech signal is removed through speech signal enhancement. Features such as Holoentropy with the eXtended Linear Prediction using autocorrelation Snapshot (HXLPS), spectral kurtosis, and spectral skewness are then extracted from the enhanced speech signal and fed to the Taylor-DBN classifier, which identifies the speech of impaired persons. The experiments use the TensorFlow speech recognition database, the real database, and the ESC-50 dataset. For the TensorFlow speech recognition database, the accuracy, False Acceptance Rate (FAR), False Rejection Rate (FRR), and Mean Square Error (MSE) of the Taylor-DBN are 96.95%, 3.04%, 3.04%, and 0.045, respectively. For the real database, the accuracy, FAR, FRR, and MSE are 96.67%, 3.32%, 3.32%, and 0.0499, respectively. For the ESC-50 dataset, they are 96.81%, 3.18%, 3.18%, and 0.047, respectively. The results indicate that the Taylor-DBN outperforms the existing conventional methods.
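
The abstract names spectral kurtosis and spectral skewness among the extracted features but does not define them. The sketch below assumes the common "spectral shape" definition, in which the normalized magnitude spectrum of one frame is treated as a distribution over frequency and its third and fourth standardized moments are taken. Whether the paper uses this definition is not stated in the abstract, and the function names `magnitude_spectrum` and `spectral_shape` are hypothetical, chosen only for illustration.

```python
import cmath
import math


def magnitude_spectrum(frame, sample_rate):
    """Naive O(n^2) DFT of one frame (illustrative; an FFT would be used in practice).

    Returns (freqs_hz, magnitudes) for the non-negative bins 0..n//2.
    """
    n = len(frame)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        freqs.append(k * sample_rate / n)
        mags.append(abs(acc))
    return freqs, mags


def spectral_shape(freqs, mags):
    """Return (centroid, spread, skewness, kurtosis) of a magnitude spectrum.

    Assumed definition: the magnitudes are normalized to sum to 1 and
    treated as weights over frequency; skewness and kurtosis are the third
    and fourth central moments divided by spread**3 and spread**4.
    """
    total = sum(mags)
    if total == 0:
        raise ValueError("spectrum has no energy")
    weights = [m / total for m in mags]
    centroid = sum(f * w for f, w in zip(freqs, weights))
    variance = sum((f - centroid) ** 2 * w for f, w in zip(freqs, weights))
    if variance == 0:
        raise ValueError("spectrum is concentrated in a single bin")
    spread = math.sqrt(variance)
    skewness = sum((f - centroid) ** 3 * w
                   for f, w in zip(freqs, weights)) / spread ** 3
    kurtosis = sum((f - centroid) ** 4 * w
                   for f, w in zip(freqs, weights)) / spread ** 4
    return centroid, spread, skewness, kurtosis


if __name__ == "__main__":
    # Hypothetical test frame: two sinusoids sampled at 8 kHz.
    rate = 8000
    frame = [math.sin(2 * math.pi * 500 * t / rate)
             + 0.5 * math.sin(2 * math.pi * 1500 * t / rate)
             for t in range(64)]
    freqs, mags = magnitude_spectrum(frame, rate)
    print(spectral_shape(freqs, mags))
```

In a pipeline like the one the abstract describes, values of this kind would be computed per frame on the enhanced signal and passed to the classifier alongside the other features. The framing, windowing, and feature combination used in the paper are not given here, so those steps are left out.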

Pages: 25
