Taylor-DBN: A new framework for speech recognition systems

Cited by: 0

Authors:

Haridas, Arul Valiyavalappil [1]

Marimuthu, Ramalatha [2]

Sivakumar, V. G. [3]

Chakraborty, Basabi [4]

Affiliations:

[1] Sathyabama Inst Sci & Technol, Dept Elect & Commun Engn, Chennai, Tamil Nadu, India

[2] Kumaraguru Coll Technol, Dept Elect & Commun Engn, Coimbatore 641049, Tamil Nadu, India

[3] Vidya Jyothi Inst Technol, Dept Elect & Commun Engn, Hyderabad, Telangana, India

[4] Iwate Prefectural Univ, Fac Software & Informat Sci, Takizawa, Japan

Source:

INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING | 2021 / Vol. 19 / No. 02

Keywords:

Speech recognition; Taylor series; deep belief network; gradient descent algorithm; spectral kurtosis; spectral skewness; NEURAL-NETWORKS;

DOI:

10.1142/S021969132050071X

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Discipline classification codes:

081202; 0835

Abstract:

Speech recognition is a rapidly emerging research area because the speech signal carries both linguistic information and speaker information, which can be used in applications such as surveillance, authentication, and forensics. However, the performance of speech recognition systems degrades quickly under channel degradation, mismatch, and noise. To improve recognition performance, this paper proposes the Taylor-Deep Belief Network (Taylor-DBN) classifier, which modifies the Gradient Descent (GD) training algorithm of the existing DBN classifier using the Taylor series. First, noise is removed from the speech signal through speech enhancement. Features, namely Holoentropy with the eXtended Linear Prediction using autocorrelation Snapshot (HXLPS), spectral kurtosis, and spectral skewness, are then extracted from the enhanced signal and fed to the Taylor-DBN classifier, which recognizes the speech of impaired persons. Experiments use the TensorFlow speech recognition database, a real database, and the ESC-50 dataset. On the TensorFlow speech recognition database, Taylor-DBN achieves an accuracy of 96.95%, a False Acceptance Rate (FAR) of 3.04%, a False Rejection Rate (FRR) of 3.04%, and a Mean Square Error (MSE) of 0.045. On the real database, the accuracy, FAR, FRR, and MSE are 96.67%, 3.32%, 3.32%, and 0.0499, respectively, and on the ESC-50 dataset they are 96.81%, 3.18%, 3.18%, and 0.047, respectively. These results indicate that Taylor-DBN outperforms the existing conventional methods it was compared against.
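The abstract lists spectral skewness and spectral kurtosis among the extracted features but does not define them. The following is a minimal Python sketch, assuming the common spectral-shape definition: the normalized magnitude spectrum of one frame is treated as a probability distribution over frequency, and skewness and kurtosis are its standardized third and fourth moments. The function name `spectral_skewness_kurtosis` is hypothetical, and the paper may use a different definition, framing, or windowing. HXLPS and the Taylor-DBN classifier itself are not sketched here.

```python
import numpy as np


def spectral_skewness_kurtosis(frame: np.ndarray, sample_rate: float) -> tuple[float, float]:
    """Return (spectral skewness, spectral kurtosis) for one signal frame.

    Assumption: spectral-shape definition, i.e. the normalized magnitude
    spectrum is treated as a probability distribution over frequency.
    No window is applied here; a real front end would usually window frames.
    """
    # One-sided magnitude spectrum and the matching bin frequencies in Hz.
    magnitude = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    total = magnitude.sum()
    if total == 0:
        # Silent frame: the distribution is undefined, so return zeros.
        return 0.0, 0.0
    p = magnitude / total  # weights summing to 1

    # Spectral centroid (mean) and spread (standard deviation).
    centroid = np.sum(freqs * p)
    spread = np.sqrt(np.sum((freqs - centroid) ** 2 * p))
    if spread == 0:
        return 0.0, 0.0

    # Standardized third and fourth central moments.
    skewness = np.sum((freqs - centroid) ** 3 * p) / spread**3
    kurtosis = np.sum((freqs - centroid) ** 4 * p) / spread**4
    return float(skewness), float(kurtosis)
```

As a sanity check, a unit impulse has a flat magnitude spectrum, so its spectral skewness is about 0 and its spectral kurtosis is close to 1.8, the kurtosis of a uniform distribution.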

Pages: 25
