Taylor-DBN: A new framework for speech recognition systems

Cited: 0
Authors
Haridas, Arul Valiyavalappil [1 ]
Marimuthu, Ramalatha [2 ]
Sivakumar, V. G. [3 ]
Chakraborty, Basabi [4 ]
Affiliations
[1] Sathyabama Inst Sci & Technol, Dept Elect & Commun Engn, Chennai, Tamil Nadu, India
[2] Kumaraguru Coll Technol, Dept Elect & Commun Engn, Coimbatore 641049, Tamil Nadu, India
[3] Vidya Jyothi Inst Technol, Dept Elect & Commun Engn, Hyderabad, Telangana, India
[4] Iwate Prefectural Univ, Fac Software & Informat Sci, Takizawa, Japan
Keywords
Speech recognition; Taylor series; deep belief network; gradient descent algorithm; spectral kurtosis; spectral skewness; neural networks
DOI
10.1142/S021969132050071X
CLC Number
TP31 [Computer Software];
Discipline Codes
081202; 0835;
Abstract
Speech recognition is a rapidly emerging research area, as the speech signal carries linguistic and speaker information that can be used in applications such as surveillance, authentication, and forensics. The performance of speech recognition systems degrades rapidly under channel degradation, mismatch, and noise. To improve recognition performance, the Taylor-Deep Belief Network (Taylor-DBN) classifier is proposed, which modifies the Gradient Descent (GD) algorithm with a Taylor series in the existing DBN classifier. Initially, the noise present in the speech signal is removed through speech signal enhancement. Features such as Holoentropy with the eXtended Linear Prediction using autocorrelation Snapshot (HXLPS), spectral kurtosis, and spectral skewness are extracted from the enhanced speech signal and fed to the Taylor-DBN classifier, which identifies the speech of impaired persons. Experiments are conducted on the TensorFlow speech recognition database, a real database, and the ESC-50 dataset. The accuracy, False Acceptance Rate (FAR), False Rejection Rate (FRR), and Mean Square Error (MSE) of the Taylor-DBN on the TensorFlow speech recognition database are 96.95%, 3.04%, 3.04%, and 0.045, respectively; on the real database, they are 96.67%, 3.32%, 3.32%, and 0.0499, respectively; and on the ESC-50 dataset, they are 96.81%, 3.18%, 3.18%, and 0.047, respectively. The results indicate that the Taylor-DBN outperforms existing conventional methods.
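Two of the features named in the abstract, spectral kurtosis and spectral skewness, can be illustrated with a short sketch using their standard moment-based definitions over a frame's magnitude spectrum. This is a minimal, hypothetical illustration: the frame length, hop size, windowing, and normalization below are assumptions, not the paper's exact formulation (the paper's HXLPS feature is not reproduced here).

```python
import numpy as np

def spectral_kurtosis_skewness(signal, sr=16000, frame_len=400, hop=160):
    """Per-frame spectral skewness and kurtosis of a speech signal.

    Hypothetical sketch: frame/hop sizes and moment definitions are
    standard textbook choices, not taken from the Taylor-DBN paper.
    Returns an array of shape (num_frames, 2): [skewness, kurtosis].
    """
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, 1.0 / sr)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))
        # Treat the normalized magnitude spectrum as a distribution over frequency.
        p = mag / (mag.sum() + 1e-12)
        centroid = np.sum(freqs * p)                      # first moment
        spread = np.sqrt(np.sum((freqs - centroid) ** 2 * p)) + 1e-12
        skew = np.sum((freqs - centroid) ** 3 * p) / spread ** 3
        kurt = np.sum((freqs - centroid) ** 4 * p) / spread ** 4
        feats.append((skew, kurt))
    return np.array(feats)
```

Frame-level features like these would then be stacked with the other descriptors before being passed to a classifier; the per-frame distribution view (centroid, spread, skewness, kurtosis) is what makes them robust summaries of spectral shape rather than raw spectra.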
Pages: 25
Related Papers
50 records in total
  • [21] Estimation of Speech Intelligibility Using Speech Recognition Systems
    Takano, Yusuke
    Kondo, Kazuhiro
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (12): : 3368 - 3376
  • [22] Support software for Automatic Speech Recognition systems targeted for non-native speech
    Radzikowski, Kacper
    Yoshie, Osamu
    Nowak, Robert
    22ND INTERNATIONAL CONFERENCE ON INFORMATION INTEGRATION AND WEB-BASED APPLICATIONS & SERVICES (IIWAS2020), 2020, : 55 - 61
  • [23] Speech recognition using Taylor-gradient Descent political optimization based Deep residual network
    Arul, V. H.
    Marimuthu, Ramalatha
    COMPUTER SPEECH AND LANGUAGE, 2023, 78
  • [24] USING A* FOR THE PARALLELIZATION OF SPEECH RECOGNITION SYSTEMS
    Cardinal, Patrick
    Boulianne, Gilles
    Dumouchel, Pierre
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4433 - 4436
  • [25] EmoNet: A Transfer Learning Framework for Multi-Corpus Speech Emotion Recognition
    Gerczuk, Maurice
    Amiriparian, Shahin
    Ottl, Sandra
    Schuller, Bjorn W. W.
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (02) : 1472 - 1487
  • [26] Robust Speech Recognition using Generalized Distillation Framework
    Markov, Konstantin
    Matsui, Tomoko
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2364 - 2368
  • [27] Exploration of an Independent Training Framework for Speech Emotion Recognition
    Zhong, Shunming
    Yu, Baoxian
    Zhang, Han
    IEEE ACCESS, 2020, 8 : 222533 - 222543
  • [28] Lightweight Deep Learning Framework for Speech Emotion Recognition
    Akinpelu, Samson
    Viriri, Serestina
    Adegun, Adekanmi
    IEEE ACCESS, 2023, 11 : 77086 - 77098
  • [29] Investigation of Combined use of MFCC and LPC Features in Speech Recognition Systems
    Aida-Zade, K. R.
    Ardil, C.
    Rustamov, S. S.
    PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY, VOL 13, 2006, 13 : 275 - +
  • [30] A survey on automatic speech recognition systems for Portuguese language and its variations
    de Lima, Thales Aguiar
    Da Costa-Abreu, Marjory
    COMPUTER SPEECH AND LANGUAGE, 2020, 62 (62)