Performance Optimization of Speech Recognition System with Deep Neural Network Model

Cited by: 0

Authors:

Wei Guan [1]

Affiliations:

[1] College of Modern Science and Technology, China Jiliang University, Hangzhou, Zhejiang

Source:

Optical Memory and Neural Networks | 2018, Vol. 27, No. 4

Keywords:

acoustic model; deep neural network; discriminative training; performance optimization; speech recognition

DOI:

10.3103/S1060992X18040094

Abstract:

With the development of the internet, man-machine interaction has become increasingly important, and precise speech recognition has become an important means of achieving it. In this study, deep neural network models were used to enhance speech recognition performance. A feedforward fully connected deep neural network, a time-delay neural network, a convolutional neural network and a feedforward sequence memory neural network were studied, and their speech recognition performance was evaluated by comparing their acoustic models. Moreover, the recognition performance of the models was tested after adding human voice features of different dimensions. The results showed that the deep neural network models effectively improved the performance of the speech recognition system. The feedforward sequence memory neural network performed best, followed by the deep neural network, the time-delay neural network and the convolutional neural network. Different extracted features improved model performance to different degrees: models using Fbank features outperformed those using Mel-frequency cepstral coefficient (MFCC) features. Model performance improved after the vocal characteristics were added, and the vocal characteristic dimension differed from model to model. © 2018, Allerton Press, Inc.

Pages: 272-282

Page count: 10

