A Speech Recognition Algorithm of Speaker-Independent Chinese Isolated Words Based on RNN-LSTM and Attention Mechanism

Cited by: 1
Authors
Hao, Qiuyun [1]
Wang, FuQiang [1]
Ma, XiaoFeng [1]
Zhang, Peng [1]
Affiliations
[1] Qilu Univ Technol, Shandong Acad Sci, Shandong Comp Sci Ctr, Natl Supercomp Ctr Jinan, Shandong Prov Key Lab Co, Jinan, Peoples R China
Source
2021 14TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2021) | 2021
Keywords
RNN; LSTM; Hidden Markov Model (HMM); Attention Mechanism; Speech Recognition; Isolated Word
DOI
10.1109/CISP-BMEI53629.2021.9624368
Chinese Library Classification number
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
Isolated-word speech recognition is one of the most widely used speech recognition technologies at present. This paper takes speaker-independent isolated-word speech recognition as its research object and establishes a corresponding speech database of Chinese isolated words. Building on a study of, and experiments with, existing speech recognition algorithms for Chinese isolated words, it proposes a Chinese isolated-word speech recognition model based on recurrent neural networks (RNN), long short-term memory (LSTM), and an attention mechanism. The attention mechanism is introduced into the model to adjust weights and to train and optimize the parameters of the combined model. The experimental results show that, compared with other algorithm models, the proposed model based on RNN-LSTM and the attention mechanism achieves better recognition performance on Chinese isolated words and can effectively improve the efficiency of speaker-independent Chinese isolated-word speech recognition.
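The abstract does not state how the attention weights are computed. As a rough illustration only, the sketch below assumes additive (tanh-scored) attention pooling over per-frame hidden states, such as those an LSTM would emit for one spoken word, followed by a linear classifier over the vocabulary. The function names (`attention_pool`, `classify`) and the parameters `W`, `v`, and `U` are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, W, v):
    """Additive attention pooling (an assumed formulation, not the paper's).

    hidden_states: list of T vectors of length d (e.g. per-frame LSTM outputs).
    W: projection matrix given as a list of rows, each of length d.
    v: scoring vector with one entry per row of W.
    Returns (weights, context): one weight per frame, and the weighted
    sum of the hidden states.
    """
    scores = []
    for h in hidden_states:
        # score_t = v . tanh(W h_t)
        proj = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W]
        scores.append(sum(a * p for a, p in zip(v, proj)))
    weights = softmax(scores)
    d = len(hidden_states[0])
    context = [sum(a * h[j] for a, h in zip(weights, hidden_states))
               for j in range(d)]
    return weights, context

def classify(context, U):
    """Linear classifier on the pooled vector; U has one row per word."""
    logits = [sum(u * c for u, c in zip(row, context)) for row in U]
    return softmax(logits)

if __name__ == "__main__":
    # Toy input: three frames with two-dimensional hidden states.
    H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    W = [[0.5, -0.5], [0.1, 0.2]]
    v = [1.0, -1.0]
    U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three candidate words
    weights, context = attention_pool(H, W, v)
    print("frame weights:", weights)
    print("word probabilities:", classify(context, U))
```

In a trained model, `W`, `v`, and `U` would be learned jointly with the recurrent layers; here they are fixed toy values chosen only to make the sketch runnable.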
Pages: 4