NEURAL LATTICE SEARCH FOR SPEECH RECOGNITION

Cited by: 0
Authors
Ma, Rao [1 ]
Li, Hao [1 ]
Liu, Qi [1 ]
Chen, Lu [1 ]
Yu, Kai [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, MoE Key Lab Artificial Intelligence, Dept Comp Sci & Engn, SpeechLab, Shanghai, Peoples R China
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020
Keywords
speech recognition; word lattice; lattice-to-sequence; attention models; forward-backward algorithm;
DOI
10.1109/icassp40776.2020.9054109
CLC Classification Number
O42 [Acoustics]
Discipline Classification Codes
070206; 082403
Abstract
To improve the accuracy of automatic speech recognition, a two-pass decoding strategy is widely adopted. The first-pass model generates compact word lattices, which are then rescored by a second-pass model. Currently, the most popular rescoring methods are N-best rescoring and lattice rescoring with long short-term memory language models (LSTMLMs). However, these methods suffer from either a limited search space or a mismatch between training and evaluation. In this paper, we address these problems with an end-to-end model that directly extracts the best hypothesis from the word lattice. Our model consists of a bidirectional LatticeLSTM encoder followed by an attentional LSTM decoder. The model takes a word lattice as input and generates the single best hypothesis from the given lattice space. When combined with an LSTMLM, the proposed model yields 9.7% and 7.5% relative WER reductions over N-best rescoring and lattice rescoring methods, respectively, with the same amount of decoding time.
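
The data flow described in the abstract can be pictured with a small sketch. The following is a minimal, hypothetical PyTorch illustration of a lattice-to-sequence model: a LatticeLSTM-style encoder that visits lattice nodes in topological order and pools predecessor states (here weighted by assumed incoming-arc posteriors, as would be produced by the forward-backward algorithm), followed by an LSTM decoder with attention that greedily emits the single best hypothesis. The class names, pooling rule, unidirectional encoding (the paper uses a bidirectional encoder), toy lattice, and hyperparameters are all illustrative assumptions rather than the authors' exact implementation.

# Hypothetical lattice-to-sequence sketch (PyTorch assumed); illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatticeLSTMEncoder(nn.Module):
    """Encodes lattice nodes in topological order; each node pools the states
    of its predecessor nodes before taking a single LSTMCell step."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.cell = nn.LSTMCell(embed_dim, hidden_dim)
        self.hidden_dim = hidden_dim

    def forward(self, word_ids, predecessors, arc_weights):
        # word_ids:      one word id per lattice node, topologically sorted
        # predecessors:  list of parent-node indices for each node
        # arc_weights:   incoming-arc scores, e.g. forward-backward posteriors
        hs, cs = [], []
        for i, w in enumerate(word_ids):
            emb = self.embed(torch.tensor([w]))
            if predecessors[i]:
                ws = torch.tensor(arc_weights[i]).unsqueeze(1)
                ws = ws / ws.sum()  # normalize weights over incoming arcs
                h_prev = (torch.cat([hs[p] for p in predecessors[i]]) * ws).sum(0, keepdim=True)
                c_prev = (torch.cat([cs[p] for p in predecessors[i]]) * ws).sum(0, keepdim=True)
            else:  # lattice start node: zero initial state
                h_prev = torch.zeros(1, self.hidden_dim)
                c_prev = torch.zeros(1, self.hidden_dim)
            h, c = self.cell(emb, (h_prev, c_prev))
            hs.append(h)
            cs.append(c)
        return torch.cat(hs)  # (num_nodes, hidden_dim)


class AttentionalDecoder(nn.Module):
    """LSTM decoder with dot-product attention over the node encodings."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.cell = nn.LSTMCell(embed_dim + hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def step(self, prev_word, state, enc):
        h, c = state
        scores = enc @ h.squeeze(0)                   # (num_nodes,)
        attn = F.softmax(scores, dim=0).unsqueeze(1)  # attention weights
        ctx = (attn * enc).sum(0, keepdim=True)       # context vector
        h, c = self.cell(torch.cat([self.embed(prev_word), ctx], dim=-1), (h, c))
        return self.out(h), (h, c)


if __name__ == "__main__":
    # Toy lattice over a 10-word vocabulary:
    #   node 0 <s> -> {node 1, node 2} -> node 3 -> node 4 </s>
    encoder = LatticeLSTMEncoder(vocab_size=10)
    decoder = AttentionalDecoder(vocab_size=10)
    enc = encoder(word_ids=[0, 1, 2, 3, 4],
                  predecessors=[[], [0], [0], [1, 2], [3]],
                  arc_weights=[[], [1.0], [1.0], [0.7, 0.3], [1.0]])
    state = (torch.zeros(1, 128), torch.zeros(1, 128))
    word = torch.tensor([0])  # start symbol
    for _ in range(4):        # greedy search for the single best hypothesis
        logits, state = decoder.step(word, state, enc)
        word = logits.argmax(dim=-1)
        print(word.item())

In the actual system the encoder would run over the lattice in both directions and be trained end-to-end on lattice/reference pairs; the sketch above only fixes the overall data flow that the abstract describes.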
Pages: 7794-7798
Page count: 5