Latent Words Recurrent Neural Network Language Models for Automatic Speech Recognition

Cited by: 7
Authors
Masumura, Ryo [1 ]
Asami, Taichi [1 ]
Oba, Takanobu [1 ]
Sakauchi, Sumitaka [1 ]
Ito, Akinori [2 ]
Affiliations
[1] NTT Corp, NTT Media Intelligence Labs, Yokosuka, Kanagawa 2390847, Japan
[2] Tohoku Univ, Grad Sch Engn, Sendai, Miyagi 9808579, Japan
Keywords
latent words recurrent neural network language models; n-gram approximation; Viterbi approximation; automatic speech recognition; MIXTURE; GRAM;
DOI
10.1587/transinf.2018EDP7242
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
This paper demonstrates latent words recurrent neural network language models (LW-RNN-LMs) for enhancing automatic speech recognition (ASR). LW-RNN-LMs are constructed so as to combine the advantages of both recurrent neural network language models (RNN-LMs) and latent words language models (LW-LMs). RNN-LMs can capture long-range context information and offer strong performance, while LW-LMs are robust to out-of-domain tasks owing to their latent word space modeling. However, RNN-LMs cannot explicitly capture hidden relationships behind observed words because they have no concept of a latent variable space. In addition, LW-LMs cannot take long-range relationships between latent words into account. Our idea is to combine the RNN-LM and the LW-LM so as to compensate for their individual disadvantages. LW-RNN-LMs support both latent variable space modeling, as in LW-LMs, and long-range relationship modeling, as in RNN-LMs, at the same time. From the viewpoint of RNN-LMs, an LW-RNN-LM can be regarded as a soft-class RNN-LM with a vast latent variable space. From the viewpoint of LW-LMs, an LW-RNN-LM can be regarded as an LW-LM that uses an RNN structure instead of an n-gram structure for latent variable modeling. This paper also details a parameter inference method and two implementation methods, an n-gram approximation and a Viterbi approximation, for introducing LW-RNN-LMs to ASR. Our experiments show the effectiveness of LW-RNN-LMs in a perplexity evaluation on the Penn Treebank corpus and an ASR evaluation on Japanese spontaneous speech tasks.
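As a rough illustration of the structure described in the abstract, the following is a minimal, hypothetical PyTorch sketch of an LW-RNN-LM: an RNN models the transition distribution over latent words, and each observed word is emitted from its latent word, so the latent variable space is as large as the vocabulary. The class name, layer sizes, and the single-best (Viterbi-style) step shown in the usage are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class LWRNNLM(nn.Module):
    """Sketch of a latent words RNN language model.

    A latent word sequence h_1..h_T is modeled by an RNN (as in an
    RNN-LM), and each observed word w_t is emitted from its latent
    word h_t. Latent and observed words share the vocabulary, which
    is why the model can be seen as a soft-class RNN-LM with a vast
    latent variable space. All sizes here are illustrative.
    """

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)      # latent word embedding
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.latent_out = nn.Linear(hidden_dim, vocab_size)   # P(h_t | h_<t)
        self.emission = nn.Linear(embed_dim, vocab_size)      # P(w_t | h_t)

    def forward(self, latent_words, state=None):
        """latent_words: (batch, seq) indices of the latent word history h_<t."""
        emb = self.embed(latent_words)
        rnn_out, state = self.rnn(emb, state)
        latent_logits = self.latent_out(rnn_out)              # next latent word scores
        return latent_logits, state

    def emit(self, latent_word):
        """Scores over observed words given a chosen latent word."""
        return self.emission(self.embed(latent_word))


# Toy usage: the exact next-word probability marginalizes over all
# latent words; here only the single best latent word is kept, which
# mimics a Viterbi-style approximation.
if __name__ == "__main__":
    torch.manual_seed(0)
    vocab_size = 1000
    model = LWRNNLM(vocab_size)
    history = torch.randint(0, vocab_size, (1, 5))            # previous latent words
    latent_logits, _ = model(history)
    latent_probs = latent_logits[:, -1].softmax(-1)           # P(h_t | h_<t)
    best_latent = latent_probs.argmax(-1)                     # keep the single best latent word
    word_probs = model.emit(best_latent).softmax(-1)          # P(w_t | h_t = best)
    print(word_probs.shape)                                   # torch.Size([1, 1000])

For decoding, latent-variable language models of this kind are often also approximated by a standard n-gram model trained on text sampled from the generative model; that is the general idea behind the n-gram approximation mentioned in the abstract, though the paper's exact procedure is not reproduced here.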
Pages: 2557-2567
Page count: 11