Latent Words Recurrent Neural Network Language Models for Automatic Speech Recognition

Cited by: 7
Authors
Masumura, Ryo [1 ]
Asami, Taichi [1 ]
Oba, Takanobu [1 ]
Sakauchi, Sumitaka [1 ]
Ito, Akinori [2 ]
Affiliations
[1] NTT Corp, NTT Media Intelligence Labs, Yokosuka, Kanagawa 2390847, Japan
[2] Tohoku Univ, Grad Sch Engn, Sendai, Miyagi 9808579, Japan
Keywords
latent words recurrent neural network language models; n-gram approximation; Viterbi approximation; automatic speech recognition; MIXTURE; GRAM;
DOI
10.1587/transinf.2018EDP7242
CLC Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
This paper demonstrates latent words recurrent neural network language models (LW-RNN-LMs) for enhancing automatic speech recognition (ASR). LW-RNN-LMs are constructed to combine the advantages of both recurrent neural network language models (RNN-LMs) and latent words language models (LW-LMs). RNN-LMs can capture long-range context information and offer strong performance, while LW-LMs are robust to out-of-domain tasks thanks to their latent word space modeling. However, RNN-LMs cannot explicitly capture hidden relationships behind observed words since they have no concept of a latent variable space, and LW-LMs cannot take long-range relationships between latent words into account. Our idea is to combine an RNN-LM and an LW-LM so that each compensates for the other's disadvantages. LW-RNN-LMs simultaneously support latent variable space modeling, as in LW-LMs, and long-range relationship modeling, as in RNN-LMs. From the viewpoint of RNN-LMs, an LW-RNN-LM can be regarded as a soft-class RNN-LM with a vast latent variable space. Conversely, from the viewpoint of LW-LMs, an LW-RNN-LM can be regarded as an LW-LM that uses an RNN structure instead of an n-gram structure for latent variable modeling. This paper also details a parameter inference method and two implementation methods, an n-gram approximation and a Viterbi approximation, for introducing LW-RNN-LMs into ASR. Our experiments show the effectiveness of LW-RNN-LMs in a perplexity evaluation on the Penn Treebank corpus and an ASR evaluation on Japanese spontaneous speech tasks.
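To make the architecture concrete, the following is a minimal sketch of the generative story described in the abstract: an RNN predicts the next latent word from the latent word history, and each observed word is emitted from its latent word, so the word probability marginalizes over latent words. It also hints at how the n-gram approximation can be realized by sampling text from the model. All names, dimensions, and the plain-NumPy implementation are illustrative assumptions, not the authors' code.

import numpy as np

rng = np.random.default_rng(0)

V, L, H = 50, 50, 32   # observed vocab, latent vocab, hidden size (assumed)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class LWRNNLM:
    """Sketch of an LW-RNN-LM: P(w_t | w_<t) marginalizes over latent
    words h_t, where an RNN models P(h_t | h_<t) and an emission table
    models P(w_t | h_t)."""
    def __init__(self):
        self.E  = rng.normal(0.0, 0.1, (L, H))   # latent-word embeddings
        self.Wh = rng.normal(0.0, 0.1, (H, H))   # recurrent weights
        self.Wo = rng.normal(0.0, 0.1, (H, L))   # hidden -> latent-word logits
        self.Em = rng.normal(0.0, 0.1, (L, V))   # latent -> observed-word logits

    def step(self, state, prev_latent):
        # Elman-style recurrence over the *latent* word sequence.
        state = np.tanh(self.E[prev_latent] + state @ self.Wh)
        return state, softmax(state @ self.Wo)   # new state, P(h_t | h_<t)

    def sample_sentence(self, length=10):
        # Ancestral sampling: draw a latent word from the RNN prior,
        # then emit an observed word from P(w | h).
        state, h, words = np.zeros(H), 0, []     # latent index 0 doubles as a start symbol (assumption)
        for _ in range(length):
            state, latent_probs = self.step(state, h)
            h = rng.choice(L, p=latent_probs)             # latent word h_t
            w = rng.choice(V, p=softmax(self.Em[h]))      # observed word w_t
            words.append(w)
        return words

model = LWRNNLM()
sampled_corpus = [model.sample_sentence() for _ in range(100)]
# n-gram approximation (sketch): estimate a count-based n-gram LM on
# sampled_corpus; the result approximates the LW-RNN-LM and can be used
# in first-pass ASR decoding. A Viterbi-style approximation would instead
# keep only the single most likely latent word at each step.

Under these assumptions, the appeal of the two approximations is that exact computation of P(w_t | w_<t) requires summing over all latent word histories, which is intractable for a vast latent space; sampling or Viterbi decoding sidesteps that sum so the model can be used in a standard ASR decoder.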
Pages: 2557-2567
Page count: 11
Related Papers
50 in total
  • [1] Bidirectional Recurrent Neural Network Language Models for Automatic Speech Recognition
    Arisoy, Ebru
    Sethy, Abhinav
    Ramabhadran, Bhuvana
    Chen, Stanley
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5421 - 5425
  • [2] Latent Words Recurrent Neural Network Language Models
    Masumura, Ryo
    Asami, Taichi
    Oba, Takanobu
    Masataki, Hirokazu
    Sakauchi, Sumitaka
    Ito, Akinori
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2380 - 2384
  • [3] Efficient Training and Evaluation of Recurrent Neural Network Language Models for Automatic Speech Recognition
    Chen, Xie
    Liu, Xunying
    Wang, Yongqiang
    Gales, Mark J. F.
    Woodland, Philip C.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (11) : 2146 - 2157
  • [4] N-gram Approximation of Latent Words Language Models for Domain Robust Automatic Speech Recognition
    Masumura, Ryo
    Asami, Taichi
    Oba, Takanobu
    Masataki, Hirokazu
    Sakauchi, Sumitaka
    Takahashi, Satoshi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, E99D (10): : 2462 - 2470
  • [5] Domain Adaptation Based on Mixture of Latent Words Language Models for Automatic Speech Recognition
    Masumura, Ryo
    Asami, Taichi
    Oba, Takanobu
    Masataki, Hirokazu
    Sakauchi, Sumitaka
    Ito, Akinori
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (06): : 1581 - 1590
  • [6] Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition
    Chen, X.
    Ragni, A.
    Liu, X.
    Gales, M. J. F.
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 269 - 273
  • [7] Conversion of Recurrent Neural Network Language Models to Weighted Finite State Transducers for Automatic Speech Recognition
    Lecorve, Gwenole
    Motlicek, Petr
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1666 - 1669
  • [8] Learning Recurrent Neural Network Language Models with Context-Sensitive Label Smoothing for Automatic Speech Recognition
    Song, Minguang
    Zhao, Yunxin
    Wang, Shaojun
    Han, Mei
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6159 - 6163
  • [9] Gaussian Process LSTM Recurrent Neural Network Language Models for Speech Recognition
    Lam, Max W. Y.
    Chen, Xie
    Hu, Shoukang
    Yu, Jianwei
    Liu, Xunying
    Meng, Helen
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7235 - 7239
  • [10] Semantic Word Embedding Neural Network Language Models for Automatic Speech Recognition
    Audhkhasi, Kartik
    Sethy, Abhinav
    Ramabhadran, Bhuvana
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5995 - 5999