Latent Words Recurrent Neural Network Language Models for Automatic Speech Recognition

Cited by: 7
Authors
Masumura, Ryo [1 ]
Asami, Taichi [1 ]
Oba, Takanobu [1 ]
Sakauchi, Sumitaka [1 ]
Ito, Akinori [2 ]
Affiliations
[1] NTT Corp, NTT Media Intelligence Labs, Yokosuka, Kanagawa 2390847, Japan
[2] Tohoku Univ, Grad Sch Engn, Sendai, Miyagi 9808579, Japan
Keywords
latent words recurrent neural network language models; n-gram approximation; Viterbi approximation; automatic speech recognition; MIXTURE; GRAM;
DOI
10.1587/transinf.2018EDP7242
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
This paper demonstrates latent words recurrent neural network language models (LW-RNN-LMs) for enhancing automatic speech recognition (ASR). LW-RNN-LMs are constructed so as to combine the advantages of both recurrent neural network language models (RNN-LMs) and latent words language models (LW-LMs). RNN-LMs can capture long-range context information and offer strong performance, while LW-LMs are robust to out-of-domain tasks owing to their latent word space modeling. However, RNN-LMs cannot explicitly capture hidden relationships behind observed words because they have no concept of a latent variable space. In addition, LW-LMs cannot take long-range relationships between latent words into account. Our idea is to combine the RNN-LM and the LW-LM so as to compensate for their individual disadvantages. LW-RNN-LMs support both latent variable space modeling, as in LW-LMs, and long-range relationship modeling, as in RNN-LMs, at the same time. From the viewpoint of RNN-LMs, an LW-RNN-LM can be regarded as a soft-class RNN-LM with a vast latent variable space. From the viewpoint of LW-LMs, an LW-RNN-LM can be regarded as an LW-LM that uses an RNN structure instead of an n-gram structure for latent variable modeling. This paper also details a parameter inference method and two implementation methods, an n-gram approximation and a Viterbi approximation, for introducing LW-RNN-LMs to ASR. Our experiments show the effectiveness of LW-RNN-LMs in a perplexity evaluation on the Penn Treebank corpus and an ASR evaluation on Japanese spontaneous speech tasks.
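As a rough illustration of the structure described in the abstract, the following is a minimal, hypothetical PyTorch sketch of an LW-RNN-LM: an RNN models the transition distribution over latent words, and each observed word is emitted from its latent word, so the latent variable space is as large as the vocabulary. The class name, layer sizes, and the single-best (Viterbi-style) step shown in the usage are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class LWRNNLM(nn.Module):
    """Sketch of a latent words RNN language model.

    A latent word sequence h_1..h_T is modeled by an RNN (as in an
    RNN-LM), and each observed word w_t is emitted from its latent
    word h_t. Latent and observed words share the vocabulary, which
    is why the model can be seen as a soft-class RNN-LM with a vast
    latent variable space. All sizes here are illustrative.
    """

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)      # latent word embedding
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.latent_out = nn.Linear(hidden_dim, vocab_size)   # P(h_t | h_<t)
        self.emission = nn.Linear(embed_dim, vocab_size)      # P(w_t | h_t)

    def forward(self, latent_words, state=None):
        """latent_words: (batch, seq) indices of the latent word history h_<t."""
        emb = self.embed(latent_words)
        rnn_out, state = self.rnn(emb, state)
        latent_logits = self.latent_out(rnn_out)              # next latent word scores
        return latent_logits, state

    def emit(self, latent_word):
        """Scores over observed words given a chosen latent word."""
        return self.emission(self.embed(latent_word))


# Toy usage: the exact next-word probability marginalizes over all
# latent words; here only the single best latent word is kept, which
# mimics a Viterbi-style approximation.
if __name__ == "__main__":
    torch.manual_seed(0)
    vocab_size = 1000
    model = LWRNNLM(vocab_size)
    history = torch.randint(0, vocab_size, (1, 5))            # previous latent words
    latent_logits, _ = model(history)
    latent_probs = latent_logits[:, -1].softmax(-1)           # P(h_t | h_<t)
    best_latent = latent_probs.argmax(-1)                     # keep the single best latent word
    word_probs = model.emit(best_latent).softmax(-1)          # P(w_t | h_t = best)
    print(word_probs.shape)                                   # torch.Size([1, 1000])

For decoding, latent-variable language models of this kind are often also approximated by a standard n-gram model trained on text sampled from the generative model; that is the general idea behind the n-gram approximation mentioned in the abstract, though the paper's exact procedure is not reproduced here.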
Pages: 2557-2567
Page count: 11