Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition

Cited by: 38

Authors:

Li, Ke [1]

Xu, Hainan [1]

Wang, Yiming [1]

Povey, Daniel [1,2]

Khudanpur, Sanjeev [1,2]

Affiliations:

[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA

[2] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD USA

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

ASR; recurrent neural network language model (RNNLM); neural language model adaptation; fast marginal adaptation (FMA); cache model; deep neural network (DNN); lattice rescoring;

DOI:

10.21437/Interspeech.2018-1413

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose two adaptation models for recurrent neural network language models (RNNLMs) to capture topic effects and long-distance triggers for conversational automatic speech recognition (ASR). We use a fast marginal adaptation (FMA) framework to adapt an RNNLM. Our first model is effectively a cache model: the word frequencies are estimated by counting words in a conversation (with utterance-level hold-one-out) from first-pass decoded word lattices, and are then interpolated with a background unigram distribution. In the second model, we train a deep neural network (DNN) on conversational transcriptions to predict word frequencies given word frequencies from first-pass decoded word lattices. The second model can in principle model trigger and topic effects but is harder to train. Experiments on three conversational corpora show modest WER and perplexity reductions with both adaptation models.
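The cache-style adaptation described in the abstract can be illustrated with a rough Python sketch. It is not the authors' implementation, and it rests on several assumptions that the record does not state: hard word counts from 1-best utterances stand in for lattice-derived counts, the FMA update is taken in its commonly cited form (scale each RNNLM probability by the ratio of adapted to background unigram probability raised to a power, then renormalize), and the names `cache_unigram`, `fma_adapt`, `lam`, and `beta` are hypothetical.

```python
from collections import Counter


def cache_unigram(utterances, held_out, background, lam=0.5):
    """Cache unigram for one utterance of a conversation.

    Counts words in every utterance except `held_out` (utterance-level
    hold-one-out), then interpolates the relative frequencies with the
    background unigram. `lam` is a hypothetical interpolation weight.
    Words outside the background vocabulary are ignored (assumption).
    """
    counts = Counter(
        w
        for i, utt in enumerate(utterances)
        if i != held_out
        for w in utt
        if w in background
    )
    total = sum(counts.values())
    if total == 0:
        # No usable context: fall back to the background distribution.
        return dict(background)
    return {
        w: lam * counts[w] / total + (1.0 - lam) * p_bg
        for w, p_bg in background.items()
    }


def fma_adapt(p_rnnlm, p_cache, background, beta=0.5):
    """FMA-style rescaling of one RNNLM next-word distribution (assumed form).

    Each probability is multiplied by (p_cache / p_background) ** beta and
    the result is renormalized so that it sums to one.
    """
    scaled = {
        w: p * (p_cache[w] / background[w]) ** beta for w, p in p_rnnlm.items()
    }
    z = sum(scaled.values())
    return {w: s / z for w, s in scaled.items()}


# Toy conversation with a three-word vocabulary (made-up numbers).
background = {"the": 0.5, "cat": 0.25, "dog": 0.25}
utterances = [["the", "cat"], ["cat", "cat"], ["the", "dog"]]
p_rnnlm = {"the": 0.4, "cat": 0.3, "dog": 0.3}

p_cache = cache_unigram(utterances, held_out=2, background=background)
p_adapted = fma_adapt(p_rnnlm, p_cache, background)
```

In this toy example "cat" is frequent in the other utterances of the conversation, so its adapted probability rises above the RNNLM value, while "dog", which occurs only in the held-out utterance, is pushed down.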


Pages: 3373-3377

Number of pages: 5
