Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition

Cited by: 38
Authors
Li, Ke [1]
Xu, Hainan [1]
Wang, Yiming [1]
Povey, Daniel [1, 2]
Khudanpur, Sanjeev [1, 2]
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[2] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD USA
Keywords
ASR; recurrent neural network language model (RNNLM); neural language model adaptation; fast marginal adaptation (FMA); cache model; deep neural network (DNN); lattice rescoring
DOI
10.21437/Interspeech.2018-1413
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We propose two adaptation models for recurrent neural network language models (RNNLMs) to capture topic effects and long-distance triggers for conversational automatic speech recognition (ASR). We use a fast marginal adaptation (FMA) framework to adapt an RNNLM. Our first model is effectively a cache model: the word frequencies are estimated by counting words in a conversation (with utterance-level hold-one-out) from 1st-pass decoded word lattices and are then interpolated with a background unigram distribution. In the second model, we train a deep neural network (DNN) on conversational transcriptions to predict word frequencies given word frequencies from 1st-pass decoded word lattices. The second model can in principle model trigger and topic effects but is harder to train. Experiments on three conversational corpora show modest WER and perplexity reductions with both adaptation models.
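The first (cache) model described in the abstract can be illustrated with a minimal Python sketch. Several things here are assumptions, not details from the paper: word counts come from plain 1-best word lists rather than lattice statistics, the interpolation weight `lam` and exponent `beta` are arbitrary, and the scaling rule `(P_cache(w) / P_bg(w)) ** beta` followed by renormalization is the commonly cited form of fast marginal adaptation, which the abstract does not spell out. The function names `cache_unigram` and `fma_adapt` are hypothetical.

```python
from collections import Counter


def cache_unigram(utterances, held_out, background, lam=0.5):
    """Cache unigram for utterance `held_out`, counted from all OTHER
    utterances of the conversation (utterance-level hold-one-out) and
    interpolated with the background unigram. `lam` is an assumed weight."""
    counts = Counter()
    for i, words in enumerate(utterances):
        if i != held_out:
            # Only in-vocabulary words are counted so the result sums to 1.
            counts.update(w for w in words if w in background)
    total = sum(counts.values())
    if total == 0:
        # No usable context: fall back to the background distribution.
        return dict(background)
    return {
        w: lam * counts[w] / total + (1.0 - lam) * p_bg
        for w, p_bg in background.items()
    }


def fma_adapt(rnnlm_probs, cache, background, beta=0.5):
    """Scale each RNNLM probability by (cache / background) ** beta and
    renormalize over the vocabulary (assumed FMA form)."""
    scaled = {
        w: p * (cache[w] / background[w]) ** beta
        for w, p in rnnlm_probs.items()
    }
    z = sum(scaled.values())
    return {w: s / z for w, s in scaled.items()}


# Toy conversation of three utterances over a three-word vocabulary.
background = {"the": 0.5, "cat": 0.25, "dog": 0.25}
utterances = [["the", "cat"], ["cat", "cat"], ["the", "dog"]]
rnnlm_probs = {"the": 0.4, "cat": 0.3, "dog": 0.3}

cache = cache_unigram(utterances, held_out=2, background=background)
adapted = fma_adapt(rnnlm_probs, cache, background)
```

In this toy case "cat" is frequent in the other utterances, so its adapted probability rises above the unadapted RNNLM value, while "dog", which is unseen outside the held-out utterance, is pushed down.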
Pages: 3373-3377
Page count: 5