Comparison of Various Neural Network Language Models in Speech Recognition

Cited by: 1
|
Authors
Zuo, Lingyun [1 ]
Liu, Jian [1 ,2 ]
Wan, Xin [3 ]
Affiliations
[1] IACAS, Key Lab Speech Acoust & Content, Beijing, Peoples R China
[2] Chinese Acad Sci, XTIPC, Xinjiang Lab Minor Speech & Language Informat Pro, Beijing, Peoples R China
[3] Natl Comp Network Emergency Response Tech Team, Coordinat Ctr, Beijing, Peoples R China
Keywords
neural network language model; LSTM; speech recognition; n-best lists re-score;
DOI
10.1109/ICISCE.2016.195
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
In recent years, research on language modeling for speech recognition has increasingly focused on the application of neural networks. However, the performance of neural network language models depends strongly on their architecture. Three competing concepts have been developed: first, feed-forward neural networks, which represent an n-gram approach; second, recurrent neural networks, which can learn context dependencies spanning more than a fixed number of predecessor words; and third, long short-term memory (LSTM) neural networks, which can fully exploit long-range correlations in a telephone conversation corpus. In this paper, we compare count models to feed-forward, recurrent, and LSTM neural networks on conversational telephone speech recognition tasks. Furthermore, we propose a language model estimation method that incorporates information from history sentences. We evaluate the models in terms of perplexity and word error rate, experimentally validating the strong correlation between the two quantities, which we find to hold regardless of the underlying type of language model. The experimental results show that the LSTM neural network language model performs best in n-best list rescoring. Compared to first-pass decoding, the average word error rate declines by a relative 4.3% when using ten candidate results for rescoring in conversational telephone speech recognition tasks.
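The n-best list rescoring described in the abstract can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the hypothesis texts, scores, and the interpolation weight are all hypothetical, and the common practice of linearly combining a first-pass score with a neural LM score is assumed.

```python
def rescore_nbest(hypotheses, lm_weight=0.5):
    """Re-rank n-best hypotheses by interpolating the first-pass
    (acoustic + n-gram LM) score with a neural LM score.

    Each hypothesis is a dict with keys "text", "first_pass_score",
    and "nnlm_score" (both scores are log-domain, higher is better).
    """
    def combined(hyp):
        # Linear interpolation of the two log-scores; lm_weight is
        # an assumed tuning parameter, typically set on held-out data.
        return ((1.0 - lm_weight) * hyp["first_pass_score"]
                + lm_weight * hyp["nnlm_score"])

    # Return hypotheses sorted best-first by the combined score.
    return sorted(hypotheses, key=combined, reverse=True)


# Hypothetical two-entry n-best list: the neural LM prefers the
# grammatical hypothesis even though its first-pass score is lower.
nbest = [
    {"text": "i can here you now", "first_pass_score": -11.8, "nnlm_score": -14.5},
    {"text": "i can hear you now", "first_pass_score": -12.1, "nnlm_score": -9.0},
]
best = rescore_nbest(nbest)[0]["text"]
```

With these illustrative numbers, rescoring promotes "i can hear you now" to the top of the list, which is the mechanism behind the word error rate reduction the paper reports.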
Pages: 894 - 898
Page count: 5
Related Papers
50 records
  • [1] Comparison of Neural Network Models for Speech Emotion Recognition
    Palo, Hemanta Kumar
    Sagar, Sangeet
    2ND INTERNATIONAL CONFERENCE ON DATA SCIENCE AND BUSINESS ANALYTICS (ICDSBA 2018), 2018, : 127 - 131
  • [2] Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition
    Chen, X.
    Ragni, A.
    Liu, X.
    Gales, M. J. F.
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 269 - 273
  • [3] Structured Output Layer Neural Network Language Models for Speech Recognition
    Le, Hai-Son
    Oparin, Ilya
    Allauzen, Alexandre
    Gauvain, Jean-Luc
    Yvon, Francois
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (01): : 195 - 204
  • [4] BIDIRECTIONAL RECURRENT NEURAL NETWORK LANGUAGE MODELS FOR AUTOMATIC SPEECH RECOGNITION
    Arisoy, Ebru
    Sethy, Abhinav
    Ramabhadran, Bhuvana
    Chen, Stanley
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5421 - 5425
  • [5] Empirical study of neural network language models for Arabic speech recognition
    Emami, Ahmad
    Mangu, Lidia
    2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 147 - 152
  • [6] Exploiting Future Word Contexts in Neural Network Language Models for Speech Recognition
    Chen, Xie
    Liu, Xunying
    Wang, Yu
    Ragni, Anton
    Wong, Jeremy H. M.
    Gales, Mark J. F.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (09) : 1444 - 1454
  • [7] SEMANTIC WORD EMBEDDING NEURAL NETWORK LANGUAGE MODELS FOR AUTOMATIC SPEECH RECOGNITION
    Audhkhasi, Kartik
    Sethy, Abhinav
    Ramabhadran, Bhuvana
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5995 - 5999
  • [8] Latent Words Recurrent Neural Network Language Models for Automatic Speech Recognition
    Masumura, Ryo
    Asami, Taichi
    Oba, Takanobu
    Sakauchi, Sumitaka
    Ito, Akinori
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (12) : 2557 - 2567
  • [9] GAUSSIAN PROCESS LSTM RECURRENT NEURAL NETWORK LANGUAGE MODELS FOR SPEECH RECOGNITION
    Lam, Max W. Y.
    Chen, Xie
    Hu, Shoukang
    Yu, Jianwei
    Liu, Xunying
    Meng, Helen
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7235 - 7239
  • [10] A study of neural network Russian language models for automatic continuous speech recognition systems
    Kipyatkova, I. S.
    Karpov, A. A.
    AUTOMATION AND REMOTE CONTROL, 2017, 78 (05) : 858 - 867