Comparison of Various Neural Network Language Models in Speech Recognition

Cited by: 1
Authors
Zuo, Lingyun [1 ]
Liu, Jian [1 ,2 ]
Wan, Xin [3 ]
Affiliations
[1] IACAS, Key Lab Speech Acoust & Content, Beijing, Peoples R China
[2] Chinese Acad Sci, XTIPC, Xinjiang Lab Minor Speech & Language Informat Pro, Beijing, Peoples R China
[3] Natl Comp Network Emergency Response Tech Team, Coordinat Ctr, Beijing, Peoples R China
Keywords
neural network language model; LSTM; speech recognition; n-best list rescoring;
DOI
10.1109/ICISCE.2016.195
CLC number
TP [automation technology, computer technology];
Subject classification code
0812;
Abstract
In recent years, research on language modeling for speech recognition has increasingly focused on the application of neural networks. However, the performance of neural network language models strongly depends on their architecture. Three competing concepts have been developed: first, feed-forward neural networks, which implement an n-gram approach; second, recurrent neural networks, which can learn context dependencies spanning more than a fixed number of predecessor words; and third, long short-term memory (LSTM) neural networks, which can fully exploit long-range correlations in a telephone conversation corpus. In this paper, we compare count-based models to feed-forward, recurrent, and LSTM neural networks on conversational telephone speech recognition tasks. Furthermore, we put forward a language model estimation method that incorporates information from preceding sentences in the conversation history. We evaluate the models in terms of perplexity and word error rate, experimentally validating the strong correlation between the two quantities, which we find to hold regardless of the underlying type of language model. The experimental results show that the LSTM neural network language model performs best in n-best list rescoring: compared with first-pass decoding, rescoring with the ten best candidate results yields a 4.3% relative reduction in average word error rate on conversational telephone speech recognition tasks.
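
To make the second-pass procedure concrete, below is a minimal sketch of n-best list rescoring with an LSTM language model. It is written in PyTorch purely for illustration and is not the paper's implementation: the class and function names (LSTMLM, sentence_log_prob, perplexity, rescore_nbest), the model sizes, and the interpolation weight lm_weight are all assumptions for the example.

import math
import torch
import torch.nn as nn

class LSTMLM(nn.Module):
    """Word-level LSTM language model: embedding -> LSTM -> softmax over the vocabulary."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, ids):                    # ids: (batch, seq_len)
        out, _ = self.lstm(self.embed(ids))
        return self.proj(out)                  # (batch, seq_len, vocab) logits

def sentence_log_prob(model, ids):
    """log P(w_1..w_T) = sum_t log P(w_t | w_<t); ids is (1, T) and includes <s> ... </s>."""
    with torch.no_grad():
        logp = torch.log_softmax(model(ids[:, :-1]), dim=-1)
        token_logp = logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
        return token_logp.sum().item()

def perplexity(model, ids):
    """Perplexity = exp of the average negative log-likelihood per predicted token."""
    return math.exp(-sentence_log_prob(model, ids) / (ids.size(1) - 1))

def rescore_nbest(model, nbest, lm_weight=0.5):
    """Re-rank an n-best list by adding the LSTM-LM log-probability to each
    hypothesis's first-pass decoder score, then return the best hypothesis."""
    return max(nbest, key=lambda h: h["first_pass_score"]
               + lm_weight * sentence_log_prob(model, h["ids"]))

if __name__ == "__main__":
    torch.manual_seed(0)
    model = LSTMLM(vocab_size=1000).eval()
    # A fake 10-best list (ten candidates per utterance, as in the paper's experiments):
    # each hypothesis carries its first-pass decoder score and its token ids.
    nbest = [{"first_pass_score": -float(i), "ids": torch.randint(0, 1000, (1, 12))}
             for i in range(10)]
    best = rescore_nbest(model, nbest)
    print(best["first_pass_score"], perplexity(model, best["ids"]))

In practice the LM weight is tuned on a development set; the paper's reported 4.3% relative reduction in average word error rate comes from rescoring ten candidate results per utterance in exactly this second-pass fashion.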
Pages: 894-898
Page count: 5
Related papers
50 in total
  • [21] Neural Error Corrective Language Models for Automatic Speech Recognition
    Tanaka, Tomohiro
    Masumura, Ryo
    Masataki, Hirokazu
    Aono, Yushi
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 401 - 405
  • [22] Conversion of Recurrent Neural Network Language Models to Weighted Finite State Transducers for Automatic Speech Recognition
    Lecorve, Gwenole
    Motlicek, Petr
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1666 - 1669
  • [23] Recurrent Neural Network Language Model with Part-of-speech for Mandarin Speech Recognition
    Gong, Caixia
    Li, Xiangang
    Wu, Xihong
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 459 - 463
  • [24] Temporal Speech Normalization Methods Comparison in Speech Recognition Using Neural Network
    Salam, Md Sah Bin Hj
    Mohamad, Dzulkifli
    Salleh, Sheikh Hussain Shaikh
    2009 INTERNATIONAL CONFERENCE OF SOFT COMPUTING AND PATTERN RECOGNITION, 2009, : 442 - 447
  • [25] Cross-sentence Neural Language Models for Conversational Speech Recognition
    Chiu, Shih-Hsuan
    Lo, Tien-Hong
    Chen, Berlin
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [26] Speech Recognition Model for Assamese Language Using Deep Neural Network
    Singh, Moirangthem Tiken
    Barman, Partha Pratim
    Gogoi, Rupjyoti
    2018 INTERNATIONAL CONFERENCE ON RECENT INNOVATIONS IN ELECTRICAL, ELECTRONICS & COMMUNICATION ENGINEERING (ICRIEECE 2018), 2018, : 2722 - 2727
  • [27] A Speech Recognition System for Bengali Language using Recurrent Neural Network
    Islam, Jahirul
    Mubassira, Masiath
    Islam, Md. Rakibul
    Das, Amit Kumar
    2019 IEEE 4TH INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION SYSTEMS (ICCCS 2019), 2019, : 73 - 76
  • [28] Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition
    Li, Ke
    Xu, Hainan
    Wang, Yiming
    Povey, Daniel
    Khudanpur, Sanjeev
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3373 - 3377
  • [29] COMPARISON OF FEEDFORWARD AND RECURRENT NEURAL NETWORK LANGUAGE MODELS
    Sundermeyer, M.
    Oparin, I.
Gauvain, J.-L.
    Freiberg, B.
    Schlueter, R.
    Ney, H.
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 8430 - 8434
  • [30] LEARNING RECURRENT NEURAL NETWORK LANGUAGE MODELS WITH CONTEXT-SENSITIVE LABEL SMOOTHING FOR AUTOMATIC SPEECH RECOGNITION
    Song, Minguang
    Zhao, Yunxin
    Wang, Shaojun
    Han, Mei
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6159 - 6163