Comparison of Various Neural Network Language Models in Speech Recognition

Cited: 1
Authors
Zuo, Lingyun [1 ]
Liu, Jian [1 ,2 ]
Wan, Xin [3 ]
Affiliations
[1] IACAS, Key Lab Speech Acoust & Content, Beijing, Peoples R China
[2] Chinese Acad Sci, XTIPC, Xinjiang Lab Minor Speech & Language Informat Pro, Beijing, Peoples R China
[3] Natl Comp Network Emergency Response Tech Team, Coordinat Ctr, Beijing, Peoples R China
Keywords
neural network language model; LSTM; speech recognition; n-best list rescoring;
DOI
10.1109/ICISCE.2016.195
CLC Classification
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
In recent years, research on language modeling for speech recognition has increasingly focused on the application of neural networks. However, the performance of neural network language models strongly depends on their architecture. Three competing concepts have been developed: first, feed-forward neural networks implementing an n-gram approach; second, recurrent neural networks that can learn context dependencies spanning more than a fixed number of predecessor words; third, long short-term memory (LSTM) neural networks, which can fully exploit long-range correlations in a telephone conversation corpus. In this paper, we compare count models to feed-forward, recurrent, and LSTM neural networks on conversational telephone speech recognition tasks. Furthermore, we put forward a language model estimation method that incorporates information from history sentences. We evaluate the models in terms of perplexity and word error rate, experimentally validating the strong correlation between the two quantities, which we find to hold regardless of the underlying type of language model. The experimental results show that the LSTM neural network language model performs best in n-best list rescoring: compared to first-pass decoding, the average word error rate declines by a relative 4.3% when the ten best candidate results are rescored on conversational telephone speech recognition tasks.
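The n-best rescoring described in the abstract can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the function name `rescore_nbest`, the interpolation weight, and the toy language model are all assumptions, standing in for the LSTM language model used in the paper.

```python
# Hypothetical sketch of n-best list rescoring. Each hypothesis carries
# its first-pass decoder score; a stronger rescoring LM (in the paper,
# an LSTM LM) supplies a new log-probability, and the two scores are
# linearly interpolated before re-ranking.

def rescore_nbest(hypotheses, lm_logprob, lm_weight=0.5):
    """Re-rank n-best hypotheses by combining first-pass and LM scores.

    hypotheses: list of (sentence, first_pass_logprob) pairs
    lm_logprob: callable mapping a sentence to a log-probability
    lm_weight:  interpolation weight given to the rescoring LM
    """
    rescored = [
        (sent, (1 - lm_weight) * fp + lm_weight * lm_logprob(sent))
        for sent, fp in hypotheses
    ]
    # The hypothesis with the highest combined score is returned.
    return max(rescored, key=lambda pair: pair[1])[0]


# Toy usage: a stand-in LM that strongly prefers one hypothesis.
nbest = [("the cat sat", -4.0), ("the cat sad", -3.5)]
toy_lm = lambda s: -1.0 if s.endswith("sat") else -5.0
best = rescore_nbest(nbest, toy_lm, lm_weight=0.5)
```

In the paper's setting, the first-pass decoder emits the ten best candidates per utterance, and the rescoring LM's weight would be tuned on held-out data.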
Pages: 894-898
Page count: 5
Related Papers
50 records
  • [41] Distributed Training of Deep Neural Network Acoustic Models for Automatic Speech Recognition: A comparison of current training strategies
    Cui, Xiaodong
    Zhang, Wei
    Finkler, Ulrich
    Saon, George
    Picheny, Michael
    Kung, David
    IEEE SIGNAL PROCESSING MAGAZINE, 2020, 37 (03) : 39 - 49
  • [42] Bag-of-Words Input for Long History Representation in Neural Network-based Language Models for Speech Recognition
    Irie, Kazuki
    Schlueter, Ralf
    Ney, Hermann
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2371 - 2375
  • [43] Evaluation of Neural Network Language Models In Handwritten Chinese Text Recognition
    Wu, Yi-Chao
    Yin, Fei
    Liu, Cheng-Lin
    2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015, : 166 - 170
  • [44] Neural network language models for off-line handwriting recognition
    Zamora-Martinez, F.
    Frinken, V.
    Espana-Boquera, S.
    Castro-Bleda, M. J.
    Fischer, A.
    Bunke, H.
    PATTERN RECOGNITION, 2014, 47 (04) : 1642 - 1652
  • [45] N-gram Language Models in JLASER Neural Network Speech Recognizer
    Konopik, Miloslav
    Habernal, Ivan
    Brychcin, Tomas
    2010 INTERNATIONAL CONFERENCE ON APPLIED ELECTRONICS, 2010, : 167 - 170
  • [46] End-To-End deep neural models for Automatic Speech Recognition for Polish Language
    Pondel-Sycz, Karolina
    Pietrzak, Agnieszka Paula
    Szymla, Julia
    INTERNATIONAL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 2024, 70 (02) : 315 - 321
  • [47] A COMPARISON BETWEEN DEEP NEURAL NETS AND KERNEL ACOUSTIC MODELS FOR SPEECH RECOGNITION
    Lu, Zhiyun
    Guo, Dong
    Garakani, Alireza Bagheri
    Liu, Kuan
    May, Avner
    Bellet, Aurelien
    Fan, Linxi
    Collins, Michael
    Kingsbury, Brian
    Picheny, Michael
    Sha, Fei
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5070 - 5074
  • [48] A comparison of different spectral analysis models for speech recognition using neural networks
    Zebulum, RS
    Vellasco, M
    Perelmuter, G
    Pacheco, MA
    PROCEEDINGS OF THE 39TH MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS I-III, 1996, : 1428 - 1431
  • [49] Gaussian mixture language models for speech recognition
    Afify, Mohamed
    Siohan, Olivier
    Sarikaya, Ruhi
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 29 - +
  • [50] Improving language models for radiology speech recognition
    Paulett, John M.
    Langlotz, Curtis P.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2009, 42 (01) : 53 - 58