Investigation on LSTM Recurrent N-gram Language Models for Speech Recognition

Cited by: 10
|
Authors
Tueske, Zoltan [1 ,2 ]
Schlueter, Ralf [1 ]
Ney, Hermann [1 ]
Affiliations
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Human Language Technol & Pattern Recognit, D-52056 Aachen, Germany
[2] IBM Res, Thomas J Watson Res Ctr, POB 704, Yorktown Hts, NY 10598 USA
Funding
European Research Council;
Keywords
speech recognition; language modeling; LSTM; n-gram; neural networks;
DOI
10.21437/Interspeech.2018-2476
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recurrent neural networks (NN) with long short-term memory (LSTM) are the current state of the art for modeling long-term dependencies. However, recent studies indicate that NN language models (LM) need only a limited history length to achieve excellent performance. In this paper, we extend the previous investigation of LSTM-network-based n-gram modeling to the domain of automatic speech recognition (ASR). First, applying recent optimization techniques and up to 6-layer LSTM networks, we improve LM perplexities by nearly 50% relative compared to classic count models on three different domains. Then, we demonstrate experimentally that, when limiting the LM history, perplexities improve significantly only up to 40-grams. Nevertheless, ASR performance already saturates around 20-grams, despite cross-sentence modeling. Analysis indicates that the performance gain of the LSTM NNLM over count models results only partially from its longer context and cross-sentence modeling capabilities. Using equal context, we show that a deep 4-gram LSTM can significantly outperform large interpolated count models by performing backing off and smoothing significantly better. This observation also underlines the decreasing importance of combining state-of-the-art deep NNLMs with count-based models.
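The truncated-history idea described in the abstract can be illustrated with a minimal sketch (not the authors' implementation): an LSTM whose recurrent state is reset for every prediction, so only the previous n-1 words are ever visible to the model. The vocabulary size, layer sizes, and the order n=4 below are illustrative assumptions.

```python
# Minimal sketch of an LSTM "n-gram" language model: the state starts from
# zero for every prediction, mimicking a history limited to n-1 words.
import torch
import torch.nn as nn


class LSTMNgramLM(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256,
                 num_layers=2, n=4):
        super().__init__()
        self.n = n                                   # model order: predict from n-1 previous words
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context):
        # context: (batch, n-1) word ids; the LSTM starts from a zero state,
        # so nothing beyond these n-1 words can influence the prediction.
        emb = self.embed(context)                    # (batch, n-1, embed_dim)
        h, _ = self.lstm(emb)                        # fresh recurrent state each call
        return self.out(h[:, -1, :])                 # logits for the next word


# Toy usage: score a batch of 4-gram contexts (3 previous words each).
model = LSTMNgramLM(n=4)
contexts = torch.randint(0, 10000, (8, 3))           # 8 contexts of length n-1 = 3
log_probs = torch.log_softmax(model(contexts), dim=-1)
```

Training such a model on fixed-length contexts, and increasing n, is one way to probe how much history the NNLM actually exploits, which is the experimental question the abstract raises.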
Pages: 3358 - 3362
Number of pages: 5
Related Articles
50 records in total
  • [21] Joint-Character-POC N-Gram Language Modeling For Chinese Speech Recognition
    Wang, Bin
    Ou, Zhijian
    Li, Jian
    Kawamura, Akinori
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 24 - +
  • [22] INVESTIGATION OF BACK-OFF BASED INTERPOLATION BETWEEN RECURRENT NEURAL NETWORK AND N-GRAM LANGUAGE MODELS
    Chen, X.
    Liu, X.
    Gales, M. J. F.
    Woodland, P. C.
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 181 - 186
  • [23] Class n-Gram Models for Very Large Vocabulary Speech Recognition of Finnish and Estonian
    Varjokallio, Matti
    Kurimo, Mikko
    Virpioja, Sami
    STATISTICAL LANGUAGE AND SPEECH PROCESSING, SLSP 2016, 2016, 9918 : 133 - 144
  • [24] GAUSSIAN PROCESS LSTM RECURRENT NEURAL NETWORK LANGUAGE MODELS FOR SPEECH RECOGNITION
    Lam, Max W. Y.
    Chen, Xie
    Hu, Shoukang
    Yu, Jianwei
    Liu, Xunying
    Meng, Helen
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7235 - 7239
  • [25] Using Large Corpus N-gram Statistics to Improve Recurrent Neural Language Models
    Yang, Yiben
    Wang, Ji-Ping
    Downey, Doug
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 3268 - 3273
  • [26] N-gram Adaptation Using Dirichlet Class Language Model Based on Part-of-Speech for Speech Recognition
    Hatami, Ali
    Akbari, Ahmad
    Nasersharif, Babak
    2013 21ST IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2013,
  • [27] Profile based compression of n-gram language models
    Olsen, Jesper
    Oria, Daniela
    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006, : 1041 - 1044
  • [28] Improving Mandarin End-to-End Speech Recognition With Word N-Gram Language Model
    Tian, Jinchuan
    Yu, Jianwei
    Weng, Chao
    Zou, Yuexian
    Yu, Dong
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 812 - 816
  • [29] Efficient MDI Adaptation for n-gram Language Models
    Huang, Ruizhe
    Li, Ke
    Arora, Ashish
    Povey, Daniel
    Khudanpur, Sanjeev
    INTERSPEECH 2020, 2020, : 4916 - 4920
  • [30] N-gram language models for massively parallel devices
    Bogoychev, Nikolay
    Lopez, Adam
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 1944 - 1953