Investigation on LSTM Recurrent N-gram Language Models for Speech Recognition

Citations: 10
Authors
Tueske, Zoltan [1,2]
Schlueter, Ralf [1 ]
Ney, Hermann [1 ]
Affiliations
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Human Language Technol & Pattern Recognit, D-52056 Aachen, Germany
[2] IBM Res, Thomas J Watson Res Ctr, POB 704, Yorktown Hts, NY 10598 USA
Funding
European Research Council
Keywords
speech recognition; language modeling; LSTM; n-gram; neural networks
DOI
10.21437/Interspeech.2018-2476
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Recurrent neural networks (NN) with long short-term memory (LSTM) are the current state of the art for modeling long-term dependencies. However, recent studies indicate that NN language models (LM) need only a limited length of history to achieve excellent performance. In this paper, we extend the previous investigation of LSTM-network-based n-gram modeling to the domain of automatic speech recognition (ASR). First, applying recent optimization techniques and up to 6-layer LSTM networks, we improve LM perplexities by nearly 50% relative compared to classic count models on three different domains. Then, we demonstrate experimentally that perplexities improve significantly only up to 40-grams when the LM history is limited. Nevertheless, ASR performance already saturates around 20-grams, despite across-sentence modeling. Analysis indicates that the performance gain of the LSTM NNLM over count models results only partially from its longer context and cross-sentence modeling capabilities. Using equal context, we show that a deep 4-gram LSTM can significantly outperform large interpolated count models by performing backing-off and smoothing significantly better. This observation also underlines the decreasing importance of combining state-of-the-art deep NNLMs with count-based models.
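The "LSTM n-gram" setup described in the abstract is an LSTM language model whose conditioning history is truncated to the last n-1 words, so it estimates p(w_t | w_{t-n+1}, ..., w_{t-1}) like a count model but with neural smoothing. A minimal sketch of that idea follows; this is not the authors' implementation, and the class name LSTMNgramLM, the hyperparameters, and the perplexity helper are illustrative assumptions written in PyTorch.

import torch
import torch.nn as nn

class LSTMNgramLM(nn.Module):
    # Illustrative sketch (not the paper's code): a standard LSTM LM whose
    # input is cut to the last n-1 tokens, so each prediction sees exactly
    # an (n-1)-word history, as in the paper's limited-history experiments.
    def __init__(self, vocab_size, embed_dim=512, hidden_dim=512,
                 num_layers=4, n=4):
        super().__init__()
        self.n = n  # n-gram order; history length is n-1 (assumes n >= 2)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def next_word_logits(self, history):
        # history: (batch, t) token ids; keep only the last n-1 of them and
        # run the LSTM from a fresh zero state, so nothing older leaks in.
        window = history[:, -(self.n - 1):]
        out, _ = self.lstm(self.embed(window))
        return self.proj(out[:, -1])  # logits for the next word

@torch.no_grad()
def perplexity(model, token_ids):
    # token_ids: (1, T); score each position from its truncated history.
    nll = 0.0
    for t in range(1, token_ids.size(1)):
        logits = model.next_word_logits(token_ids[:, :t])
        log_probs = torch.log_softmax(logits, dim=-1)
        nll -= log_probs[0, token_ids[0, t]].item()
    return float(torch.exp(torch.tensor(nll / (token_ids.size(1) - 1))))

# Toy usage with random token ids (untrained model, so perplexity is high):
model = LSTMNgramLM(vocab_size=1000, n=4)
print(perplexity(model, torch.randint(0, 1000, (1, 12))))

The combination with count models mentioned at the end of the abstract is standard linear interpolation, p(w|h) = lambda * p_LSTM(w|h) + (1 - lambda) * p_count(w|h); the paper's finding is that with deep LSTM n-grams the count-model term contributes progressively less.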
Pages: 3358-3362
Number of pages: 5
Related Papers
50 records in total
  • [1] N-gram Approximation of LSTM Recurrent Language Models for Single-pass Recognition of Hungarian Call Center Conversations
    Tarjan, Balazs
    Szaszak, Gyorgy
    Fegyo, Tibor
    Mihajlik, Peter
2019 10TH IEEE INTERNATIONAL CONFERENCE ON COGNITIVE INFOCOMMUNICATIONS (COGINFOCOM 2019), 2019: 131-136
  • [2] Improved N-gram Phonotactic Models For Language Recognition
    BenZeghiba, Mohamed Faouzi
    Gauvain, Jean-Luc
    Lamel, Lori
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010: 2718-2721
  • [3] Discriminative Training of n-gram Language Models for Speech Recognition via Linear Programming
    Magdin, Vladimir
    Jiang, Hui
2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009: 305-310
  • [4] N-gram language models for offline handwritten text recognition
    Zimmermann, M
    Bunke, H
NINTH INTERNATIONAL WORKSHOP ON FRONTIERS IN HANDWRITING RECOGNITION, PROCEEDINGS, 2004: 203-208
  • [5] LARGE MARGIN ESTIMATION OF N-GRAM LANGUAGE MODELS FOR SPEECH RECOGNITION VIA LINEAR PROGRAMMING
    Magdin, Vladimir
    Jiang, Hui
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010: 5398-5401
  • [6] Language modeling by string pattern N-gram for Japanese speech recognition
    Ito, A
    Kohda, M
ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996: 490-493
  • [7] TOPIC N-GRAM COUNT LANGUAGE MODEL ADAPTATION FOR SPEECH RECOGNITION
    Haidar, Md. Akmal
    O'Shaughnessy, Douglas
2012 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2012), 2012: 165-169
  • [8] N-gram language models for Polish language. Basic concepts and applications in automatic speech recognition systems
    Rapp, Bartosz
2008 INTERNATIONAL MULTICONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY (IMCSIT), VOLS 1 AND 2, 2008: 295-298
  • [9] N-gram Language Models in JLASER Neural Network Speech Recognizer
    Konopik, Miloslav
    Habernal, Ivan
    Brychcin, Tomas
2010 INTERNATIONAL CONFERENCE ON APPLIED ELECTRONICS, 2010: 167-170
  • [10] On compressing n-gram language models
    Hirsimaki, Teemu
2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007: 949-952