N-gram Approximation of LSTM Recurrent Language Models for Single-pass Recognition of Hungarian Call Center Conversations

Cited by: 0
Authors
Tarjan, Balazs [1, 2]
Szaszak, Gyorgy [1]
Fegyo, Tibor [1, 2]
Mihajlik, Peter [1, 3]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, Budapest, Hungary
[2] SpeechTex Ltd, Budapest, Hungary
[3] THINKTech Res Ctr, Budapest, Hungary
Keywords
speech recognition; neural language model; RNNLM; LSTM; conversational speech; call center conversations; morphologically rich language
DOI
10.1109/coginfocom47531.2019.9089959
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Modeling the less constrained grammar and word order of conversational speech poses a great challenge to conventional back-off n-gram language models (BNLMs). Recurrent Neural Network Language Models (RNNLMs) can provide much better predictions; however, in real-time Automatic Speech Recognition (ASR) systems (e.g., speech dictation), the processing delay caused by two-pass decoding cannot be tolerated. In this paper, we investigate n-gram-based language modeling techniques that can be applied in a single-pass ASR system to approximate the performance of an RNNLM. The perplexity and word error rate (WER) of BNLMs, BNLM approximations of RNNLMs (RNN-BNLMs), and RNN n-grams are compared on our Hungarian ASR task. The rich morphology of agglutinative languages such as Hungarian is often handled with subword language models, so we also evaluated subword BNLMs, RNN-BNLMs, and RNN n-grams. We found that a subword RNN-BNLM can approach the performance of an RNN 4-gram model and recover roughly 40% of the RNNLM perplexity reduction. Overall, we improved the WER of our call center speech transcription system by 8% relative without affecting its real-time operation.
Pages: 131-136
Page count: 6