Investigation on LSTM Recurrent N-gram Language Models for Speech Recognition

Cited by: 10
Authors
Tueske, Zoltan [1 ,2 ]
Schlueter, Ralf [1 ]
Ney, Hermann [1 ]
Affiliations
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Human Language Technol & Pattern Recognit, D-52056 Aachen, Germany
[2] IBM Res, Thomas J Watson Res Ctr, POB 704, Yorktown Hts, NY 10598 USA
Funding
European Research Council
Keywords
speech recognition; language modeling; LSTM; n-gram; neural networks
DOI
10.21437/Interspeech.2018-2476
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recurrent neural networks (NNs) with long short-term memory (LSTM) are the current state of the art for modeling long-term dependencies. However, recent studies indicate that NN language models (LMs) need only a limited length of history to achieve excellent performance. In this paper, we extend the previous investigation of LSTM-based n-gram modeling to the domain of automatic speech recognition (ASR). First, applying recent optimization techniques and up to 6-layer LSTM networks, we improve LM perplexities by nearly 50% relative compared to classic count models on three different domains. Then, we demonstrate experimentally that, when the LM history is limited, perplexities improve significantly only up to 40-grams. Nevertheless, ASR performance saturates already around 20-grams, despite across-sentence modeling. Analysis indicates that the performance gain of LSTM NNLMs over count models results only partially from their longer-context and cross-sentence modeling capabilities. Using equal context, we show that a deep 4-gram LSTM can significantly outperform large interpolated count models by performing backing off and smoothing significantly better. This observation also underlines the decreasing importance of combining state-of-the-art deep NNLMs with count-based models.
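The central manipulation the abstract describes, restricting an LSTM LM to an n-gram history and comparing models by perplexity, can be sketched as follows. This is an illustrative sketch only; the function names and the exact windowing scheme are my own assumptions, not taken from the paper.

```python
import math

def ngram_histories(tokens, n):
    """Truncated history an n-gram LSTM LM would see at each position:
    at most the previous n-1 tokens, rather than the full
    across-sentence history (illustrative windowing, not the
    paper's exact implementation)."""
    return [tokens[max(0, i - (n - 1)):i] for i in range(len(tokens))]

def perplexity(log_probs):
    """Perplexity from natural-log token probabilities:
    PPL = exp(-(1/N) * sum(log p))."""
    return math.exp(-sum(log_probs) / len(log_probs))
```

For example, with n = 3 the history at each position never exceeds two tokens, and a model that assigns every token probability 1/4 has perplexity 4, the baseline against which the reported ~50% relative perplexity reductions are measured.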
Pages: 3358-3362
Number of pages: 5