Comparison of Various Neural Network Language Models in Speech Recognition

Cited: 1
Authors
Zuo, Lingyun [1 ]
Liu, Jian [1 ,2 ]
Wan, Xin [3 ]
Affiliations
[1] IACAS, Key Lab Speech Acoust & Content, Beijing, Peoples R China
[2] Chinese Acad Sci, XTIPC, Xinjiang Lab Minor Speech & Language Informat Pro, Beijing, Peoples R China
[3] Natl Comp Network Emergency Response Tech Team, Coordinat Ctr, Beijing, Peoples R China
Keywords
neural network language model; LSTM; speech recognition; n-best list rescoring;
DOI
10.1109/ICISCE.2016.195
CLC Classification
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
In recent years, research on language modeling for speech recognition has increasingly focused on the application of neural networks. However, the performance of neural network language models strongly depends on their architecture. Three competing concepts have been developed: first, feed-forward neural networks implementing an n-gram approach; second, recurrent neural networks that can learn context dependencies spanning more than a fixed number of predecessor words; third, long short-term memory (LSTM) neural networks, which can fully exploit long-range correlations in a telephone conversation corpus. In this paper, we compare count models to feed-forward, recurrent, and LSTM neural networks on conversational telephone speech recognition tasks. Furthermore, we put forward a language model estimation method that incorporates information from history sentences. We evaluate the models in terms of perplexity and word error rate, experimentally validating the strong correlation between the two quantities, which we find to hold regardless of the underlying type of language model. The experimental results show that the LSTM neural network language model performs best in n-best list rescoring: compared to first-pass decoding, the average word error rate declines by a relative 4.3% when the ten best candidate results are rescored on conversational telephone speech recognition tasks.
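The n-best rescoring described in the abstract can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the function name `rescore_nbest`, the interpolation weight, and the toy language model are all assumptions, standing in for the LSTM language model used in the paper.

```python
# Hypothetical sketch of n-best list rescoring. Each hypothesis carries
# its first-pass decoder score; a stronger rescoring LM (in the paper,
# an LSTM LM) supplies a new log-probability, and the two scores are
# linearly interpolated before re-ranking.

def rescore_nbest(hypotheses, lm_logprob, lm_weight=0.5):
    """Re-rank n-best hypotheses by combining first-pass and LM scores.

    hypotheses: list of (sentence, first_pass_logprob) pairs
    lm_logprob: callable mapping a sentence to a log-probability
    lm_weight:  interpolation weight given to the rescoring LM
    """
    rescored = [
        (sent, (1 - lm_weight) * fp + lm_weight * lm_logprob(sent))
        for sent, fp in hypotheses
    ]
    # The hypothesis with the highest combined score is returned.
    return max(rescored, key=lambda pair: pair[1])[0]


# Toy usage: a stand-in LM that strongly prefers one hypothesis.
nbest = [("the cat sat", -4.0), ("the cat sad", -3.5)]
toy_lm = lambda s: -1.0 if s.endswith("sat") else -5.0
best = rescore_nbest(nbest, toy_lm, lm_weight=0.5)
```

In the paper's setting, the first-pass decoder emits the ten best candidates per utterance, and the rescoring LM's weight would be tuned on held-out data.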
Pages: 894-898
Page count: 5
Related Papers
50 records
  • [41] Distributed Training of Deep Neural Network Acoustic Models for Automatic Speech Recognition: A comparison of current training strategies
    Cui, Xiaodong
    Zhang, Wei
    Finkler, Ulrich
    Saon, George
    Picheny, Michael
    Kung, David
    IEEE SIGNAL PROCESSING MAGAZINE, 2020, 37 (03) : 39 - 49
  • [42] Bag-of-Words Input for Long History Representation in Neural Network-based Language Models for Speech Recognition
    Irie, Kazuki
    Schlueter, Ralf
    Ney, Hermann
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2371 - 2375
  • [43] Evaluation of Neural Network Language Models In Handwritten Chinese Text Recognition
    Wu, Yi-Chao
    Yin, Fei
    Liu, Cheng-Lin
    2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015, : 166 - 170
  • [44] Neural network language models for off-line handwriting recognition
    Zamora-Martinez, F.
    Frinken, V.
    Espana-Boquera, S.
    Castro-Bleda, M. J.
    Fischer, A.
    Bunke, H.
    PATTERN RECOGNITION, 2014, 47 (04) : 1642 - 1652
  • [45] N-gram Language Models in JLASER Neural Network Speech Recognizer
    Konopik, Miloslav
    Habernal, Ivan
    Brychcin, Tomas
    2010 INTERNATIONAL CONFERENCE ON APPLIED ELECTRONICS, 2010, : 167 - 170
  • [46] End-To-End deep neural models for Automatic Speech Recognition for Polish Language
    Pondel-Sycz, Karolina
    Pietrzak, Agnieszka Paula
    Szymla, Julia
    INTERNATIONAL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 2024, 70 (02) : 315 - 321
  • [47] A COMPARISON BETWEEN DEEP NEURAL NETS AND KERNEL ACOUSTIC MODELS FOR SPEECH RECOGNITION
    Lu, Zhiyun
    Guo, Dong
    Garakani, Alireza Bagheri
    Liu, Kuan
    May, Avner
    Bellet, Aurelien
    Fan, Linxi
    Collins, Michael
    Kingsbury, Brian
    Picheny, Michael
    Sha, Fei
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5070 - 5074
  • [48] A comparison of different spectral analysis models for speech recognition using neural networks
    Zebulum, RS
    Vellasco, M
    Perelmuter, G
    Pacheco, MA
    PROCEEDINGS OF THE 39TH MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS I-III, 1996, : 1428 - 1431
  • [49] Gaussian mixture language models for speech recognition
    Afify, Mohamed
    Siohan, Olivier
    Sarikaya, Ruhi
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 29 - +
  • [50] Improving language models for radiology speech recognition
    Paulett, John M.
    Langlotz, Curtis P.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2009, 42 (01) : 53 - 58