Neural Error Corrective Language Models for Automatic Speech Recognition

Citations: 0
Authors
Tanaka, Tomohiro [1]
Masumura, Ryo [1]
Masataki, Hirokazu [1]
Aono, Yushi [1]
Affiliation
[1] NTT Corp, NTT Media Intelligence Labs, Tokyo, Japan
Source
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018
Keywords
automatic speech recognition; language models; speech recognition error correction; conditional generative models;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present novel neural-network-based language models that can correct automatic speech recognition (ASR) errors by using speech recognizer output as a context. These models, called neural error corrective language models (NECLMs), utilize ASR hypotheses of a target utterance as a context for estimating the generative probability of words. NECLMs are expressed as conditional generative models composed of an encoder network and a decoder network. In the models, the encoder network constructs context vectors from N-best lists and ASR confidence scores generated in a speech recognizer. The decoder network rescores recognition hypotheses by computing a generative probability of words using the context vectors so as to correct ASR errors. We evaluate the proposed models in Japanese lecture ASR tasks. Experimental results show that NECLMs achieve better ASR performance than a state-of-the-art ASR system that incorporates a convolutional neural network acoustic model and a long short-term memory recurrent neural network language model.
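The rescoring step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no architecture details, so the neural encoder-decoder is replaced here by a toy placeholder scorer, and the names `Hypothesis`, `toy_corrective_logprob`, `rescore_nbest`, and the interpolation `weight` are all hypothetical. The sketch only shows the general shape of N-best rescoring, in which a second model that sees the whole N-best list rescores each hypothesis.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Hypothesis:
    words: List[str]
    asr_score: float  # assumed: log-domain score from the first-pass recognizer


# Hypothetical scorer type: (candidate words, full N-best list) -> log-probability.
Scorer = Callable[[Sequence[str], Sequence[Hypothesis]], float]


def toy_corrective_logprob(words: Sequence[str], nbest: Sequence[Hypothesis]) -> float:
    """Placeholder for the corrective model (NOT a neural network).

    Each word is scored by the smoothed fraction of N-best hypotheses that
    contain it, so words supported by many hypotheses score higher.
    """
    total = 0.0
    for w in words:
        support = sum(1 for h in nbest if w in h.words)
        total += math.log((support + 1) / (len(nbest) + 1))
    return total


def rescore_nbest(nbest: Sequence[Hypothesis], scorer: Scorer, weight: float = 0.5) -> Hypothesis:
    """Return the hypothesis with the best interpolated log score.

    The linear interpolation with `weight` is an assumption for illustration.
    """
    def combined(h: Hypothesis) -> float:
        return (1.0 - weight) * h.asr_score + weight * scorer(h.words, nbest)

    return max(nbest, key=combined)


# Made-up example data, not taken from the paper.
nbest = [
    Hypothesis(["recognize", "speech"], -1.0),
    Hypothesis(["wreck", "a", "nice", "beach"], -0.9),
    Hypothesis(["recognize", "speech", "now"], -1.2),
]
best = rescore_nbest(nbest, toy_corrective_logprob)
print(" ".join(best.words))
```

With `weight=0.0` the function falls back to the first-pass ranking; raising the weight lets the context-aware scorer override the recognizer's top choice.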
Pages: 401-405
Page count: 5