Neural Error Corrective Language Models for Automatic Speech Recognition

Cited by: 0
Authors
Tanaka, Tomohiro [1 ]
Masumura, Ryo [1 ]
Masataki, Hirokazu [1 ]
Aono, Yushi [1 ]
Affiliations
[1] NTT Corp, NTT Media Intelligence Labs, Tokyo, Japan
Keywords
automatic speech recognition; language models; speech recognition error correction; conditional generative models
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
We present novel neural-network-based language models that correct automatic speech recognition (ASR) errors by using speech recognizer output as context. These models, called neural error corrective language models (NECLMs), utilize the ASR hypotheses of a target utterance as context for estimating the generative probability of words. NECLMs are formulated as conditional generative models composed of an encoder network and a decoder network. The encoder network constructs context vectors from the N-best lists and ASR confidence scores produced by a speech recognizer. The decoder network rescores recognition hypotheses by computing the generative probability of words conditioned on these context vectors, thereby correcting ASR errors. We evaluate the proposed models on Japanese lecture ASR tasks. Experimental results show that NECLMs achieve better ASR performance than a state-of-the-art ASR system that incorporates a convolutional neural network acoustic model and a long short-term memory recurrent neural network language model.
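To make the encoder-decoder idea in the abstract concrete, here is a minimal PyTorch sketch, assuming a deliberately simplified setup: an LSTM encoder reads each N-best hypothesis together with its per-word confidence scores and averages the final states into one context vector, and an LSTM decoder conditioned on that vector computes word probabilities for rescoring. All module names, dimensions, and the pooling and score-fusion choices below are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

# Minimal NECLM sketch; hyperparameters and layer layout are assumptions.
class NECLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Encoder: consumes word embeddings concatenated with the per-word
        # ASR confidence score for each N-best hypothesis.
        self.encoder = nn.LSTM(embed_dim + 1, hidden_dim, batch_first=True)
        # Decoder: a conditional LM that sees the context vector at every step.
        self.decoder = nn.LSTM(embed_dim + hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def encode(self, nbest_ids, confidences):
        # nbest_ids: (N, T) token ids of N hypotheses; confidences: (N, T).
        emb = self.embed(nbest_ids)                            # (N, T, E)
        enc_in = torch.cat([emb, confidences.unsqueeze(-1)], dim=-1)
        _, (h, _) = self.encoder(enc_in)                       # h: (1, N, H)
        return h[-1].mean(dim=0)                               # (H,) context vector

    def forward(self, context, word_ids):
        # word_ids: (B, T) hypothesis to rescore, teacher-forced.
        emb = self.embed(word_ids)                             # (B, T, E)
        ctx = context.view(1, 1, -1).expand(emb.size(0), emb.size(1), -1)
        dec_out, _ = self.decoder(torch.cat([emb, ctx], dim=-1))
        return self.out(dec_out)                               # (B, T, V) logits

# Hypothetical rescoring of the 1-best hypothesis; in practice this
# NECLM log-probability would be interpolated with the original ASR score.
model = NECLM(vocab_size=10000)
nbest = torch.randint(0, 10000, (5, 12))   # 5 hypotheses, 12 tokens each
conf = torch.rand(5, 12)                   # matching per-word confidences
context = model.encode(nbest, conf)
logp = torch.log_softmax(model(context, nbest[:1]), dim=-1)
score = logp[0, :-1].gather(-1, nbest[0, 1:].unsqueeze(-1)).sum()

Conditioning the decoder on a context vector built from the full N-best list is what distinguishes this from plain LSTM-LM rescoring: the model can exploit agreement and disagreement among competing hypotheses, weighted by recognizer confidence.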
Pages: 401 - 405
Page count: 5
Related Papers
50 records in total
  • [21] Neural Speech-to-Text Language Models for Rescoring Hypotheses of DNN-HMM Hybrid Automatic Speech Recognition Systems
    Tanaka, Tomohiro
    Masumura, Ryo
    Moriya, Takafumi
    Aono, Yushi
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 196 - 200
  • [22] Improving Automatic Speech Recognition with Dialect-Specific Language Models
    Gothi, Raj
    Rao, Preeti
    SPEECH AND COMPUTER, SPECOM 2023, PT I, 2023, 14338 : 57 - 67
  • [23] Cross-language adaptation of acoustic models in automatic speech recognition
    Univ of Pretoria, Pretoria, South Africa
    IEEE AFRICON Conf, (181-184)
  • [24] Learning Recurrent Neural Network Language Models with Context-Sensitive Label Smoothing for Automatic Speech Recognition
    Song, Minguang
    Zhao, Yunxin
    Wang, Shaojun
    Han, Mei
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6159 - 6163
  • [25] Employing automatic speech recognition for quantitative oral corrective feedback in Japanese second or foreign language education
    Keio University, 5322 Endo, Fujisawa, Kanagawa 252-0816, Japan
    [second affiliation not specified], 502285, India
    ACM Int. Conf. Proc. Ser., (52-58)
  • [26] Comparison Of Language Models Trained On Written Texts And Speech Transcripts In The Context Of Automatic Speech Recognition
    Dziadzio, Sebastian
    Nabozny, Aleksandra
    Smywinski-Pohl, Aleksander
    Ziolko, Bartosz
    PROCEEDINGS OF THE 2015 FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2015, 5 : 193 - 197
  • [27] Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition
    Chen, X.
    Ragni, A.
    Liu, X.
    Gales, M. J. F.
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 269 - 273
  • [28] Cross-sentence Neural Language Models for Conversational Speech Recognition
    Chiu, Shih-Hsuan
    Lo, Tien-Hong
    Chen, Berlin
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [29] Structured Output Layer Neural Network Language Models for Speech Recognition
    Le, Hai-Son
    Oparin, Ilya
    Allauzen, Alexandre
    Gauvain, Jean-Luc
    Yvon, Francois
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (01): 195 - 204
  • [30] Empirical study of neural network language models for Arabic speech recognition
    Emami, Ahmad
    Mangu, Lidia
    2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 147 - 152