Correcting, Rescoring and Matching: An N-best List Selection Framework for Speech Recognition

Authors
Kuo, Chin-Hung [1]
Chen, Kuan-Yu [1]
Affiliations
[1] Natl Taiwan Univ Sci & Technol, Taipei, Taiwan
Source
PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2022
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, automatic speech recognition (ASR) has been widely used in various scenarios, and it is usually the first step in many applications. Consequently, a growing number of studies concentrate on improving recognition results. Among them, N-best reranking and error correction models are two active research subjects, and various models have been proposed and shown to be successful. However, because N-best reranking models aim to select the best hypothesis from a set of candidates, their performance is bounded by the given set of hypotheses. Error correction models detect and correct recognition errors so as to provide better results, but they usually operate on the highest-scored hypothesis only, so the information embedded in the other candidates is ignored. In addition, almost all N-best reranking and error correction models consider acoustic information only implicitly or indirectly, or omit it altogether. To mitigate these flaws, we propose an N-best list selection framework for speech recognition, which consists of a text correction module, a text rescoring module, and a text-speech matching module. In the proposed framework, a set of corrected hypotheses is first derived, and the text rescoring module is then used to rescore them accurately. In addition, the text-speech matching module is employed to calculate the alignment score between each hypothesis and its corresponding speech. The proposed framework is evaluated on the AISHELL-1 dataset, and the experimental results show that it delivers a character error rate reduction of over 30% compared to the baseline systems.
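The three-module flow described in the abstract (correct, rescore, match, then select) might be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the function `select_hypothesis`, the toy scorers, the choice to keep the uncorrected hypotheses as candidates, and the weighted-sum combination with `text_weight` are all assumptions, since the abstract does not say how the scores are combined.

```python
from typing import Callable, List, Sequence, Tuple


def select_hypothesis(
    nbest: Sequence[str],
    correct: Callable[[str], str],
    text_score: Callable[[str], float],
    match_score: Callable[[str], float],
    text_weight: float = 0.5,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Return the best candidate and all candidates with their combined scores."""
    # Step 1 (text correction): correct every hypothesis, not only the top one.
    # Assumption: the original hypotheses are kept as candidates as well.
    candidates: List[str] = []
    for hyp in nbest:
        for cand in (hyp, correct(hyp)):
            if cand not in candidates:
                candidates.append(cand)

    # Steps 2 and 3 (text rescoring, text-speech matching): score each candidate.
    # Assumption: the two scores are combined by a simple weighted sum.
    scored = [
        (cand, text_weight * text_score(cand) + (1.0 - text_weight) * match_score(cand))
        for cand in candidates
    ]
    best = max(scored, key=lambda item: item[1])[0]
    return best, scored


# Hypothetical stand-ins for the three neural modules, for illustration only.
def toy_correct(hyp: str) -> str:
    fixes = {"whether": "weather"}
    return " ".join(fixes.get(word, word) for word in hyp.split())


def toy_text_score(hyp: str) -> float:
    vocab = {"the", "weather", "is", "nice", "today"}
    words = hyp.split()
    return sum(word in vocab for word in words) / len(words)


def toy_match_score(hyp: str) -> float:
    # Pretends the audio is known to contain five words.
    return -float(abs(len(hyp.split()) - 5))


nbest = ["the whether is nice to day", "the whether is nice today"]
best, scored = select_hypothesis(nbest, toy_correct, toy_text_score, toy_match_score)
print(best)
```

With these toy scorers the selected candidate is "the weather is nice today", which does not appear in the original N-best list. This illustrates the point made in the abstract that correcting every hypothesis before reranking can lift the upper bound imposed by the original candidate set.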
Pages: 729-734
Number of pages: 6