Semantic Features Based N-Best Rescoring Methods for Automatic Speech Recognition

Cited by: 3
Authors
Liu, Chang [1 ,2 ]
Zhang, Pengyuan [1 ,2 ]
Li, Ta [1 ,2 ]
Yan, Yonghong [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Speech Acoust & Content Understanding, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Xinjiang Key Lab Minor Speech & Language Informat, Urumqi 830011, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2019 / Volume 9 / Issue 23
Funding
National Natural Science Foundation of China
Keywords
automatic speech recognition (ASR); semantic model; topic model; continuous word representation
DOI
10.3390/app9235053
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
In this work, we aim to re-rank the n-best hypotheses of an automatic speech recognition system by penalizing sentences that contain words semantically inconsistent with their context and rewarding sentences in which all words are semantically coherent. To achieve this, we propose a topic similarity score that measures the difference between the topic distributions of words and of the corresponding sentence. We also propose a word-discourse score that quantifies the likelihood of a word appearing in the sentence by the inner product of the word vector and the discourse vector. In addition, we use the latent semantic marginal and a variation of the log-bilinear model to obtain the sentence coordination score. We further introduce a fallibility weight, which assists the computation of the sentence semantic coordination score by directing the model to pay more attention to words that appear less often in the hypothesis list, and we show how to use the scores and the fallibility weight in hypothesis rescoring. None of the rescoring methods needs extra parameters beyond the semantic models. Experiments conducted on the Wall Street Journal corpus show that, by using the proposed word-discourse score with 50-dimensional word embeddings, we achieve absolute word error rate (WER) reductions of 0.29% and 0.51% on the two test sets.
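The abstract gives no formulas, so the following is only a rough sketch of how a word-discourse score and a fallibility weight could fit together in n-best rescoring. Everything specific here is an assumption for illustration, not the authors' method: the discourse vector is taken to be the mean of the word vectors, the fallibility weight is taken to be `1 - df / (N + 1)` (where `df` counts the hypotheses containing the word), the semantic score is simply added to the ASR score, and the embeddings and scores are toy values. The function names are hypothetical.

```python
from collections import Counter

def dot(u, v):
    # Inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))

def discourse_vector(words, emb):
    # ASSUMPTION: the discourse vector is the mean of the known word vectors.
    vecs = [emb[w] for w in words if w in emb]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def fallibility_weights(hyps):
    # ASSUMPTION: words found in fewer hypotheses get a larger weight.
    n = len(hyps)
    df = Counter(w for h in hyps for w in set(h))
    return {w: 1.0 - c / (n + 1) for w, c in df.items()}

def semantic_score(words, emb, weights):
    # Weighted average of word-discourse inner products over known words.
    d = discourse_vector(words, emb)
    if d is None:
        return 0.0
    known = [w for w in words if w in emb]
    total = sum(weights[w] for w in known)
    return sum(weights[w] * dot(emb[w], d) for w in known) / total

def rescore(nbest, emb):
    # nbest: list of (words, asr_score); returns word lists, best first.
    # ASSUMPTION: the semantic score is added to the ASR score unweighted.
    weights = fallibility_weights([words for words, _ in nbest])
    scored = [(asr + semantic_score(words, emb, weights), words)
              for words, asr in nbest]
    return [words for _, words in sorted(scored, key=lambda t: t[0], reverse=True)]

# Toy 2-dimensional embeddings and ASR scores (made up for illustration).
emb = {
    "the": [0.1, 0.1], "stock": [1.0, 0.0], "market": [0.9, 0.1],
    "fell": [0.8, 0.2], "felt": [-0.5, 0.9],
}
nbest = [
    (["the", "stock", "market", "felt"], -10.0),
    (["the", "stock", "market", "fell"], -10.2),
]
print(rescore(nbest, emb)[0])
```

With these toy values, the hypothesis ending in "fell" overtakes the one ending in "felt" even though its ASR score is slightly lower, because "felt" has a small inner product with its sentence's discourse vector and, as a word that appears in only one hypothesis, carries a larger weight.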
Pages: 16