Factual Consistency Oriented Speech Recognition

Cited by: 1
Authors
Kanda, Naoyuki [1]
Yoshioka, Takuya [1]
Liu, Yang [1]
Affiliations
[1] Microsoft, Redmond, WA 98052 USA
Source
Keywords
speech recognition; speech summarization; hallucination errors; ASR;
DOI
10.21437/Interspeech.2023-485
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
This paper presents a novel optimization framework for automatic speech recognition (ASR) with the aim of reducing hallucinations produced by an ASR model. The proposed framework optimizes the ASR model to maximize an expected factual consistency score between ASR hypotheses and ground-truth transcriptions, where the factual consistency score is computed by a separately trained estimator. Experimental results using the AMI meeting corpus and the VoxPopuli corpus show that the ASR model trained with the proposed framework generates ASR hypotheses that have significantly higher consistency scores with ground-truth transcriptions while maintaining word error rates close to those of cross-entropy-trained ASR models. Furthermore, it is shown that training the ASR models with the proposed framework improves speech summarization quality, as measured by the factual consistency of meeting conversation summaries generated by a large language model.
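The objective described in the abstract, an expected factual consistency score over ASR hypotheses, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the expectation is approximated over an N-best list with renormalized hypothesis probabilities, and `toy_scorer` is a hypothetical word-overlap stand-in for the separately trained estimator. All function and variable names are illustrative.

```python
import math
from typing import Callable, Sequence


def expected_consistency(
    log_probs: Sequence[float],
    hypotheses: Sequence[str],
    reference: str,
    scorer: Callable[[str, str], float],
) -> float:
    """Expected score under the model posterior renormalized over the N-best list."""
    # Softmax over the N-best log-probabilities, shifted by the max for stability.
    m = max(log_probs)
    weights = [math.exp(lp - m) for lp in log_probs]
    total = sum(weights)
    posteriors = [w / total for w in weights]
    return sum(p * scorer(h, reference) for p, h in zip(posteriors, hypotheses))


def toy_scorer(hypothesis: str, reference: str) -> float:
    """Hypothetical placeholder: fraction of reference words present in the hypothesis.

    The paper uses a separately trained estimator; this is only a stand-in.
    """
    ref_words = reference.split()
    hyp_words = set(hypothesis.split())
    if not ref_words:
        return 1.0
    return sum(w in hyp_words for w in ref_words) / len(ref_words)


reference = "the meeting starts at nine"
hypotheses = ["the meeting starts at nine", "the meeting starts at five"]
log_probs = [math.log(0.5), math.log(0.5)]  # assumed model scores for the two hypotheses

score = expected_consistency(log_probs, hypotheses, reference, toy_scorer)
loss = -score  # training would minimize the negative expected score
```

In this sketch, shifting probability mass toward the hypothesis that agrees with the reference raises the expected score and lowers the loss, which is the behavior the framework aims to encourage.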
Pages: 236-240
Number of pages: 5