IMPROVING CONFIDENCE ESTIMATION ON OUT-OF-DOMAIN DATA FOR END-TO-END SPEECH RECOGNITION

Times cited: 6
Authors
Li, Qiujia [1]
Zhang, Yu [2]
Qiu, David [2]
He, Yanzhang [2]
Cao, Liangliang [2]
Woodland, Philip C. [1]
Affiliations
[1] Univ Cambridge, Cambridge, England
[2] Google LLC, Mountain View, CA USA
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022
Keywords
confidence scores; end-to-end; automatic speech recognition; out-of-domain;
DOI
10.1109/ICASSP43922.2022.9746979
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
As end-to-end automatic speech recognition (ASR) models reach promising performance, various downstream tasks rely on good confidence estimators for these systems. Recent research has shown that model-based confidence estimators have a significant advantage over using the output softmax probabilities. If the input data to the speech recogniser is from mismatched acoustic and linguistic conditions, the ASR performance and the corresponding confidence estimators may exhibit severe degradation. Since confidence models are often trained on the same in-domain data as the ASR, generalising to out-of-domain (OOD) scenarios is challenging. By keeping the ASR model untouched, this paper proposes two approaches to improve the model-based confidence estimators on OOD data: using pseudo transcriptions and an additional OOD language model. With an ASR model trained on LibriSpeech, experiments show that the proposed methods can greatly improve the confidence metrics on TED-LIUM and Switchboard datasets while preserving in-domain performance. Furthermore, the improved confidence estimators are better calibrated on OOD data and can provide a much more reliable criterion for data selection.
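The abstract claims that the improved estimators are better calibrated on OOD data and give a more reliable criterion for data selection. As a minimal sketch of what those two uses of confidence scores commonly look like, the Python below computes expected calibration error (ECE) over word-level scores and filters utterances by mean word confidence. It assumes one score in [0, 1] per hypothesis word and a binary correctness label per word (e.g. obtained by aligning the hypothesis to the reference). ECE is a standard calibration metric, but this record does not say which calibration metric the paper reports; the function names, the 10-bin setting, and the mean-confidence threshold rule are hypothetical illustrations, not the authors' method.

```python
from typing import Dict, List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[int],
    num_bins: int = 10,
) -> float:
    """Equal-width-bin ECE over word-level confidence scores (illustrative).

    confidences: one score in [0, 1] per hypothesis word.
    correct: 1 if that word is judged correct against the reference, else 0.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one word")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(num_bins)]
    for conf, label in zip(confidences, correct):
        # Equal-width binning; a score of exactly 1.0 goes into the last bin.
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, label))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        # Weight each bin's |confidence - accuracy| gap by its share of words.
        ece += len(members) / total * abs(mean_conf - accuracy)
    return ece


def select_utterances(
    word_confidences: Dict[str, Sequence[float]],
    threshold: float,
) -> List[str]:
    """Hypothetical selection rule: keep utterances whose mean word
    confidence is at least `threshold`; utterances with no words are skipped.
    Returns the selected utterance IDs in sorted order."""
    selected = []
    for utt_id, scores in word_confidences.items():
        if scores and sum(scores) / len(scores) >= threshold:
            selected.append(utt_id)
    return sorted(selected)


if __name__ == "__main__":
    # Perfectly calibrated bin: average confidence 0.25, accuracy 0.25.
    print(expected_calibration_error([0.25, 0.25, 0.25, 0.25], [1, 0, 0, 0]))  # 0.0
    # Overconfident: 0.9 confidence on words that are all wrong.
    print(expected_calibration_error([0.9, 0.9], [0, 0]))  # 0.9
    print(select_utterances({"a": [0.9, 0.8], "b": [0.2, 0.4], "c": []}, 0.5))  # ['a']
```

Thresholding on a mean word score is only one simple choice; the value of a better-calibrated estimator is that a fixed threshold then corresponds more closely to an actual expected word accuracy on the selected data.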
Pages: 6537-6541
Number of pages: 5