IMPROVING CONFIDENCE ESTIMATION ON OUT-OF-DOMAIN DATA FOR END-TO-END SPEECH RECOGNITION

Cited by: 6

Authors:

Li, Qiujia [1]

Zhang, Yu [2]

Qiu, David [2]

He, Yanzhang [2]

Cao, Liangliang [2]

Woodland, Philip C. [1]

Affiliations:

[1] Univ Cambridge, Cambridge, England

[2] Google LLC, Mountain View, CA USA

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

confidence scores; end-to-end; automatic speech recognition; out-of-domain;

DOI:

10.1109/ICASSP43922.2022.9746979

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

As end-to-end automatic speech recognition (ASR) models reach promising performance, various downstream tasks rely on good confidence estimators for these systems. Recent research has shown that model-based confidence estimators have a significant advantage over using the output softmax probabilities. If the input data to the speech recogniser is from mismatched acoustic and linguistic conditions, the ASR performance and the corresponding confidence estimators may exhibit severe degradation. Since confidence models are often trained on the same in-domain data as the ASR, generalising to out-of-domain (OOD) scenarios is challenging. By keeping the ASR model untouched, this paper proposes two approaches to improve the model-based confidence estimators on OOD data: using pseudo transcriptions and an additional OOD language model. With an ASR model trained on LibriSpeech, experiments show that the proposed methods can greatly improve the confidence metrics on TED-LIUM and Switchboard datasets while preserving in-domain performance. Furthermore, the improved confidence estimators are better calibrated on OOD data and can provide a much more reliable criterion for data selection.
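The abstract states that the improved estimators are better calibrated on OOD data but does not name a metric here. As an illustration only, the sketch below computes expected calibration error (ECE), one common way to quantify calibration; the choice of ECE, the function name, and the bin count are assumptions and are not taken from the paper.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    num_bins: int = 10,
) -> float:
    """Hypothetical helper: ECE over equal-width confidence bins.

    confidences: per-token (or per-word) confidence scores in [0, 1].
    correct: whether each corresponding token was recognised correctly.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        return 0.0

    conf_sum = [0.0] * num_bins   # sum of confidences per bin
    hit_count = [0] * num_bins    # number of correct tokens per bin
    bin_count = [0] * num_bins    # number of tokens per bin

    for score, is_correct in zip(confidences, correct):
        # A score of exactly 1.0 is placed in the last bin.
        idx = min(int(score * num_bins), num_bins - 1)
        conf_sum[idx] += score
        hit_count[idx] += int(is_correct)
        bin_count[idx] += 1

    total = len(confidences)
    ece = 0.0
    for s, h, n in zip(conf_sum, hit_count, bin_count):
        if n == 0:
            continue
        # Weighted gap between bin accuracy and mean bin confidence.
        ece += (n / total) * abs(h / n - s / n)
    return ece
```

A lower value means that the confidence scores track the observed accuracy more closely. An estimator that reports 0.9 on tokens that are right only 75% of the time would contribute a gap of 0.15 for that bin.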


Pages: 6537-6541

Number of pages: 5
