IMPROVING CONFIDENCE ESTIMATION ON OUT-OF-DOMAIN DATA FOR END-TO-END SPEECH RECOGNITION

Cited by: 6

Authors:

Li, Qiujia [1]

Zhang, Yu [2]

Qiu, David [2]

He, Yanzhang [2]

Cao, Liangliang [2]

Woodland, Philip C. [1]

Affiliations:

[1] Univ Cambridge, Cambridge, England

[2] Google LLC, Mountain View, CA USA

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

confidence scores; end-to-end; automatic speech recognition; out-of-domain

DOI:

10.1109/ICASSP43922.2022.9746979

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

As end-to-end automatic speech recognition (ASR) models reach promising performance, various downstream tasks rely on good confidence estimators for these systems. Recent research has shown that model-based confidence estimators have a significant advantage over using the output softmax probabilities. If the input data to the speech recogniser is from mismatched acoustic and linguistic conditions, the ASR performance and the corresponding confidence estimators may exhibit severe degradation. Since confidence models are often trained on the same in-domain data as the ASR, generalising to out-of-domain (OOD) scenarios is challenging. By keeping the ASR model untouched, this paper proposes two approaches to improve the model-based confidence estimators on OOD data: using pseudo transcriptions and an additional OOD language model. With an ASR model trained on LibriSpeech, experiments show that the proposed methods can greatly improve the confidence metrics on TED-LIUM and Switchboard datasets while preserving in-domain performance. Furthermore, the improved confidence estimators are better calibrated on OOD data and can provide a much more reliable criterion for data selection.
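
The abstract reports that the improved confidence estimators are "better calibrated" on OOD data but does not state here how calibration is measured. As a minimal sketch, assuming word-level confidence scores in [0, 1] and binary correct/incorrect labels for each hypothesised word, one widely used calibration measure is the expected calibration error (ECE): the bin-size-weighted average gap between mean confidence and observed accuracy. This is not necessarily the metric used in the paper; the function name and the default of 10 equal-width bins are illustrative choices.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Expected calibration error (ECE) using equal-width confidence bins.

    confidences: per-word confidence scores in [0, 1].
    correct: whether each hypothesised word is correct (assumed to come from
             aligning the ASR hypothesis with the reference transcription).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Per-bin word count, summed confidence and summed correctness.
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    acc_sums = [0.0] * n_bins
    for c, ok in zip(confidences, correct):
        # Map a score to its bin; a score of exactly 1.0 goes into the last bin.
        b = min(int(c * n_bins), n_bins - 1)
        counts[b] += 1
        conf_sums[b] += c
        acc_sums[b] += 1.0 if ok else 0.0

    # Weighted average of |accuracy - mean confidence| over non-empty bins.
    ece = 0.0
    for k in range(n_bins):
        if counts[k]:
            gap = abs(acc_sums[k] / counts[k] - conf_sums[k] / counts[k])
            ece += (counts[k] / n) * gap
    return ece


if __name__ == "__main__":
    # Toy example: two over-confident errors push the ECE up.
    scores = [0.9, 0.8, 0.3, 0.95]
    labels = [True, True, False, False]
    print(expected_calibration_error(scores, labels))
```

A perfectly calibrated estimator gives an ECE of 0, so a lower ECE on the OOD test sets would be one concrete reading of "better calibrated". In the toy example the value is roughly 0.34, driven mainly by the word scored 0.95 that is actually wrong.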


Pages: 6537-6541

Number of pages: 5
