TASK ORIENTED DIALOGUE AS A CATALYST FOR SELF-SUPERVISED AUTOMATIC SPEECH RECOGNITION

Citations: 0
Authors
Chan, David M. [1, 2]
Ghosh, Shalini [2]
Tulsian, Hitesh [2]
Rastrow, Ariya [2]
Hoffmeister, Björn [2]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Amazon Alexa AI, Palo Alto, CA 94303 USA
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2024) | 2024
Keywords
Task Oriented Dialogue; Automatic Speech Recognition; Self-Supervised Learning
DOI
10.1109/ICASSP48485.2024.10447164
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
While word error rates of automatic speech recognition (ASR) systems have consistently fallen, natural language understanding (NLU) applications built on top of ASR systems still attribute significant numbers of failures to low-quality speech recognition results. Existing assistant systems collect large numbers of these unsuccessful interactions, but these systems usually fail to learn from these interactions, even in an offline fashion. In this work, we introduce CLC: Contrastive Learning for Conversations, a family of methods for contrastive fine-tuning of models in a self-supervised fashion, making use of easily detectable artifacts in unsuccessful conversations with assistants. We demonstrate that our CLC family of approaches can improve the performance of ASR models on OD3, a new public large-scale semi-synthetic meta-dataset of audio task-oriented dialogues, by up to 19.2%. These gains transfer to real-world systems as well, where we show that CLC can help to improve performance by up to 6.7% over baselines.
Pages: 11806-11810
Page count: 5