"¿Te vienes? Sure!" Joint Fine-tuning of Language Detection and Transcription Improves Automatic Recognition of Code-Switching Speech

Times cited: 0
Authors
Hillah, Leopold [1]
Dubiel, Mateusz [1]
Leiva, Luis A. [1]
Affiliation
[1] Univ Luxembourg, Luxembourg, Luxembourg
Source
PROCEEDINGS OF THE 6TH CONFERENCE ON ACM CONVERSATIONAL USER INTERFACES, CUI 2024 | 2024
Funding
European Union Horizon 2020
Keywords
Code Switching; Multilingual Conversations; Language Identification; Automatic Speech Recognition; Whisper; Speech; MIXTURES;
DOI
10.1145/3640794.3665579
Chinese Library Classification
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Human communication in multilingual communities often leads to code-switching, where individuals seamlessly alternate between two or more languages in their daily interactions. While this phenomenon has become increasingly prevalent due to linguistic globalization, it challenges Automatic Speech Recognition (ASR) systems, which are designed on the assumption that they transcribe a single language at a time. In this work, we propose a simple yet previously unexplored approach to this challenge: fine-tuning the pre-trained Whisper model jointly on language identification (LID) and transcription by introducing an auxiliary LID loss term. Our results show significant reductions in transcription error, ranging from 14 to 36 percentage points. Ultimately, our work opens a new direction for research on code-switching speech and offers an opportunity to enhance the capabilities of current conversational agents.
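The abstract describes the method only as adding "an auxiliary LID loss term" to transcription fine-tuning. As a rough illustration of what a joint objective of that kind can look like, the minimal sketch below assumes the two terms are combined additively, with a hypothetical weight `lid_weight` on a cross-entropy LID term over a fixed set of candidate languages. The function names, the weighting scheme, and the use of a single LID target per utterance are all assumptions for illustration; the paper's actual formulation, and how it supervises Whisper's language token, may differ.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-softmax probability of the target class (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def transcription_loss(
    token_logits: Sequence[Sequence[float]], token_targets: Sequence[int]
) -> float:
    """Mean token-level cross-entropy over a reference transcript."""
    if len(token_logits) != len(token_targets) or not token_targets:
        raise ValueError("need one non-empty target per decoding step")
    losses = [cross_entropy(l, t) for l, t in zip(token_logits, token_targets)]
    return sum(losses) / len(losses)


def joint_loss(
    token_logits: Sequence[Sequence[float]],
    token_targets: Sequence[int],
    lid_logits: Sequence[float],
    lid_target: int,
    lid_weight: float = 0.5,  # hypothetical weighting, not taken from the paper
) -> float:
    """Assumed joint objective: transcription loss plus a weighted auxiliary LID loss."""
    asr = transcription_loss(token_logits, token_targets)
    lid = cross_entropy(lid_logits, lid_target)
    return asr + lid_weight * lid


if __name__ == "__main__":
    # Toy example: two decoding steps over a 2-token vocabulary, and LID over
    # three hypothetical candidate languages, all with uniform logits.
    tokens = [[0.0, 0.0], [0.0, 0.0]]
    targets = [0, 1]
    print(round(joint_loss(tokens, targets, [0.0, 0.0, 0.0], 2), 4))  # prints 1.2425
```

Setting `lid_weight=0` recovers ordinary transcription-only fine-tuning, which makes the auxiliary term straightforward to ablate under this assumed formulation.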
Pages: 7