Self-supervised Learning and Masked Language Model for Code-switching Automatic Speech Recognition

Times cited: 0
Authors
Chen, Po-Kai [1]
Fu, Li-Yeh [2]
Chen, Cheng-Kai [1]
Lin, Yi-Xing [1]
Chen, Chih-Ping [1]
Huang, Chien-Lin [3]
Wang, Jia-Ching [1]
Affiliations
[1] Natl Cent Univ, Dept CSIE, Taoyuan, Taiwan
[2] Realtek Semicond Corp, Hsinchu, Taiwan
[3] Natl Cheng Kung Univ, Dept CSIE, Tainan, Taiwan
Source
2024 IEEE TENTH INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND ELECTRONICS, ICCE 2024 | 2024
Keywords
code-switching; speech recognition; self-supervised learning; masked language modeling
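DOI
10.1109/ICCE62051.2024.10634607
Chinese Library Classification (CLC) number
TM [Electrotechnics]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract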
Code-switching (CS) is a common linguistic phenomenon that poses significant challenges for automatic speech recognition (ASR) systems because CS corpora are scarce. In this paper, we propose a novel approach to this challenge that leverages self-supervised learning (SSL) and a masked language model (MLM) for speech recognition. Specifically, we use the pre-trained wav2vec 2.0 model to reduce the dependence on labeled CS data, and the MLM to rerank the candidate sentences generated by beam-search decoding. The proposed method is evaluated on the SEAME dataset, and the experimental results show that it outperforms state-of-the-art CS speech recognition approaches by 15.6% and 19.9% in token error rate (TER). Moreover, the proposed method is generalizable and can be extended to other CS languages. These results demonstrate the effectiveness of our approach and its potential for future research on CS speech recognition.
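The abstract does not specify how the MLM scores each beam-search hypothesis. The following is a minimal sketch of N-best reranking, assuming pseudo-log-likelihood (PLL) scoring (mask each token in turn and sum the MLM's log-probability of the original token) linearly interpolated with the ASR score. The names `masked_log_prob`, `lm_weight`, and `toy_masked_log_prob` are hypothetical; the toy scorer is a small lookup table standing in for a real pre-trained MLM, and tokens are assumed to be Mandarin characters and English words.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: returns the MLM's log-probability of tokens[i]
# when position i is replaced by a mask token and the rest is kept as context.
MaskedLogProb = Callable[[Sequence[str], int], float]


def pseudo_log_likelihood(tokens: Sequence[str], masked_log_prob: MaskedLogProb) -> float:
    """Mask each position in turn and sum the log-probabilities of the original tokens."""
    return sum(masked_log_prob(tokens, i) for i in range(len(tokens)))


def rerank(
    nbest: List[Tuple[List[str], float]],
    masked_log_prob: MaskedLogProb,
    lm_weight: float = 0.5,
) -> List[Tuple[List[str], float]]:
    """Rescore (tokens, asr_log_score) pairs as asr_score + lm_weight * PLL, best first."""
    rescored = [
        (tokens, asr_score + lm_weight * pseudo_log_likelihood(tokens, masked_log_prob))
        for tokens, asr_score in nbest
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def toy_masked_log_prob(tokens: Sequence[str], i: int) -> float:
    """Toy stand-in for a real MLM (hypothetical): rewards known left-neighbour pairs."""
    known_pairs = {("我", "要"), ("要", "check"), ("check", "email")}
    left = tokens[i - 1] if i > 0 else "<s>"
    return math.log(0.5) if (left, tokens[i]) in known_pairs else math.log(0.05)


if __name__ == "__main__":
    # Two beam-search hypotheses with hypothetical ASR log-scores.
    nbest = [
        (["我", "要", "切", "email"], -3.8),
        (["我", "要", "check", "email"], -4.0),
    ]
    for tokens, score in rerank(nbest, toy_masked_log_prob):
        print(" ".join(tokens), round(score, 2))
    # prints:
    # 我 要 check email -6.54
    # 我 要 切 email -8.64
```

Because the language-model term can outweigh a small acoustic margin, reranking can promote a hypothesis that beam search scored lower: here the code-switched `check email` hypothesis overtakes `切 email`. In practice `lm_weight` would be tuned on a development set.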
Pages: 387-391
Number of pages: 5