Self-supervised Learning and Masked Language Model for Code-switching Automatic Speech Recognition

Times cited: 0

Authors:

Chen, Po-Kai [1]

Fu, Li-Yeh [2]

Chen, Cheng-Kai [1]

Lin, Yi-Xing [1]

Chen, Chih-Ping [1]

Huang, Chien-Lin [3]

Wang, Jia-Ching [1]

Affiliations:

[1] Natl Cent Univ, Dept CSIE, Taoyuan, Taiwan

[2] Realtek Semicond Corp, Hsinchu, Taiwan

[3] Natl Cheng Kung Univ, Dept CSIE, Tainan, Taiwan

Source:

2024 IEEE TENTH INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND ELECTRONICS, ICCE 2024 | 2024

Keywords:

code-switching; speech recognition; self-supervised learning; masked language modeling

DOI:

10.1109/ICCE62051.2024.10634607

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and telecommunications technology]

Discipline classification codes:

0808; 0809

Abstract:

Code-switching (CS) is a common linguistic phenomenon that poses significant challenges for automatic speech recognition (ASR) systems, largely because CS corpora are scarce. In this paper, we propose a novel approach to this challenge that leverages self-supervised learning (SSL) and a masked language model (MLM) for speech recognition. Specifically, we use the pre-trained wav2vec 2.0 model to reduce the dependency on labeled CS data, and we use the MLM to rerank the candidate sentences produced by beam search decoding. The proposed method is evaluated on the SEAME dataset. Experimental results show that it outperforms state-of-the-art CS speech recognition approaches by 15.6% and 19.9% in token error rate (TER). Moreover, the proposed method is generalizable and can be extended to other CS language pairs. These results demonstrate the effectiveness of our approach and its potential for future research on CS speech recognition.
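The abstract states that an MLM reranks beam-search hypotheses and that results are reported as TER. It does not specify the scoring rule, the interpolation weight, or the tokenization. The Python sketch below is therefore an illustration under stated assumptions, not the authors' implementation.

It assumes the following:

- MLM scoring uses pseudo-log-likelihood (PLL). Each token is masked in turn, and the MLM's log-probabilities for the original tokens are summed.
- The PLL is linearly interpolated with the acoustic score using a hypothetical weight `lm_weight`.
- `toy_masked_log_prob` is a hypothetical, context-free lookup table with made-up values. It stands in for a real MLM such as a multilingual BERT.
- `tokenize_cs` assumes that TER counts each Mandarin character and each English word as one token.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: log P(tokens[i] | all other tokens), with position i masked.
MaskedLogProb = Callable[[Sequence[str], int], float]


def toy_masked_log_prob(tokens: Sequence[str], i: int) -> float:
    """Hypothetical stand-in for a real MLM: a context-free table of made-up log-probs."""
    table = {"我": -1.0, "要": -1.2, "去": -1.3, "meeting": -2.0, "meat": -6.0, "in": -3.0}
    return table.get(tokens[i], -10.0)  # unknown tokens get a low floor score


def pseudo_log_likelihood(tokens: Sequence[str], masked_log_prob: MaskedLogProb) -> float:
    """PLL: mask each position in turn and sum the log-probs of the original tokens."""
    return sum(masked_log_prob(tokens, i) for i in range(len(tokens)))


def rerank(nbest: List[Tuple[List[str], float]],
           masked_log_prob: MaskedLogProb,
           lm_weight: float = 0.5) -> List[Tuple[List[str], float]]:
    """Rescore (tokens, acoustic_score) pairs as acoustic + lm_weight * PLL; best first."""
    rescored = [(toks, am + lm_weight * pseudo_log_likelihood(toks, masked_log_prob))
                for toks, am in nbest]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def tokenize_cs(text: str) -> List[str]:
    """Assumed TER tokenization: one token per CJK character, English words split on spaces."""
    tokens: List[str] = []
    for word in text.split():
        buf = ""
        for ch in word:
            if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block
                if buf:
                    tokens.append(buf)
                    buf = ""
                tokens.append(ch)
            else:
                buf += ch
        if buf:
            tokens.append(buf)
    return tokens


def token_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """(substitutions + deletions + insertions) / len(ref), via token-level Levenshtein."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # deletion of r
                         cur[j - 1] + 1,           # insertion of h
                         prev[j - 1] + (r != h))   # substitution (or match)
        prev = cur
    return prev[-1] / max(len(ref), 1)


if __name__ == "__main__":
    reference = tokenize_cs("我要去 meeting")
    nbest = [(tokenize_cs("我要去 meat in"), -3.0),   # acoustic model prefers this one
             (tokenize_cs("我要去 meeting"), -4.0)]
    for toks, score in rerank(nbest, toy_masked_log_prob):
        print(" ".join(toks), f"score={score:.2f}", f"TER={token_error_rate(reference, toks):.2f}")
```

With these toy values, the acoustic score favors the `meat in` hypothesis. The MLM term moves `meeting` to the top, and the TER against the reference drops from 0.5 to 0.0.

A real MLM needs one forward pass per masked position. PLL scoring cost therefore grows with both hypothesis length and N-best size.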


Pages: 387-391

Number of pages: 5
