Language-specific Characteristic Assistance for Code-switching Speech Recognition

Cited by: 4
Authors
Song, Tongtong [1 ]
Xu, Qiang [1 ]
Ge, Meng [1 ,2 ]
Wang, Longbiao [1 ]
Shi, Hao [3 ]
Lv, Yongjie [1 ]
Lin, Yuqin [1 ]
Dang, Jianwu [1 ,4 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[3] Kyoto Univ, Grad Sch Informat, Sakyo Ku, Kyoto, Japan
[4] Japan Adv Inst Sci & Technol, Nomi, Ishikawa, Japan
Source
INTERSPEECH 2022 | 2022
Funding
National Natural Science Foundation of China;
Keywords
language-specific characteristic assistance; dual-encoder; code-switching; speech recognition;
DOI
10.21437/Interspeech.2022-11426
CLC Number
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
The dual-encoder structure successfully uses two language-specific encoders (LSEs) for code-switching speech recognition. Because the LSEs are initialized from two pre-trained language-specific models (LSMs), the structure can exploit abundant monolingual data and capture the attributes of each language. However, existing methods impose no language constraints on the LSEs and underutilize the language-specific knowledge of the LSMs. In this paper, we propose a language-specific characteristic assistance (LSCA) method to mitigate these problems. Specifically, during training we introduce two language-specific losses as language constraints and generate corresponding language-specific targets for them. During decoding, we exploit the decoding abilities of the LSMs by combining the output probabilities of the two LSMs and the mixture model to obtain the final predictions. Experiments show that either the training or the decoding method of LSCA alone improves performance, and combining the two yields up to a 15.4% relative error reduction on the code-switching test set. Moreover, with our method the system can handle code-switching speech recognition without extra shared parameters, or even without retraining, given two pre-trained LSMs.
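The decoding method described in the abstract fuses the posteriors of the two LSMs with that of the mixture model. A minimal sketch of one such fusion step is below; the function name `lsca_decode_step` and the interpolation weight `lam` are illustrative assumptions, and the paper's exact fusion rule may differ.

```python
def lsca_decode_step(p_mix, p_zh, p_en, lam=0.5):
    """Fuse the mixture model's posterior with two language-specific
    model posteriors over a shared bilingual vocabulary.

    p_mix: mixture-model token probabilities (one per vocab entry)
    p_zh, p_en: posteriors from the two LSMs; assumed to cover
    disjoint token subsets of the same bilingual vocabulary.
    lam: hypothetical interpolation weight (not from the paper).
    """
    # Summing the two LSM posteriors gives one distribution over the
    # full bilingual vocabulary, since their supports are disjoint.
    p_lsm = [zh + en for zh, en in zip(p_zh, p_en)]
    # Linearly interpolate mixture-model and LSM probabilities.
    fused = [lam * m + (1 - lam) * s for m, s in zip(p_mix, p_lsm)]
    # Renormalize so the result is a valid distribution.
    total = sum(fused)
    return [p / total for p in fused]
```

With `lam=0.5`, a token favored by both the mixture model and the matching LSM keeps the highest fused probability, which is the intuition behind letting the LSMs assist the final predictions.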
Pages: 3924-3928 (5 pages)