NON-AUTOREGRESSIVE MANDARIN-ENGLISH CODE-SWITCHING SPEECH RECOGNITION

Cited by: 7

Authors:

Chuang, Shun-Po [1]

Chang, Heng-Jui [1]

Huang, Sung-Feng [1]

Lee, Hung-yi [1]

Affiliations:

[1] Natl Taiwan Univ, Coll Elect Engn & Comp Sci, Taipei, Taiwan

Source:

2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) | 2021

Keywords:

non-autoregressive; code-switching; end-to-end speech recognition;

DOI:

10.1109/ASRU51503.2021.9688174

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Mandarin-English code-switching (CS) is frequently used among East and Southeast Asian people. However, intra-sentence switching between two very different languages makes recognizing CS speech challenging. Meanwhile, recently successful non-autoregressive (NAR) ASR models remove the need for left-to-right beam decoding in autoregressive (AR) models and have achieved outstanding performance and fast inference speed, but they have not been applied to Mandarin-English CS speech recognition. This paper takes advantage of the Mask-CTC NAR ASR framework to tackle the CS speech recognition problem. We further propose changing the Mandarin output target of the encoder to Pinyin for faster encoder training, and introduce a Pinyin-to-Mandarin decoder to learn contextualized information. Moreover, we use word embedding label smoothing to regularize the decoder with contextualized information, and projection matrix regularization to bridge the gap between the encoder and decoder. We evaluate these methods on the SEAME corpus and achieve exciting results.

Pages: 465-472

Page count: 8

50 items in total

[1] Pronunciation augmentation for Mandarin-English code-switching speech recognition
Long, Yanhua
Wei, Shuang
Lian, Jie
Li, Yijie
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021, 2021 (01)
[2] Pronunciation augmentation for Mandarin-English code-switching speech recognition
Yanhua Long
Shuang Wei
Jie Lian
Yijie Li
EURASIP Journal on Audio, Speech, and Music Processing, 2021
[3] INVESTIGATING END-TO-END SPEECH RECOGNITION FOR MANDARIN-ENGLISH CODE-SWITCHING
Shan, Changhao
Weng, Chao
Wang, Guangsen
Su, Dan
Luo, Min
Yu, Dong
Xie, Lei
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6056 - 6060
[4] Acoustic data augmentation for Mandarin-English code-switching speech recognition
Long, Yanhua
Li, Yijie
Zhang, Qiaozheng
Wei, Shuang
Ye, Hong
Yang, Jichen
APPLIED ACOUSTICS, 2020, 161
[5] ADDRESSING ACCENT MISMATCH IN MANDARIN-ENGLISH CODE-SWITCHING SPEECH RECOGNITION
Tan, Zhili
Fan, Xinghua
Zhu, Hui
Lin, Ed
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 8259 - 8263
[6] On the End-to-End Solution to Mandarin-English Code-switching Speech Recognition
Zeng, Zhiping
Khassanov, Yerbolat
Van Tung Pham
Xu, Haihua
Chng, Eng Siong
Li, Haizhou
INTERSPEECH 2019, 2019, : 2165 - 2169
[7] Integrating Knowledge in End-to-End Automatic Speech Recognition for Mandarin-English Code-Switching
Li, Chia-Yu
Ngoc Thang Vu
PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019, : 160 - 165
[8] Language-specific Acoustic Boundary Learning for Mandarin-English Code-switching Speech Recognition
Fan, Zhiyun
Dong, Linhao
Shen, Chen
Liang, Zhenlin
Zhang, Jun
Lu, Lu
Ma, Zejun
INTERSPEECH 2023, 2023, : 3322 - 3326
[9] Bi-encoder Transformer Network for Mandarin-English Code-switching Speech Recognition using Mixture of Experts
Lu, Yizhou
Huang, Mingkun
Li, Hao
Guo, Jiaqi
Qian, Yanmin
INTERSPEECH 2020, 2020, : 4766 - 4770
[10] Rnn-transducer With Language Bias For End-to-end Mandarin-English Code-switching Speech Recognition
Zhang, Shuai
Yi, Jiangyan
Tian, Zhengkun
Tao, Jianhua
Bai, Ye
2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
