Dual Script E2E Framework for Multilingual and Code-Switching ASR

Cited by: 2
Authors
Kumar, Mari Ganesh [1 ]
Kuriakose, Jom [1 ]
Thyagachandran, Anand [1 ]
Kumar, Arun A. [1 ]
Seth, Ashish [1 ]
Prasad, Lodagala V. S. V. Durga [1 ]
Jaiswal, Saish [1 ]
Prakash, Anusha [1 ]
Murthy, Hema A. [1 ]
Affiliations
[1] Indian Inst Technol Madras, Chennai, Tamil Nadu, India
Source
INTERSPEECH 2021 | 2021
Keywords
speech recognition; low-resource; multilingual; common label set; dual script;
DOI
10.21437/Interspeech.2021-978
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification codes
100104; 100213;
Abstract
India is home to multiple languages, which makes training automatic speech recognition (ASR) systems challenging. Over time, each language has adopted words from other languages, such as English, leading to code-mixing. Most Indian languages also have their own unique scripts, which poses a major limitation in training multilingual and code-switching ASR systems. Inspired by results in text-to-speech synthesis, in this paper we use an in-house rule-based phoneme-level common label set (CLS) representation to train multilingual and code-switching ASR for Indian languages. We propose two end-to-end (E2E) ASR systems. In the first system, the E2E model is trained on the CLS representation, and a novel data-driven back-end recovers the native language script. In the second system, we modify the E2E model so that the CLS representation and the native language characters are used simultaneously during training. We show our results on the multilingual and code-switching (MUCS) ASR challenge 2021. Our best results achieve approximately 6% and 5% improvement in word error rate over the baseline system for the multilingual and code-switching tasks, respectively, on the challenge development data.
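For illustration only, the following is a minimal sketch of the common-label-set idea described in the abstract, assuming hypothetical grapheme-to-label tables and a simple table-inversion back-end. The paper's actual CLS is an in-house rule-based phoneme-level representation, and its script-recovery back-end is data-driven rather than a lookup; none of the names or mappings below come from the paper.

```python
# Toy sketch of the common label set (CLS) idea; the tables are hypothetical.
HINDI_TO_CLS = {"क": "ka", "म": "ma", "ल": "la"}
TAMIL_TO_CLS = {"க": "ka", "ம": "ma", "ல": "la"}

def to_cls(text, table):
    """Map native-script graphemes to shared CLS labels (toy, lookup-based)."""
    return [table[ch] for ch in text if ch in table]

def to_native(cls_labels, table):
    """Toy back-end: invert the lookup table to recover the native script.
    The paper instead learns a data-driven CLS-to-native-script converter."""
    inverse = {cls: ch for ch, cls in table.items()}
    return "".join(inverse[label] for label in cls_labels)

if __name__ == "__main__":
    hindi_cls = to_cls("कमल", HINDI_TO_CLS)   # ['ka', 'ma', 'la']
    tamil_cls = to_cls("கமல", TAMIL_TO_CLS)   # ['ka', 'ma', 'la'] -- same label space
    print(hindi_cls, tamil_cls)
    # With one shared output vocabulary, a single E2E model can be trained on
    # pooled multilingual data; the native script is recovered after decoding.
    print(to_native(hindi_cls, HINDI_TO_CLS))  # कमल
```

The only point of the sketch is that both scripts land in one shared output label space, which is what allows a single E2E model to be trained on pooled multilingual and code-switched data before the script is restored per language.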
Pages: 2441 - 2445
Number of pages: 5
Related papers
20 entries in total
  • [1] [Anonymous], 2021, MULTILINGUAL CODE SW
  • [2] Baby Arun, Nishanthi N. L., Thomas Anju Leela, Murthy Hema A., "A Unified Parser for Developing Indian Language Text to Speech Synthesizers," Text, Speech, and Dialogue, 2016, 9924: 514 - 521
  • [3] Baby Arun, 2016, P TEXT SPEECH DIAL
  • [4] Datta A., 2020, IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), p. 8239, DOI 10.1109/ICASSP40776.2020.9053443
  • [5] Diwan A., 2021, arXiv:2104.00235
  • [6] Gulati Anmol, Qin James, Chiu Chung-Cheng, Parmar Niki, Zhang Yu, Yu Jiahui, Han Wei, Wang Shibo, Zhang Zhengdong, Wu Yonghui, Pang Ruoming, "Conformer: Convolution-augmented Transformer for Speech Recognition," INTERSPEECH 2020, 2020: 5036 - 5040
  • [7] Kakwani D., 2020, Findings of the Association for Computational Linguistics: EMNLP 2020, p. 4948
  • [8] Klein Guillaume, Kim Yoon, Deng Yuntian, Senellart Jean, Rush Alexander M., "OpenNMT: Open-Source Toolkit for Neural Machine Translation," Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017): System Demonstrations, 2017: 67 - 72
  • [9] Kudo T., 2018, Subword regularization: Improving neural network translation models with multiple subword candidates
  • [10] Park K., 2019, g2pe