Dual Script E2E Framework for Multilingual and Code-Switching ASR

Cited by: 2
Authors
Kumar, Mari Ganesh [1 ]
Kuriakose, Jom [1 ]
Thyagachandran, Anand [1 ]
Kumar, Arun A. [1 ]
Seth, Ashish [1 ]
Prasad, Lodagala V. S. V. Durga [1 ]
Jaiswal, Saish [1 ]
Prakash, Anusha [1 ]
Murthy, Hema A. [1 ]
Affiliations
[1] Indian Inst Technol Madras, Chennai, Tamil Nadu, India
Source
INTERSPEECH 2021 | 2021
Keywords
speech recognition; low-resource; multilingual; common label set; dual script;
DOI
10.21437/Interspeech.2021-978
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification codes
100104; 100213;
Abstract
India is home to multiple languages, which makes training automatic speech recognition (ASR) systems challenging. Over time, each language has adopted words from other languages, such as English, leading to code-mixing. Most Indian languages also have their own unique scripts, which poses a major limitation in training multilingual and code-switching ASR systems. Inspired by results in text-to-speech synthesis, in this paper we use an in-house rule-based phoneme-level common label set (CLS) representation to train multilingual and code-switching ASR for Indian languages. We propose two end-to-end (E2E) ASR systems. In the first system, the E2E model is trained on the CLS representation, and a novel data-driven back-end recovers the native language script. In the second system, we modify the E2E model so that the CLS representation and the native language characters are used simultaneously during training. We show our results on the multilingual and code-switching (MUCS) ASR challenge 2021. Our best results achieve approximately 6% and 5% improvement in word error rate over the baseline system for the multilingual and code-switching tasks, respectively, on the challenge development data.
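For illustration only, the following is a minimal sketch of the common-label-set idea described in the abstract, assuming hypothetical grapheme-to-label tables and a simple table-inversion back-end. The paper's actual CLS is an in-house rule-based phoneme-level representation, and its script-recovery back-end is data-driven rather than a lookup; none of the names or mappings below come from the paper.

```python
# Toy sketch of the common label set (CLS) idea; the tables are hypothetical.
HINDI_TO_CLS = {"क": "ka", "म": "ma", "ल": "la"}
TAMIL_TO_CLS = {"க": "ka", "ம": "ma", "ல": "la"}

def to_cls(text, table):
    """Map native-script graphemes to shared CLS labels (toy, lookup-based)."""
    return [table[ch] for ch in text if ch in table]

def to_native(cls_labels, table):
    """Toy back-end: invert the lookup table to recover the native script.
    The paper instead learns a data-driven CLS-to-native-script converter."""
    inverse = {cls: ch for ch, cls in table.items()}
    return "".join(inverse[label] for label in cls_labels)

if __name__ == "__main__":
    hindi_cls = to_cls("कमल", HINDI_TO_CLS)   # ['ka', 'ma', 'la']
    tamil_cls = to_cls("கமல", TAMIL_TO_CLS)   # ['ka', 'ma', 'la'] -- same label space
    print(hindi_cls, tamil_cls)
    # With one shared output vocabulary, a single E2E model can be trained on
    # pooled multilingual data; the native script is recovered after decoding.
    print(to_native(hindi_cls, HINDI_TO_CLS))  # कमल
```

The only point of the sketch is that both scripts land in one shared output label space, which is what allows a single E2E model to be trained on pooled multilingual and code-switched data before the script is restored per language.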
Pages: 2441 - 2445
Number of pages: 5
Related papers
20 entries in total
  • [1] [Anonymous], 2021, MULTILINGUAL CODE SW
  • [2] Baby Arun, Nishanthi N. L., Thomas Anju Leela, Murthy Hema A., "A Unified Parser for Developing Indian Language Text to Speech Synthesizers," Text, Speech, and Dialogue, 2016, 9924: 514 - 521
  • [3] Baby Arun, 2016, P TEXT SPEECH DIAL
  • [4] Datta A., 2020, IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), p. 8239, DOI 10.1109/ICASSP40776.2020.9053443
  • [5] Diwan A., 2021, arXiv:2104.00235
  • [6] Gulati Anmol, Qin James, Chiu Chung-Cheng, Parmar Niki, Zhang Yu, Yu Jiahui, Han Wei, Wang Shibo, Zhang Zhengdong, Wu Yonghui, Pang Ruoming, "Conformer: Convolution-augmented Transformer for Speech Recognition," INTERSPEECH 2020, 2020: 5036 - 5040
  • [7] Kakwani D., 2020, Findings of the Association for Computational Linguistics: EMNLP 2020, p. 4948
  • [8] Klein Guillaume, Kim Yoon, Deng Yuntian, Senellart Jean, Rush Alexander M., "OpenNMT: Open-Source Toolkit for Neural Machine Translation," Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017): System Demonstrations, 2017: 67 - 72
  • [9] Kudo T., 2018, Subword regularization: Improving neural network translation models with multiple subword candidates
  • [10] Park K., 2019, g2pe