Learning Fast Adaptation on Cross-Accented Speech Recognition

Cited by: 40
Authors
Winata, Genta Indra [1]
Cahyawijaya, Samuel [1]
Liu, Zihan [1]
Lin, Zhaojiang [1]
Madotto, Andrea [1]
Xu, Peng [1]
Fung, Pascale [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Ctr Artificial Intelligence Res CAiRE, Hong Kong, Peoples R China
Source
INTERSPEECH 2020 | 2020
Keywords
speech recognition; accent-agnostic; cross-accent; meta-learning; fast adaptation;
DOI
10.21437/Interspeech.2020-45
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification code
100104 ; 100213 ;
Abstract
Local dialects influence people to pronounce words of the same language differently from each other. The great variability and complex characteristics of accents create a major challenge for training a robust and accent-agnostic automatic speech recognition (ASR) system. In this paper, we introduce a cross-accented English speech recognition task as a benchmark for measuring the ability of the model to adapt to unseen accents using the existing CommonVoice corpus. We also propose an accent-agnostic approach that extends the model-agnostic meta-learning (MAML) algorithm for fast adaptation to unseen accents. Our approach significantly outperforms joint training in zero-shot, few-shot, and all-shot evaluation, in both the mixed-region and cross-region settings, in terms of word error rate.
Pages: 1276-1280
Number of pages: 5
Related papers
32 in total
  • [1] [Anonymous], 2019, P 23 C COMP NAT LANG
  • [2] ARDILA R., 2019, ARXIV191206670
  • [3] Dong LH, 2018, 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), P5884, DOI 10.1109/ICASSP.2018.8462506
  • [4] Finn C, 2018, ADV NEUR IN, V31
  • [5] Finn C, 2017, PR MACH LEARN RES, V70
  • [6] Gu JT, 2018, 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), P3622
  • [7] Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM
    Hori, Takaaki
    Watanabe, Shinji
    Zhang, Yu
    Chan, William
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 949 - 953
  • [8] Hsu J.-Y., 2019, ARXIV191012094
  • [9] Huang P S., 2018, Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, P732
  • [10] A Multi-Accent Acoustic Model using Mixture of Experts for Speech Recognition
    Jain, Abhinav
    Singh, Vishwanath P.
    Rath, Shakti P.
    [J]. INTERSPEECH 2019, 2019, : 779 - 783