Learning Fast Adaptation on Cross-Accented Speech Recognition

Cited by: 40

Authors:

Winata, Genta Indra [1]

Cahyawijaya, Samuel [1]

Liu, Zihan [1]

Lin, Zhaojiang [1]

Madotto, Andrea [1]

Xu, Peng [1]

Fung, Pascale [1]

Affiliations:

[1] Hong Kong University of Science and Technology, Center for Artificial Intelligence Research (CAiRE), Hong Kong, People's Republic of China

Source:

INTERSPEECH 2020 | 2020

Keywords:

speech recognition; accent-agnostic; cross-accent; meta-learning; fast adaptation

DOI:

10.21437/Interspeech.2020-45

Chinese Library Classification (CLC) codes:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

Local dialects influence people to pronounce words of the same language differently from each other. The great variability and complex characteristics of accents create a major challenge for training a robust and accent-agnostic automatic speech recognition (ASR) system. In this paper, we introduce a cross-accented English speech recognition task as a benchmark for measuring the ability of the model to adapt to unseen accents using the existing CommonVoice corpus. We also propose an accent-agnostic approach that extends the model-agnostic meta-learning (MAML) algorithm for fast adaptation to unseen accents. Our approach significantly outperforms joint training in the zero-shot, few-shot, and all-shot scenarios, in both the mixed-region and cross-region settings, in terms of word error rate.


Pages: 1276-1280

Page count: 5
