END-TO-END MULTI-ACCENT SPEECH RECOGNITION WITH UNSUPERVISED ACCENT MODELLING

Cited by: 10
Authors
Li, Song [1 ]
Ouyang, Beibei [1 ]
Liao, Dexin [2 ]
Xia, Shipeng [2 ]
Li, Lin [1 ]
Hong, Qingyang [2 ]
Affiliations
[1] Xiamen Univ, Sch Elect Sci & Technol, Xiamen, Peoples R China
[2] Xiamen Univ, Sch Informat, Xiamen, Peoples R China
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Funding
National Natural Science Foundation of China;
Keywords
End-to-end; speech recognition; multi-accent; global embedding;
DOI
10.1109/ICASSP39728.2021.9414833
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
End-to-end speech recognition has achieved good performance on standard English pronunciation datasets. However, a prominent problem for end-to-end systems is that non-native English speakers have complex and varied accents, which reduces recognition accuracy across different countries. To address this issue, we first investigate and improve the current mainstream end-to-end multi-accent speech recognition technologies. In addition, we propose two unsupervised accent modelling methods, which convert accent information into a global embedding and use it to improve the performance of end-to-end multi-accent speech recognition systems. Experimental results on the accented English datasets of eight countries (AESRC2020) show that, compared with the Transformer baseline, our proposed methods achieve relative average word error rate (WER) reductions of 14.8% and 15.4% on the development and evaluation sets, respectively.
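The abstract only sketches the approach, so the following minimal PyTorch sketch illustrates one plausible reading of "unsupervised accent modelling via a global embedding": frame-level encoder states are pooled into a single utterance-level vector without any accent labels, and that vector is fed back to condition the encoder output before decoding. The module names, dimensions, mean-pooling, and additive conditioning are assumptions made for illustration, not the authors' implementation.

# Minimal sketch (assumed design, not the paper's code): derive a global
# accent embedding from encoder states without accent labels, then add it
# back to every frame so the rest of the ASR model is accent-conditioned.
import torch
import torch.nn as nn


class GlobalAccentEmbedding(nn.Module):
    """Pools frame-level encoder states into one utterance-level accent vector."""

    def __init__(self, d_model=256, d_accent=64):
        super().__init__()
        self.proj = nn.Linear(d_model, d_accent)   # compress pooled state
        self.back = nn.Linear(d_accent, d_model)   # map back for conditioning

    def forward(self, enc_out):
        # enc_out: (batch, time, d_model) output of a Transformer encoder
        pooled = enc_out.mean(dim=1)                # unsupervised global pooling
        accent_emb = torch.tanh(self.proj(pooled))  # (batch, d_accent)
        # Broadcast-add the accent vector to every frame for the decoder to use.
        conditioned = enc_out + self.back(accent_emb).unsqueeze(1)
        return accent_emb, conditioned


if __name__ == "__main__":
    enc_out = torch.randn(2, 100, 256)              # dummy encoder output
    accent_emb, conditioned = GlobalAccentEmbedding()(enc_out)
    print(accent_emb.shape, conditioned.shape)      # (2, 64) (2, 100, 256)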
Pages: 6418-6422
Number of pages: 5