Correction while Recognition: Combining Pretrained Language Model for Taiwan-Accented Speech Recognition

Cited by: 0
Authors
Li, Sheng [1]
Li, Jiyi [2]
Affiliations
[1] National Institute of Information and Communications Technology, Kyoto, Japan
[2] University of Yamanashi, Kofu, Yamanashi, Japan
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VII | 2023, Vol. 14260
Keywords
speech recognition; pretrained language models (PLMs); Taiwan-accented speech
DOI
10.1007/978-3-031-44195-0_32
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Taiwan-accented speech bears similarities to the Mandarin Min dialect but differs substantially in vocabulary, which significantly affects spoken language recognition outcomes. This paper focuses on integrating pretrained language models (PLMs) with state-of-the-art self-supervised learning (SSL)-based speech recognition systems for Taiwan-accented speech recognition. We propose a progressive error correction process that runs in tandem with recognition to fully exploit the autoregressive nature of PLMs. Experimental results demonstrate that our method effectively addresses recognition errors stemming from misspelled vocabulary in accented speech. The proposed progressive approach achieves roughly a 0.5% improvement over the conventional method. Furthermore, we demonstrate that fine-tuning PLMs solely on the text of the accented dataset can enhance recognition performance, despite the limited availability of accented speech resources.
Pages: 389-400
Page count: 12