End-to-End Myanmar Speech Recognition with Human-Machine Cooperation

Times Cited: 0
Authors
Wang, Faliang [1]
Yang, Yiling [1]
Yang, Jian [1]
Affiliations
[1] Yunnan Univ, Sch Informat Sci & Engn, Kunming, Yunnan, Peoples R China
Source
2022 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2022) | 2022
Funding
National Natural Science Foundation of China
Keywords
Myanmar language; automatic speech recognition; end-to-end; pre-training; human-machine cooperation
DOI
10.1109/IALP57159.2022.9961316
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
End-to-end automatic speech recognition based on deep neural networks has achieved satisfactory results for languages with large-scale training data. For Myanmar, however, the small amount of available training data leaves a significant gap between speech recognition performance and application needs. Based on a Transformer end-to-end model, this paper studies the influence of character, byte pair encoding (BPE), and syllable label encodings on the recognition rate. For Myanmar speech recognition in a low-resource environment, we propose and implement two training methods: (1) A self-supervised speech representation model pre-trained on massive English data is introduced into the Myanmar speech recognition model as an acoustic feature extractor; experimental results show that this pre-trained representation method clearly improves end-to-end Myanmar speech recognition. (2) The training corpus is expanded through human-machine cooperation to improve the recognition rate. Building on method (1), 2898 new utterances were obtained after 6 iterations of a human-machine cooperation experiment. The character error rate of the Myanmar speech recognition model decreased from 4.8% to 4.1% on the development set and from 5.5% to 4.2% on the test set, while the time required for manual proofreading of each batch of the speech corpus decreased from 19.1 hours to 16.3 hours. These experiments demonstrate the effectiveness of the human-machine cooperation method.
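The human-machine cooperation loop described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the current ASR model transcribes a batch of unlabeled audio, a human proofreads the hypotheses (faster than transcribing from scratch), and the corrected pairs expand the training corpus before the next retraining round. All function and variable names here are hypothetical placeholders.

```python
# Hypothetical sketch of the paper's iterative human-machine cooperation
# scheme for corpus expansion. The ASR model and human proofreading are
# stubbed out; only the loop structure follows the abstract's description.

from dataclasses import dataclass, field


@dataclass
class Corpus:
    # Each entry is an (audio_id, proofread_transcript) pair.
    pairs: list = field(default_factory=list)


def machine_transcribe(model, audio_id):
    # Placeholder for decoding one utterance with the current
    # end-to-end Myanmar ASR model.
    return f"hyp-for-{audio_id}"


def human_proofread(hypothesis):
    # Placeholder for manual correction of the machine hypothesis;
    # as the model improves, this step takes less time per batch.
    return hypothesis.replace("hyp", "ref")


def cooperation_iteration(model, corpus, unlabeled_batch):
    # One iteration: transcribe, proofread, and add to the corpus.
    # In the paper, the model would then be retrained on the result.
    for audio_id in unlabeled_batch:
        hyp = machine_transcribe(model, audio_id)
        ref = human_proofread(hyp)
        corpus.pairs.append((audio_id, ref))
    return corpus


corpus = Corpus()
for it in range(6):  # the paper reports 6 iterations
    # Illustrative batch size: 2898 utterances / 6 iterations = 483.
    batch = [f"utt{it}_{k}" for k in range(483)]
    cooperation_iteration(model=None, corpus=corpus, unlabeled_batch=batch)

print(len(corpus.pairs))  # → 2898
```

In the actual system, `machine_transcribe` would decode with the Transformer model using pre-trained speech representations as acoustic features, and retraining after each iteration is what drives both the CER reduction and the drop in per-batch proofreading time.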
Pages: 156-161
Page count: 6