End-to-End Myanmar Speech Recognition with Human-Machine Cooperation

Times Cited: 0
Authors
Wang, Faliang [1]
Yang, Yiling [1]
Yang, Jian [1]
Affiliations
[1] Yunnan Univ, Sch Informat Sci & Engn, Kunming, Yunnan, Peoples R China
Source
2022 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2022) | 2022
Funding
National Natural Science Foundation of China
Keywords
Myanmar language; automatic speech recognition; end-to-end; pre-training; human-machine cooperation
DOI
10.1109/IALP57159.2022.9961316
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
End-to-end automatic speech recognition based on deep neural networks has achieved satisfactory results for languages with large-scale training data. For Myanmar, however, the small amount of available training data leaves a significant gap between speech recognition performance and application needs. Based on a Transformer end-to-end model, this paper studies the influence of character, byte pair encoding (BPE), and syllable label encodings on the recognition rate. For Myanmar speech recognition in a low-resource environment, we propose and implement two training methods: (1) A self-supervised speech representation model pre-trained on massive English data is introduced into the Myanmar speech recognition model as an acoustic feature extractor; experimental results show that this pre-trained representation method clearly improves end-to-end Myanmar speech recognition. (2) The training corpus is expanded through human-machine cooperation to improve the recognition rate. Building on method (1), 2898 new utterances were obtained after 6 iterations of a human-machine cooperation experiment. The character error rate of the Myanmar speech recognition model decreased from 4.8% to 4.1% on the development set and from 5.5% to 4.2% on the test set, while the time required for manual proofreading of each batch of the speech corpus decreased from 19.1 hours to 16.3 hours. These experiments demonstrate the effectiveness of the human-machine cooperation method.
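The human-machine cooperation loop described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the current ASR model transcribes a batch of unlabeled audio, a human proofreads the hypotheses (faster than transcribing from scratch), and the corrected pairs expand the training corpus before the next retraining round. All function and variable names here are hypothetical placeholders.

```python
# Hypothetical sketch of the paper's iterative human-machine cooperation
# scheme for corpus expansion. The ASR model and human proofreading are
# stubbed out; only the loop structure follows the abstract's description.

from dataclasses import dataclass, field


@dataclass
class Corpus:
    # Each entry is an (audio_id, proofread_transcript) pair.
    pairs: list = field(default_factory=list)


def machine_transcribe(model, audio_id):
    # Placeholder for decoding one utterance with the current
    # end-to-end Myanmar ASR model.
    return f"hyp-for-{audio_id}"


def human_proofread(hypothesis):
    # Placeholder for manual correction of the machine hypothesis;
    # as the model improves, this step takes less time per batch.
    return hypothesis.replace("hyp", "ref")


def cooperation_iteration(model, corpus, unlabeled_batch):
    # One iteration: transcribe, proofread, and add to the corpus.
    # In the paper, the model would then be retrained on the result.
    for audio_id in unlabeled_batch:
        hyp = machine_transcribe(model, audio_id)
        ref = human_proofread(hyp)
        corpus.pairs.append((audio_id, ref))
    return corpus


corpus = Corpus()
for it in range(6):  # the paper reports 6 iterations
    # Illustrative batch size: 2898 utterances / 6 iterations = 483.
    batch = [f"utt{it}_{k}" for k in range(483)]
    cooperation_iteration(model=None, corpus=corpus, unlabeled_batch=batch)

print(len(corpus.pairs))  # → 2898
```

In the actual system, `machine_transcribe` would decode with the Transformer model using pre-trained speech representations as acoustic features, and retraining after each iteration is what drives both the CER reduction and the drop in per-batch proofreading time.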
Pages: 156-161
Page count: 6