Semi-supervised learning of speech recognizers based on variational autoencoder and unsupervised data augmentation

Cited by: 0
Authors
Ho, Hyeon [1 ]
Kang, Byung Ok [1 ]
Kwon, Oh-Wook [1 ]
Affiliation
[1] Chungbuk Natl Univ, Dept Intelligent Syst & Robot, Chungdae Ro 1, Cheongju 28644, South Korea
Source
JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA | 2021 / Vol. 40 / No. 6
Keywords
End-to-End ASR; Variational AutoEncoder (VAE); Data augmentation; Semi-supervised learning
DOI
10.7776/ASK.2021.40.6.578
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We propose a semi-supervised learning method based on a Variational AutoEncoder (VAE) and Unsupervised Data Augmentation (UDA) to improve the performance of an end-to-end speech recognizer. The method has three stages. First, a VAE-based augmentation model and a baseline end-to-end speech recognizer are trained on the original speech data. Next, the baseline recognizer is trained again on data generated by the learned augmentation model. Finally, the augmentation model and the recognizer are trained further with UDA-based semi-supervised learning. In computer simulations, the augmentation model reduces the Word Error Rate (WER) of the baseline end-to-end speech recognizer, and combining it with UDA-based learning improves performance further.
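The abstract does not spell out the training objective, so the following is only a minimal sketch of the general UDA idea it refers to: a supervised loss on labeled data plus a consistency term that pulls the prediction on an augmented unlabeled input towards the prediction on the original input. Everything here is an assumption for illustration: the function names, the `weight` parameter, the toy logits, and the reduction of a sequence recognizer to a single classification step. It is not the authors' implementation, and the VAE-based augmentation is represented only by the pre-computed `augmented_logits`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Supervised loss: negative log-probability of the reference label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def uda_loss(labeled_logits, label, unlabeled_logits, augmented_logits, weight=1.0):
    """Supervised term plus a weighted consistency term (hypothetical helper).

    The prediction on the original unlabeled input serves as a fixed target;
    the prediction on its augmented version is pulled towards that target.
    """
    supervised = cross_entropy(labeled_logits, label)
    target = softmax(unlabeled_logits)
    predicted = softmax(augmented_logits)
    consistency = kl_divergence(target, predicted)
    return supervised + weight * consistency


# Hypothetical toy logits over three output classes.
loss_same = uda_loss([2.0, 0.5, 0.1], 0, [1.0, 0.2, 0.1], [1.0, 0.2, 0.1])
loss_diff = uda_loss([2.0, 0.5, 0.1], 0, [1.0, 0.2, 0.1], [0.1, 0.2, 1.0])
print(loss_same < loss_diff)
```

When the augmented prediction matches the original one, the consistency term vanishes and only the supervised loss remains; a mismatch increases the total loss, which is what drives the model towards augmentation-invariant predictions on unlabeled speech.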
Pages: 578-586
Page count: 9