Semi-supervised learning of speech recognizers based on variational autoencoder and unsupervised data augmentation

Cited by: 0

Authors:

Ho, Hyeon [1]

Kang, Byung Ok [1]

Kwon, Oh-Wook [1]

Affiliations:

[1] Chungbuk Natl Univ, Dept Intelligent Syst & Robot, Chungdae Ro 1, Cheongju 28644, South Korea

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA | 2021 / Vol. 40 / No. 6

Keywords:

End-to-End ASR; Variational AutoEncoder (VAE); Data augmentation; Semi-supervised learning;

DOI:

10.7776/ASK.2021.40.6.578

Chinese Library Classification (CLC):

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

We propose a semi-supervised learning method based on a Variational AutoEncoder (VAE) and Unsupervised Data Augmentation (UDA) to improve the performance of an end-to-end speech recognizer. In the proposed method, the VAE-based augmentation model and the baseline end-to-end speech recognizer are first trained on the original speech data. The baseline recognizer is then trained again on data generated by the learned augmentation model. Finally, the learned augmentation model and the end-to-end speech recognizer are retrained with the UDA-based semi-supervised learning method. Computer simulations show that the augmentation model improves the Word Error Rate (WER) of the baseline end-to-end speech recognizer, and that combining it with the UDA-based learning method improves performance further.
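The two quantities named in the abstract, the Word Error Rate and a UDA-style training objective, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`word_error_rate`, `kl_divergence`, `uda_objective`), the toy per-utterance class distributions, and the `weight` parameter are assumptions made for illustration. The objective follows the generic UDA formulation (supervised cross-entropy plus a weighted KL consistency term between predictions on original and augmented inputs), and the exact loss used in the paper may differ.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def uda_objective(p_labeled, label_index, p_unlabeled, p_augmented,
                  weight=1.0, eps=1e-12):
    """Hypothetical UDA-style loss for one labeled and one unlabeled example.

    p_labeled:   model output distribution for the labeled input
    label_index: index of the correct class for the labeled input
    p_unlabeled: model output distribution for the original unlabeled input
    p_augmented: model output distribution for its augmented version
    weight:      assumed trade-off factor for the consistency term
    """
    supervised = -math.log(p_labeled[label_index] + eps)    # cross-entropy
    consistency = kl_divergence(p_unlabeled, p_augmented)   # agreement penalty
    return supervised + weight * consistency


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    print(uda_objective([0.7, 0.3], 0, [0.9, 0.1], [0.6, 0.4]))
```

In this sketch the consistency term is zero when the model predicts the same distribution for an unlabeled input and its augmented version, so only disagreement between the two predictions adds to the loss.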


Pages: 578-586

Page count: 9
