Semi-supervised Training of Acoustic Models Leveraging Knowledge Transferred from Out-of-Domain Data

Cited by: 0
Authors
Lo, Tien-Hong [1]
Chen, Berlin [1, 2]
Affiliations
[1] National Taiwan Normal University, Taipei, Taiwan
[2] Pervasive Artificial Intelligence Research (PAIR) Labs, Taipei, Taiwan
Source
2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2019
DOI
10.1109/apsipaasc47483.2019.9023040
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Recently, a novel objective function for discriminative acoustic model training, namely lattice-free MMI (LF-MMI), has been proposed and has achieved state-of-the-art results in automatic speech recognition (ASR). Although LF-MMI performs excellently across a wide array of ASR tasks in supervised training settings, little work has investigated its effectiveness in unsupervised or semi-supervised training. Semi-supervised training (or self-training) of acoustic models, for its part, suffers from the difficulty of estimating a good model when only a limited amount of correctly transcribed data is available. It is also generally acknowledged that discriminative training is sensitive to the correctness of the speech transcripts used for training. In view of the above, this paper explores two novel extensions to LF-MMI. The first distills knowledge (acoustic training statistics) from a large amount of out-of-domain data to better estimate the seed models used in semi-supervised training. The second effectively selects untranscribed target-domain data for semi-supervised training. A series of experiments conducted on the AMI benchmark corpus demonstrates that the gains from these two extensions are pronounced and additive, which confirms their effectiveness and viability.
Pages: 1400-1404
Number of pages: 5