Semi-Supervised Acoustic Model Training by Discriminative Data Selection From Multiple ASR Systems' Hypotheses

Cited by: 10

Authors:

Li, Sheng [1]

Akita, Yuya [1]

Kawahara, Tatsuya [1]

Affiliations:

[1] Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2016 / Vol. 24 / No. 09

Keywords:

acoustic model; lecture transcription; semi-supervised training; Speech recognition; SPEECH RECOGNITION; CONFIDENCE MEASURES;

DOI:

10.1109/TASLP.2016.2562505

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

While the performance of ASR systems depends on the size of the training data, it is very costly to prepare accurate and faithful transcripts. In this paper, we investigate a semi-supervised training scheme that takes advantage of huge quantities of unlabeled video lecture archives, particularly for the deep neural network (DNN) acoustic model. In the proposed method, we obtain ASR hypotheses from complementary GMM- and DNN-based ASR systems. Then, a set of CRF-based classifiers is trained to select the correct hypotheses and verify the selected data. The proposed hypothesis combination yields higher-quality output than the conventional system combination method (ROVER). Moreover, compared with conventional data selection based on confidence measure scores, our method is shown to be more effective at filtering usable data. A significant improvement in ASR accuracy is achieved over the baseline system and over models trained with the conventional system combination and data selection methods.

Pages: 1524-1534

Number of pages: 11
