Building Acoustic Model Ensembles by Data Sampling With Enhanced Trainings and Features

Cited by: 9

Authors:

Chen, Xin [1]

Zhao, Yunxin [2]

Affiliations:

[1] Pearson Knowledge Technol, Menlo Pk, CA 94025 USA

[2] Univ Missouri, Dept Comp Sci, Columbia, MO 65211 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013 / Vol. 21 / Issue 03

Funding:

National Science Foundation (USA);

Keywords:

Ensemble acoustic model; cross validation data sampling; speaker clustering data sampling; discriminative training; MLP feature;

DOI:

10.1109/TASL.2012.2227729

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206; 082403;

Abstract:

We propose a novel approach that uses Cross Validation (CV) and Speaker Clustering (SC) based data sampling to construct an ensemble of acoustic models for speech recognition. We also investigate the effects of the existing techniques of Cross Validation Expectation Maximization (CVEM), Discriminative Training (DT), and Multiple Layer Perceptron (MLP) features on the quality of the proposed ensemble acoustic models (EAMs). We evaluated the proposed methods on the TIMIT phoneme recognition task as well as on a telemedicine automatic captioning task. The proposed methods led to significant improvements in recognition accuracy over conventional Hidden Markov Model (HMM) baseline systems. Integrating EAMs with CVEM, DT, and MLP also significantly improved accuracy over the single-model systems based on CVEM, DT, and MLP, and the increased inter-model diversity is shown to have played an important role in the performance gain.
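The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one common reading of cross-validation-style data sampling: the training set is split into K folds, and each ensemble member is trained on the data with one fold held out. The function name `cv_data_samples`, the utterance IDs, and the choice of K = 5 are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def cv_data_samples(items: Sequence[T], num_folds: int, seed: int = 0) -> List[List[T]]:
    """Return num_folds training subsets, each omitting one distinct fold.

    Assumption: this is a generic leave-one-fold-out sampler, not the
    authors' exact procedure.
    """
    if not 2 <= num_folds <= len(items):
        raise ValueError("num_folds must be between 2 and len(items)")

    # Shuffle a copy so the caller's data is left untouched.
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    # Deal the shuffled items into num_folds roughly equal folds.
    folds = [shuffled[i::num_folds] for i in range(num_folds)]

    # Subset k contains every fold except fold k.
    subsets = []
    for held_out in range(num_folds):
        subset = [x for i, fold in enumerate(folds) if i != held_out for x in fold]
        subsets.append(subset)
    return subsets


# Hypothetical utterance IDs standing in for real training data.
utterances = [f"utt{i:02d}" for i in range(10)]
subsets = cv_data_samples(utterances, num_folds=5)
```

Each subset would then be used to train one base acoustic model. Because any two subsets differ by the folds they hold out, the resulting models see overlapping but not identical data, which is one simple way to obtain the inter-model diversity the abstract refers to.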

Pages: 498-507

Number of pages: 10

References: 32 in total
