Building Acoustic Model Ensembles by Data Sampling With Enhanced Trainings and Features

Cited by: 9
Authors
Chen, Xin [1]
Zhao, Yunxin [2]
Affiliations
[1] Pearson Knowledge Technol, Menlo Pk, CA 94025 USA
[2] Univ Missouri, Dept Comp Sci, Columbia, MO 65211 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013 / Vol. 21 / No. 3
Funding
National Science Foundation (USA);
Keywords
Ensemble acoustic model; cross validation data sampling; speaker clustering data sampling; discriminative training; MLP feature;
DOI
10.1109/TASL.2012.2227729
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
We propose a novel approach that uses Cross Validation (CV) and Speaker Clustering (SC) based data sampling to construct an ensemble of acoustic models for speech recognition. We also investigate how the existing techniques of Cross Validation Expectation Maximization (CVEM), Discriminative Training (DT), and Multiple Layer Perceptron (MLP) features affect the quality of the proposed ensemble acoustic models (EAMs). We evaluated the proposed methods on the TIMIT phoneme recognition task and on a telemedicine automatic captioning task. The proposed methods yielded significant improvements in recognition accuracy over conventional Hidden Markov Model (HMM) baseline systems. Integrating EAMs with CVEM, DT, and MLP also significantly improved accuracy over the single-model systems based on CVEM, DT, and MLP, and the increased inter-model diversity is shown to have played an important role in this gain.
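As a rough illustration of what CV-based data sampling could look like, the sketch below splits a set of training utterance IDs into K folds and forms K overlapping training subsets, each leaving one fold out. One ensemble member would then be trained on each subset. This is an assumption about the general idea, not the authors' implementation: the function name `cv_sample_subsets`, the use of utterance IDs, and the random shuffling are all hypothetical choices made for the example.

```python
import random


def cv_sample_subsets(utterance_ids, num_folds, seed=0):
    """Return num_folds training subsets, each omitting one fold.

    Hypothetical helper: the abstract does not specify how the
    folds are formed, so a seeded random shuffle is assumed here.
    """
    ids = list(utterance_ids)
    random.Random(seed).shuffle(ids)
    # Deal the shuffled IDs round-robin into num_folds folds.
    folds = [ids[i::num_folds] for i in range(num_folds)]
    subsets = []
    for held_out in range(num_folds):
        # Keep every fold except the held-out one.
        subset = [u for j, fold in enumerate(folds)
                  if j != held_out for u in fold]
        subsets.append(subset)
    return subsets


# Ten utterances, five folds: five subsets of eight utterances each.
subsets = cv_sample_subsets(range(10), num_folds=5)
```

Because any two subsets differ by the folds they leave out, models trained on them see overlapping but non-identical data, which is one way the inter-model diversity mentioned in the abstract could arise.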
Pages: 498-507
Page count: 10