DIRICHLET MIXTURE MODELS OF NEURAL NET POSTERIORS FOR HMM-BASED SPEECH RECOGNITION

Cited by: 0
Authors
Balakrishnan, V [1]
Sivaram, G. S. V. S. [1]
Khudanpur, Sanjeev [1]
Affiliations
[1] Johns Hopkins Univ, Dept Elect & Comp Engn, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
Source
2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2011
Keywords
Dirichlet distribution; neural network posteriors; HMMs
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we present a novel technique for modeling the posterior probability estimates obtained from a neural network directly in the HMM framework using Dirichlet Mixture Models (DMMs). Since posterior probability vectors lie on a probability simplex, their distribution can be modeled using DMMs. Because the Dirichlet distribution belongs to an exponential family, the parameters of DMMs can be estimated efficiently. Conventional approaches such as TANDEM attempt to Gaussianize the posteriors with suitable transforms and then model them using Gaussian Mixture Models (GMMs). This requires a larger number of parameters, as it does not exploit the fact that the probability vectors lie on a simplex. We demonstrate through TIMIT phoneme recognition experiments that the proposed technique outperforms the conventional TANDEM approach.
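As an illustration of the idea in the abstract, the following is a minimal sketch of how a Dirichlet mixture could score a posterior vector as the emission log-likelihood of one HMM state. It is not the authors' implementation: the function names, the three-class posterior, and all mixture weights and concentration parameters are hypothetical values chosen for the example, and parameter estimation is not shown.

```python
import math


def dirichlet_log_pdf(x, alpha):
    """Log density of Dirichlet(alpha) at a probability vector x (all entries > 0)."""
    # Log of the normalizing constant: log Gamma(sum alpha) - sum log Gamma(alpha_k)
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def dmm_log_likelihood(x, weights, alphas):
    """Log-likelihood of x under a mixture of Dirichlets, computed with log-sum-exp."""
    terms = [math.log(w) + dirichlet_log_pdf(x, a) for w, a in zip(weights, alphas)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Hypothetical neural network posterior over three classes (sums to 1)
posterior = [0.7, 0.2, 0.1]
# Hypothetical two-component mixture attached to a single HMM state
weights = [0.6, 0.4]
alphas = [[5.0, 2.0, 1.0], [1.0, 1.0, 1.0]]

score = dmm_log_likelihood(posterior, weights, alphas)
```

Each mixture component here needs one concentration parameter per class, whereas a diagonal-covariance Gaussian component needs a mean and a variance per dimension, which is one way to read the abstract's remark about parameter count.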
Pages: 5028-5031
Number of pages: 4