Integrate Template Matching and Statistical Modeling for Speech Recognition

Cited by: 0
Authors
Sun, Xie [1]
Zhao, Yunxin [1]
Affiliations
[1] Univ Missouri, Dept Comp Sci, Columbia, MO 65211 USA
Source
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 | 2010
Keywords
template matching; Gaussian Mixture Model; log likelihood ratio; lattice rescoring; DTW;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a novel approach that integrates template matching with statistical modeling to improve continuous speech recognition. We use multiple Gaussian Mixture Model (GMM) indices to represent each frame of the speech templates, use hierarchical agglomerative clustering to generate template representatives, and use the log likelihood ratio as the local distance measure for DTW template matching in lattice rescoring. Experimental results on the TIMIT phone recognition task demonstrated that the proposed approach consistently and significantly improved several HMM baselines: the absolute accuracy gain was 1.69% to 1.83% when all training templates were used, and 1.29% to 1.37% when template representatives were used.
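The DTW template matching named in the abstract can be illustrated with a minimal sketch. This is a generic dynamic time warping routine with a pluggable local distance, not the authors' implementation. The abstract does not give the form of the log likelihood ratio distance or of the GMM-index frame representation, so the function names `dtw_distance` and `abs_diff` are hypothetical and the absolute-difference distance is only a stand-in for illustration.

```python
import math
from typing import Callable, Sequence, TypeVar

Frame = TypeVar("Frame")


def dtw_distance(
    template: Sequence[Frame],
    segment: Sequence[Frame],
    local_distance: Callable[[Frame, Frame], float],
) -> float:
    """Accumulated cost of the best DTW alignment of two frame sequences.

    Assumption: the standard step pattern (horizontal, vertical, diagonal)
    with no slope constraints. The record does not say which step pattern
    the paper uses.
    """
    n, m = len(template), len(segment)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")

    # cost[i][j] = best accumulated cost aligning the first i template
    # frames with the first j segment frames.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_distance(template[i - 1], segment[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # template frame advances alone
                cost[i][j - 1],      # segment frame advances alone
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def abs_diff(a: float, b: float) -> float:
    """Stand-in local distance for scalar frames (NOT the paper's measure)."""
    return abs(a - b)
```

In a setting like the one the abstract describes, `local_distance` would be replaced by a log likelihood ratio computed from the GMM indices of the two frames, and the resulting template score would be used alongside the HMM scores when rescoring lattice hypotheses. How those scores are combined is not stated in this record.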
Pages: 74-77
Number of pages: 4
Related papers
12 in total
  • [1] Aradilla G., 2005, Proc. Eurospeech, P3333
  • [2] Axelrod S, 2004, 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, P173
  • [3] De Wachter M, 2004, 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, P181
  • [4] De Wachter M., 2007, IEEE T ASLP, V15
  • [5] DENG L, 2005, P IEEE WORKSH ASRU
  • [6] Gish H, 1996, ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, P466, DOI 10.1109/ICSLP.1996.607155
  • [7] A probabilistic framework for segment-based speech recognition
    Glass, JR
    [J]. COMPUTER SPEECH AND LANGUAGE, 2003, 17 (2-3) : 137 - 152
  • [8] Maier V., 2005, Proc. Interspeech, P1245
  • [9] Ming J, 2009, INT CONF ACOUST SPEE, P3849, DOI 10.1109/ICASSP.2009.4960467
  • [10] From HMM's to segment models: A unified view of stochastic modeling for speech recognition
    Ostendorf, M
    Digalakis, VV
    Kimball, OA
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1996, 4 (05) : 360 - 378