Efficient likelihood evaluation and dynamic Gaussian selection for HMM-based speech recognition

Cited by: 13
Authors
Cai, Jun [1 ,2 ,3 ]
Bouselmi, Ghazi [1 ,2 ]
Laprie, Yves [1 ,2 ]
Haton, Jean-Paul [1 ,2 ]
Affiliations
[1] LORIA, CNRS, Grp Parole, F-54600 Vandoeuvre Les Nancy, France
[2] INRIA, F-54600 Vandoeuvre Les Nancy, France
[3] Xiamen Univ, Dept Cognit Sci, Xiamen 361005, Peoples R China
Keywords
Gaussian selection; Fast likelihood computation; Hidden Markov models; Speech recognition
DOI
10.1016/j.csl.2008.05.002
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
LVCSR systems are usually based on continuous density HMMs, which are typically implemented using Gaussian mixture distributions. Such statistical modeling systems tend to operate slower than real-time, largely because of the heavy computational overhead of the likelihood evaluation. The objective of our research is to investigate approximate methods that can substantially reduce the computational cost of likelihood evaluation without noticeably degrading the recognition accuracy. In this paper, the most common techniques for speeding up the likelihood computation are classified into three categories, namely machine optimization, model optimization, and algorithm optimization. Each category is surveyed and summarized by describing and analyzing the basic ideas of the corresponding techniques. The distribution of the numerical values of Gaussian mixtures within a GMM model is evaluated and analyzed to show that the computation of some Gaussians is unnecessary and can thus be eliminated. Two commonly used techniques for likelihood approximation, namely VQ-based Gaussian selection and partial distance elimination, are analyzed in detail. Based on these analyses, a fast likelihood computation approach called dynamic Gaussian selection (DGS) is proposed. DGS is a one-pass search technique which generates a dynamic shortlist of Gaussians for each state during the likelihood computation. In principle, DGS is an extension of both partial distance elimination and best mixture prediction, and it does not require additional memory for the storage of Gaussian shortlists. The DGS algorithm has been implemented by modifying the likelihood computation procedure in the HTK 3.4 system. Experimental results on the TIMIT and WSJ0 corpora indicate that this approach can speed up the likelihood computation significantly without introducing apparent additional recognition error. (C) 2008 Elsevier Ltd. All rights reserved.
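To illustrate the two ingredients the abstract names, the following is a minimal Python sketch of partial distance elimination combined with best-mixture prediction, assuming diagonal-covariance Gaussians and the common max-approximation to the mixture score. All function and variable names are hypothetical; this is not the authors' HTK 3.4 implementation, only a sketch of the core pruning idea.

import numpy as np

def pde_loglike(x, means, inv_vars, log_consts, prev_best):
    # Approximate state log-likelihood as the score of the single best
    # Gaussian (max-approximation), with two speed-ups:
    #   x          : (D,) feature vector
    #   means      : (M, D) Gaussian means
    #   inv_vars   : (M, D) inverse diagonal variances
    #   log_consts : (M,) log mixture weight + Gaussian normalization term
    #   prev_best  : index of the winning Gaussian at the previous frame
    M, D = means.shape

    # Best-mixture prediction: evaluate the previous frame's winner
    # first, so the pruning threshold is tight from the start.
    order = [prev_best] + [m for m in range(M) if m != prev_best]

    best_score, best_m = -np.inf, prev_best
    for m in order:
        score = log_consts[m]
        # Partial distance elimination: accumulate the quadratic term
        # dimension by dimension; since each term only lowers the score,
        # the Gaussian can be abandoned as soon as it falls below the
        # current best.
        for d in range(D):
            diff = x[d] - means[m, d]
            score -= 0.5 * diff * diff * inv_vars[m, d]
            if score <= best_score:
                break  # eliminated early; remaining dims only lower it
        else:
            best_score, best_m = score, m  # full distance computed
    return best_score, best_m

In full DGS as described in the abstract, the Gaussians that survive this kind of pruning would further form the dynamic per-state shortlist, built on the fly during likelihood computation rather than stored in memory; prev_best would be carried from frame to frame (e.g., initialized to 0 for the first frame).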
Pages: 147-164
Page count: 18