THE IMPORTANCE OF CEPSTRAL PARAMETER CORRELATIONS IN SPEECH RECOGNITION

Cited by: 24
Author
LJOLJE, A
Affiliation
[1] AT&T Bell Laboratories, Murray Hill, NJ 07974
Keywords
Cepstral parameter correlations; Single multivariate Gaussian distribution; Three-state left-to-right phone models
DOI
10.1006/csla.1994.1011
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this work we demonstrate that explicit modeling of correlations between spectral parameters in speech recognition improves speech models both in terms of their descriptive power (higher likelihoods) and classification accuracy. Most large-vocabulary speech recognition systems are based on some form of hidden Markov models (HMMs) modeling sub-word speech segments. Most of the time speech segments are represented using short term spectra. In this work we employ three-state left-to-right phone models and LPC cepstral parameters including their first and second order time differentials. We investigate the importance of modeling correlations between cepstral parameters for high accuracy phone recognition. Several different types of distributions for each HMM state are compared. The simplest uses a single multivariate Gaussian distribution with a full covariance matrix. The next uses a weighted mixture of multivariate Gaussian distributions with diagonal covariances. It uses implicit rather than explicit modeling of parameter correlations. The most elaborate model employs a mixture of Gaussian distributions, just like the previous model, but in addition it uses a parameter space rotation which is specific to a given state in an HMM. It thus explicitly models parameter correlations in exactly the same way as the simplest model which uses a single distribution per state. The highest phone accuracy on the DARPA Resource Management task Feb 89 test set is obtained using the most elaborate model, with mixtures and space rotation - 82.4% phone accuracy. The next best result was achieved using single distributions, which also explicitly model parameter correlations, with 80.8% phone accuracy. The worst result was obtained using distributions which only implicitly model parameter correlations, achieving 78.7% phone accuracy. These results clearly demonstrate the importance of explicitly modeling parameter correlations for improving speech recognition performance.
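The abstract's claim that a state-specific rotation models correlations "in exactly the same way" as a single full-covariance Gaussian can be illustrated with a minimal sketch. It assumes, purely for illustration, that the rotation is taken from the eigenvectors of the state's covariance matrix; the abstract does not say how the rotation is estimated. The data are synthetic, and the function names are hypothetical, not from the paper.

```python
import numpy as np

def full_cov_loglik(x, mean, cov):
    """Log-density of x under a Gaussian with a full covariance matrix."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def rotated_diag_loglik(x, mean, rotation, variances):
    """Log-density after rotating x into new axes and applying a
    diagonal-covariance Gaussian there (the rotation is orthogonal,
    so no Jacobian term is needed)."""
    d = x.shape[0]
    z = rotation.T @ (x - mean)
    return -0.5 * (d * np.log(2 * np.pi)
                   + np.sum(np.log(variances))
                   + np.sum(z ** 2 / variances))

# Synthetic, correlated 3-dimensional "feature" frames for one state
# (assumption: real cepstral vectors would be higher-dimensional).
rng = np.random.default_rng(0)
mixing = rng.normal(size=(3, 3))
frames = rng.normal(size=(500, 3)) @ mixing.T

mean = frames.mean(axis=0)
cov = np.cov(frames, rowvar=False)

# Assumed rotation: eigenvectors of the state covariance.
variances, rotation = np.linalg.eigh(cov)

x = frames[0]
ll_full = full_cov_loglik(x, mean, cov)
ll_rotated = rotated_diag_loglik(x, mean, rotation, variances)

# Baseline: diagonal covariance without rotation ignores the correlations.
total_full = sum(full_cov_loglik(f, mean, cov) for f in frames)
total_diag = sum(rotated_diag_loglik(f, mean, np.eye(3), np.diag(cov))
                 for f in frames)
```

Under these assumptions `ll_full` and `ll_rotated` agree up to floating-point error, while the unrotated diagonal model assigns the training frames a total log-likelihood no higher than the full-covariance model. The most elaborate model in the abstract additionally places a mixture of such diagonal Gaussians in the rotated space, which this sketch does not attempt.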
Pages: 223-232
Page count: 10