HMM-based speech recognition using state-dependent, discriminatively derived transforms on mel-warped DFT features

Cited by: 37
Authors
Chengalvarayan, R [1]
Deng, L [1]
Affiliation
[1] UNIV WATERLOO, DEPT ELECT & COMP ENGN, WATERLOO, ON N2L 3G1, CANADA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1997 / Vol. 5 / No. 3
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
DOI
10.1109/89.568731
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In the study reported in this paper, we investigate interactions of front-end feature extraction and back-end classification techniques in hidden Markov model-based (HMM-based) speech recognition. The proposed model focuses on dimensionality reduction of the mel-warped discrete Fourier transform (DFT) feature space subject to maximal preservation of speech classification information, and aims at finding an optimal linear transformation on the mel-warped DFT according to the minimum classification error (MCE) criterion. This linear transformation, along with the HMM parameters, is automatically trained using the gradient descent method to minimize a measure of overall empirical error counts. A further generalization of the model allows integration of the discriminatively derived state-dependent transformation with the construction of dynamic feature parameters. Experimental results show that state-dependent transformation on mel-warped DFT features is superior in performance to the mel-frequency cepstral coefficients (MFCCs). An error rate reduction of 15% is obtained on a standard 39-class TIMIT phone classification task, in comparison with the conventional MCE-trained HMM using MFCCs that have not been subject to optimization during training.
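The core idea in the abstract, jointly training a dimensionality-reducing linear transform and the classifier parameters by gradient descent on a smoothed error count, can be illustrated with a heavily simplified sketch. This is not the authors' implementation. It assumes a single global transform `W` rather than state-dependent ones, replaces the HMMs with one prototype vector per class (`mu`), and uses a common form of the MCE misclassification measure with a sigmoid loss. All names and the smoothing constants `eta` and `gamma` are hypothetical choices made for illustration.

```python
import numpy as np

def mce_loss_and_grads(W, mu, X, y, eta=1.0, gamma=1.0):
    """Smoothed MCE loss and its gradients (illustrative, not the paper's model).

    W  : (d, D) linear transform applied to D-dim (e.g. mel-warped DFT) features
    mu : (K, d) one prototype per class, standing in for HMM parameters
    X  : (N, D) feature vectors, y : (N,) integer class labels; requires K >= 2
    """
    N, K = X.shape[0], mu.shape[0]
    idx = np.arange(N)
    Z = X @ W.T                                # transformed features, (N, d)
    diff = Z[:, None, :] - mu[None, :, :]      # (N, K, d)
    G = -np.sum(diff ** 2, axis=2)             # discriminants g_k = -||Wx - mu_k||^2

    # Soft maximum over competing classes (true class masked out).
    Gc = eta * G
    Gc[idx, y] = -np.inf
    m = Gc.max(axis=1, keepdims=True)
    E = np.exp(Gc - m)
    S = E.sum(axis=1, keepdims=True)
    w = E / S                                  # competitor weights, 0 for true class
    anti = (m[:, 0] + np.log(S[:, 0]) - np.log(K - 1)) / eta

    d_mis = -G[idx, y] + anti                  # misclassification measure
    loss = 1.0 / (1.0 + np.exp(-gamma * d_mis))  # sigmoid: smoothed 0/1 error

    # Chain rule: d loss / d G, then back to W and mu.
    dd_dG = w.copy()
    dd_dG[idx, y] = -1.0
    coef = (gamma * loss * (1.0 - loss))[:, None] * dd_dG   # (N, K)
    dZ = -2.0 * np.einsum('nk,nkd->nd', coef, diff)         # dG_k/dZ = -2 diff_k
    gW = dZ.T @ X / N
    gmu = 2.0 * np.einsum('nk,nkd->kd', coef, diff) / N     # dG_k/dmu_k = 2 diff_k
    return loss.mean(), gW, gmu

def train(X, y, K, d, steps=200, lr=0.05, seed=0):
    """Plain gradient descent on W and mu together."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((d, X.shape[1]))
    mu = 0.1 * rng.standard_normal((K, d))
    history = []
    for _ in range(steps):
        loss, gW, gmu = mce_loss_and_grads(W, mu, X, y)
        history.append(loss)
        W -= lr * gW
        mu -= lr * gmu
    return W, mu, history

if __name__ == "__main__":
    # Synthetic stand-in data; real inputs would be mel-warped DFT frames.
    rng = np.random.default_rng(1)
    K, D, d = 3, 8, 2
    y = rng.integers(0, K, size=150)
    X = rng.standard_normal((150, D)) + 2.0 * np.eye(K, D)[y]
    W, mu, history = train(X, y, K, d)
    print("first / last smoothed error:", history[0], history[-1])
```

The point of the sketch is that the feature transform is updated by the same error-driven gradient as the classifier, instead of being fixed in advance as with MFCCs. Extending it toward the model described in the abstract would mean one transform per HMM state and a discriminant computed along the decoded state sequence, neither of which is attempted here.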
Pages: 243-256
Page count: 14