N-channel hidden Markov models for combined stressed speech classification and recognition

Times cited: 55
Authors
Womack, BD [1]
Hansen, JHL [1]
Affiliation
[1] Univ Colorado, Robust Speech Proc Lab, Ctr Spoken Language Understanding, Boulder, CO 80309 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1999 / Vol. 7 / No. 6
Keywords
Lombard effect; N-channel Markov model; speech recognition; stress classification
DOI
10.1109/89.799692
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Robust speech recognition systems must address variations due to perceptually induced stress in order to maintain acceptable performance in adverse conditions. One approach is to use front-end stress classification to direct a stress-dependent recognition algorithm that models each speech production domain separately. This study proposes a new approach that combines stress classification and speech recognition in a single algorithm. This is accomplished by generalizing the one-dimensional (1-D) hidden Markov model to an N-channel hidden Markov model (N-channel HMM), in which each stressed speech production style under consideration is allocated its own dimension to model that perceptually induced stress condition. It is shown that this formulation better integrates perceptually induced stress effects for stress-independent recognition, owing to the sub-phoneme (state-level) stress classification that the algorithm performs implicitly. Compared with a previously established one-channel, stress-dependent isolated-word recognition system, the proposed N-channel stress-independent HMM method yields a 73.8% reduction in error rate. It also reduces the error rate by 82.7% relative to the common one-channel, neutral-trained recognition approach.
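To make "sub-phoneme (state-level) stress classification performed implicitly" concrete, here is a minimal sketch under assumptions that are ours, not the paper's. We assume each of the C stress channels is a left-to-right HMM with S states, that between frames a path may either stay in its channel or hop to another channel with a single hypothetical probability `p_switch`, and that emission log-likelihoods are supplied precomputed. A Viterbi search over the (channel, state) grid then assigns every frame both an HMM state and a stress channel. The names `n_channel_viterbi`, `utterance_stress`, `p_stay`, and `p_switch`, as well as the majority-vote utterance decision, are hypothetical illustrations rather than the authors' formulation.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def _log(p: float) -> float:
    """Log-probability that maps 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def n_channel_viterbi(
    emit_logp: Sequence[Sequence[Sequence[float]]],
    p_stay: float = 0.6,
    p_switch: float = 0.1,
) -> Tuple[List[Tuple[int, int]], float]:
    """Viterbi search over a grid of (stress channel, HMM state) nodes.

    emit_logp[t][c][s] is the log-likelihood of frame t under state s of
    channel c. Hypothetical topology: within a channel, stay or advance one
    state; between frames, hop to another channel with total prob p_switch.
    """
    T, C, S = len(emit_logp), len(emit_logp[0]), len(emit_logp[0][0])

    def trans(c0: int, s0: int, c1: int, s1: int) -> float:
        # State part: left-to-right with a self-loop; the last state only loops.
        if s1 == s0:
            a = 1.0 if s0 == S - 1 else p_stay
        elif s1 == s0 + 1:
            a = 1.0 - p_stay
        else:
            return NEG_INF
        # Channel part: keep the current stress channel or switch to another.
        b = 1.0 if C == 1 else ((1.0 - p_switch) if c1 == c0 else p_switch / (C - 1))
        return _log(a) + _log(b)

    # delta[c][s]: best log score of any path ending in (c, s) at the current frame.
    delta = [[NEG_INF] * S for _ in range(C)]
    for c in range(C):
        delta[c][0] = _log(1.0 / C) + emit_logp[0][c][0]  # uniform start over channels
    back: List[List[List[Tuple[int, int]]]] = []
    for t in range(1, T):
        new = [[NEG_INF] * S for _ in range(C)]
        ptr = [[(0, 0)] * S for _ in range(C)]
        for c1 in range(C):
            for s1 in range(S):
                best, arg = NEG_INF, (0, 0)
                for c0 in range(C):
                    for s0 in (s1 - 1, s1):
                        if s0 < 0 or delta[c0][s0] == NEG_INF:
                            continue
                        score = delta[c0][s0] + trans(c0, s0, c1, s1)
                        if score > best:
                            best, arg = score, (c0, s0)
                new[c1][s1] = best + emit_logp[t][c1][s1]
                ptr[c1][s1] = arg
        delta = new
        back.append(ptr)

    # The word must end in its final state; any stress channel is allowed.
    c_end = max(range(C), key=lambda c: delta[c][S - 1])
    score = delta[c_end][S - 1]
    path = [(c_end, S - 1)]
    for ptr in reversed(back):
        c, s = path[-1]
        path.append(ptr[c][s])
    path.reverse()
    return path, score


def utterance_stress(path: List[Tuple[int, int]]) -> int:
    """Majority vote over per-frame channel labels (an illustrative choice)."""
    return Counter(c for c, _ in path).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy log-likelihoods (hypothetical): 6 frames, 2 channels, 3 states.
    # Channel 0 fits the first two frames, channel 1 fits the rest.
    good, bad = math.log(0.9), math.log(0.1)
    frames = [[[good if (c == (0 if t < 2 else 1) and s == t // 2) else bad
                for s in range(3)] for c in range(2)] for t in range(6)]
    path, score = n_channel_viterbi(frames)
    print(path)  # -> [(0, 0), (0, 0), (1, 1), (1, 1), (1, 2), (1, 2)]
    print(utterance_stress(path))  # -> 1
```

In the toy run the decoded path moves from channel 0 to channel 1 between frames 1 and 2, so the stress label changes inside the word at the state level. A recognizer selected once per utterance by a front-end classifier would not model such a change; the per-frame labels here can still be pooled into an utterance-level decision if one is needed.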
Pages: 668-677
Page count: 10
Related papers
32 in total
[1] Selective training for hidden Markov models with applications to speech classification [J].
Arslan, LM;
Hansen, JHL.
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1999, 7 (01): 46-54
[2] Arslan LM, 1996, P ICASSP, V2, P589
[3] HMM-based stressed speech modeling with application to improved synthesis and recognition of isolated speech under stress [J].
Bou-Ghazale, SE;
Hansen, JHL.
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1998, 6 (03): 201-216
[4] Nonlinear analysis and classification of speech under stressed conditions [J].
Cairns, DA;
Hansen, JHL.
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1994, 96 (06): 3392-3400
[5] Deller Jr J. R., 1993, DISCRETE TIME PROCES
[6] Speaker adaptation using constrained estimation of Gaussian mixtures [J].
Digalakis, VV;
Rtischev, D;
Neumeyer, LG.
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1995, 3 (05): 357-366
[7] Hansen J. H. L., 1994, ICSLP 94. 1994 International Conference on Spoken Language Processing, P1003
[8] Robust speech recognition training via duration and spectral-based stress token generation [J].
Hansen, JHL;
Bou-Ghazale, SE.
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1995, 3 (05): 415-421
[9] Analysis and compensation of speech under stress and noise for environmental robustness in speech recognition [J].
Hansen, JHL.
SPEECH COMMUNICATION, 1996, 20 (1-2): 151-173
[10] Source generator equalization and enhancement of spectral properties for robust speech recognition in noise and stress [J].
Hansen, JHL;
Clements, MA.
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1995, 3 (05): 407-415