Deterministically annealed design of hidden Markov model speech recognizers

Cited by: 14
Authors
Rao, AV [1]
Rose, K [1]
Affiliation
[1] Microsoft Corp, Santa Barbara, CA, USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2001, Vol. 9, No. 2
Funding
U.S. National Science Foundation
Keywords
deterministic annealing; discriminative training; hidden Markov model; isolated word recognition; minimum classification error;
DOI
10.1109/89.902278
CLC number
O42 [Acoustics]
Subject classification
070206; 082403
Abstract
Many conventional speech recognition systems are based on the use of hidden Markov models (HMM) within the context of discriminant-based pattern classification. While the speech recognition objective is a low rate of misclassification, HMM design has been traditionally approached via maximum likelihood (ML) modeling which is, in general, mismatched with the minimum error objective and hence suboptimal. Direct minimization of the error rate is difficult because of the complex nature of the cost surface, and has only been addressed recently by discriminative design methods such as generalized probabilistic descent (GPD). While existing discriminative methods offer significant benefits, they commonly rely on local optimization via gradient descent whose performance suffers from the prevalence of shallow local minima. As an alternative, we propose the deterministic annealing (DA) design method that directly minimizes the error rate while avoiding many poor local minima of the cost. DA is derived from fundamental principles of statistical physics and information theory. In DA, the HMM classifier's decision is randomized and its expected error rate is minimized subject to a constraint on the level of randomness which is measured by the Shannon entropy. The entropy constraint is gradually relaxed, leading in the limit of zero entropy to the design of regular nonrandom HMM classifiers. An efficient forward-backward algorithm is proposed for the DA method. Experiments on synthetic data and on a simplified recognizer for isolated English letters demonstrate that the DA design method can improve recognition error rates over both ML and GPD methods.
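The annealing procedure the abstract describes (a randomized Gibbs decision rule whose expected error is minimized while an entropy constraint is gradually relaxed) can be illustrated on a much simpler model than the paper's HMM recognizer. Below is a minimal sketch on a toy nearest-prototype classifier; the function name, the temperature schedule, and all parameter choices are illustrative assumptions, not the paper's algorithm, which uses a forward-backward computation over HMM state sequences.

```python
import numpy as np

def da_design(X, y, n_classes, T_schedule=(4.0, 1.0, 0.25, 0.05),
              lr=0.1, steps=200, seed=0):
    """Deterministic-annealing design of a toy prototype classifier (sketch).

    The class decision is randomized via a Gibbs (softmax) distribution over
    per-class scores; the expected misclassification rate is minimized while
    the temperature T, which controls the Shannon entropy of the decision,
    is lowered toward the hard-decision (zero-entropy) limit.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # start all prototypes near the data mean; annealing separates them
    mu = X.mean(0) + 0.01 * rng.standard_normal((n_classes, d))
    onehot = np.eye(n_classes)[y]

    for T in T_schedule:                        # gradually relax the entropy constraint
        for _ in range(steps):
            # class score: negative squared distance to each prototype
            scores = -((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
            z = scores / T
            z -= z.max(1, keepdims=True)        # numerical stability
            p = np.exp(z)
            p /= p.sum(1, keepdims=True)        # randomized (Gibbs) decision
            p_correct = p[np.arange(n), y]
            # gradient of the expected error  sum_i (1 - p_i(y_i))  w.r.t. scores
            g_s = -(p_correct[:, None] * (onehot - p)) / T
            for k in range(n_classes):
                # chain rule through d(score_k)/d(mu_k) = 2 (x - mu_k)
                g_mu = (g_s[:, k, None] * 2.0 * (X - mu[k])).sum(0)
                mu[k] -= lr * g_mu / n
    return mu
```

In the zero-temperature limit the Gibbs decision hardens into an ordinary nearest-prototype rule, so the designed classifier is evaluated with a plain argmin over distances; the high-temperature stages are what let the design escape shallow local minima of the error surface.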
Pages: 111-126
Page count: 16