Structural maximum a posteriori linear regression for fast HMM adaptation

Cited by: 59
Authors
Siohan, O [1]
Myrvoll, TA [1]
Lee, CH [1]
Affiliation
[1] Bell Labs, Lucent Technol, Multimedia Commun Res Lab, Murray Hill, NJ 07974 USA
DOI
10.1006/csla.2001.0181
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Transformation-based model adaptation techniques have been used for many years to improve the robustness of speech recognition systems. While the criterion used to estimate transformation parameters has mainly been maximum likelihood estimation (MLE), Bayesian versions of some of the most popular transformation-based adaptation methods have recently been introduced, such as MAPLR, a maximum a posteriori (MAP) based version of the well-known maximum likelihood linear regression (MLLR) algorithm. This is in fact an attempt to constrain parameter estimation in order to obtain reliable estimates from a limited amount of data, not only to prevent overfitting the adaptation data but also to allow integration of prior knowledge into transformation-based adaptation techniques. Since such techniques require the estimation of a large number of transformation parameters when the amount of adaptation data is large, a large number of prior densities must also be defined for these parameters. Robust estimation of these prior densities is therefore a crucial issue that directly affects the efficiency and effectiveness of the Bayesian techniques. This paper proposes to estimate these priors using the notion of hierarchical priors, embedded into the tree structure used to control transformation complexity. The proposed algorithm, called structural MAPLR (SMAPLR), has been evaluated on the Spoke3 1993 test set of the WSJ task. It is shown that SMAPLR reduces the risk of overtraining and exploits the adaptation data much more efficiently than MLLR, leading to a significant reduction of the word error rate for any amount of adaptation data. © 2002 Academic Press.
Pages: 5-24
Page count: 20
Related papers
38 in total
[21]   MAXIMUM-LIKELIHOOD LINEAR-REGRESSION FOR SPEAKER ADAPTATION OF CONTINUOUS DENSITY HIDDEN MARKOV-MODELS [J].
LEGGETTER, CJ ;
WOODLAND, PC .
COMPUTER SPEECH AND LANGUAGE, 1995, 9 (02) :171-185
[22]  
LEGGETTER CJ, 1995, P EUROSPEECH, V2, P1155
[23]  
LEGGETTER CJ, 1991, FINFENGTR181 CUED
[24]   ALGORITHM FOR VECTOR QUANTIZER DESIGN [J].
LINDE, Y ;
BUZO, A ;
GRAY, RM .
IEEE TRANSACTIONS ON COMMUNICATIONS, 1980, 28 (01) :84-95
[25]  
Myrvoll T., 2000, P INT C SPOK LANG PR
[26]  
MYRVOLL TA, 2001, P EUR C SPEECH COMM
[27]   Robust decision tree state tying for continuous speech recognition [J].
Reichl, W ;
Chou, W .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (05) :555-566
[28]   Maximum-likelihood approach to stochastic matching for robust speech recognition [J].
Sankar, A ;
Lee, CH .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1996, 4 (03) :190-202
[29]   A structural Bayes approach to speaker adaptation [J].
Shinoda, K ;
Lee, CH .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (03) :276-287
[30]  
SHINODA K, 1997, P IEEE WORKSH SPEECH