Predictor-Corrector Adaptation by Using Time Evolution System With Macroscopic Time Scale

Cited by: 3
Authors
Watanabe, Shinji [1]
Nakamura, Atsushi [1]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan
Source
IEEE Transactions on Audio, Speech, and Language Processing | 2010 / Vol. 18 / No. 2
Keywords
Acoustic model; incremental adaptation; macroscopic time evolution; predictor-corrector algorithm; speech recognition; HIDDEN MARKOV-MODELS; SPEAKER ADAPTATION; LINEAR-REGRESSION; TRANSFORMATION
DOI
10.1109/TASL.2009.2029717
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Incremental adaptation techniques for speech recognition are aimed at adjusting acoustic models to time-variant acoustic characteristics related to such factors as changes of speaker, speaking style, and noise source over time. In this paper, we propose a novel incremental adaptation framework, which models such time-variant characteristics by successively updating posterior distributions of acoustic model parameters based on a macroscopic time scale (e.g., every set of more than a dozen utterances). The proposed incremental update involves a predictor-corrector algorithm based on a macroscopic time evolution system in accordance with the Kalman filter theory. We also provide a unified interpretation of the proposal and the two major conventional approaches of indirect adaptation via transformation parameters [e.g., maximum-likelihood linear regression (MLLR)] and direct adaptation of classifier parameters [e.g., maximum a posteriori (MAP)]. We reveal analytically and experimentally that the proposed incremental adaptation realizes the predictor-corrector algorithm and involves both the conventional and their combinatorial adaptation approaches. Consequently, the proposal achieves robust recognition performance based on a balanced incremental adaptation between quickness and stability.
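The predictor-corrector idea named in the abstract can be illustrated with a textbook scalar Kalman filter. This is only a minimal sketch under stated assumptions, not the paper's method: the paper updates posterior distributions of acoustic model parameters once per block of utterances, whereas the code below tracks a single scalar with a random-walk evolution model. The function names (`predict`, `correct`, `track`) and the parameters `process_var` and `obs_var` are hypothetical and chosen for illustration.

```python
def predict(mean, var, process_var):
    """Predictor step: a random-walk evolution model (an assumption here)
    keeps the mean unchanged and inflates the variance."""
    return mean, var + process_var


def correct(mean, var, obs, obs_var):
    """Corrector step: blend the predicted state with a new observation.
    The gain is near 1 when the prediction is uncertain (quick adaptation)
    and near 0 when it is confident (stable adaptation)."""
    gain = var / (var + obs_var)
    new_mean = mean + gain * (obs - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var


def track(observations, mean=0.0, var=1.0, process_var=0.1, obs_var=1.0):
    """Run predict then correct once per observation; in the paper's setting
    one observation would correspond to one macroscopic time step."""
    history = []
    for obs in observations:
        mean, var = predict(mean, var, process_var)
        mean, var = correct(mean, var, obs, obs_var)
        history.append((mean, var))
    return history


if __name__ == "__main__":
    for step, (m, v) in enumerate(track([1.0, 1.2, 0.9, 1.1]), start=1):
        print(f"step {step}: mean={m:.3f} var={v:.3f}")
```

In this sketch, a larger `process_var` makes the filter follow new data faster, while a smaller value makes it smoother; this is the kind of balance between quickness and stability the abstract refers to, shown here only in its simplest scalar form.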
Pages: 395-407
Number of pages: 13