Predictor-Corrector Adaptation by Using Time Evolution System With Macroscopic Time Scale

Cited by: 3

Authors:

Watanabe, Shinji ^{[1]}

Nakamura, Atsushi ^{[1]}

Affiliations:

[1] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / No. 02

Keywords:

Acoustic model; incremental adaptation; macroscopic time evolution; predictor-corrector algorithm; speech recognition; HIDDEN MARKOV-MODELS; SPEAKER ADAPTATION; LINEAR-REGRESSION; TRANSFORMATION;

DOI:

10.1109/TASL.2009.2029717

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Incremental adaptation techniques for speech recognition are aimed at adjusting acoustic models to time-variant acoustic characteristics related to such factors as changes of speaker, speaking style, and noise source over time. In this paper, we propose a novel incremental adaptation framework, which models such time-variant characteristics by successively updating posterior distributions of acoustic model parameters based on a macroscopic time scale (e.g., every set of more than a dozen utterances). The proposed incremental update involves a predictor-corrector algorithm based on a macroscopic time evolution system in accordance with the Kalman filter theory. We also provide a unified interpretation of the proposal and the two major conventional approaches of indirect adaptation via transformation parameters [e.g., maximum-likelihood linear regression (MLLR)] and direct adaptation of classifier parameters [e.g., maximum a posteriori (MAP)]. We reveal analytically and experimentally that the proposed incremental adaptation realizes the predictor-corrector algorithm and involves both the conventional and their combinatorial adaptation approaches. Consequently, the proposal achieves robust recognition performance based on a balanced incremental adaptation between quickness and stability.
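For readers unfamiliar with the predictor-corrector idea from Kalman filter theory that the abstract refers to, the following is a minimal, generic sketch. It is not the authors' algorithm. It assumes a single scalar parameter that drifts as a random walk between blocks of utterances, and that each block yields one noisy estimate of that parameter. The names `GaussianBelief`, `predict`, `correct`, `adapt`, `process_variance`, and `observation_variance` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class GaussianBelief:
    mean: float      # current estimate of the slowly drifting parameter
    variance: float  # uncertainty attached to that estimate


def predict(belief, process_variance):
    # Predictor step (assumed random-walk evolution): the mean is carried
    # over unchanged and the uncertainty grows by the process variance.
    return GaussianBelief(belief.mean, belief.variance + process_variance)


def correct(belief, block_mean, observation_variance):
    # Corrector step: blend the prediction with the new block-level
    # estimate, weighted by the Kalman gain.
    gain = belief.variance / (belief.variance + observation_variance)
    mean = belief.mean + gain * (block_mean - belief.mean)
    variance = (1.0 - gain) * belief.variance
    return GaussianBelief(mean, variance)


def adapt(initial, block_means, process_variance, observation_variance):
    # One predict/correct cycle per block (the "macroscopic" time step
    # in this toy setting).
    belief = initial
    history = []
    for block_mean in block_means:
        predicted = predict(belief, process_variance)
        belief = correct(predicted, block_mean, observation_variance)
        history.append(belief)
    return history
```

In this toy setting, a larger `process_variance` raises the gain, so the estimate follows new blocks more quickly; a smaller value keeps the estimate closer to its history, which is more stable. This is one simple way to picture the balance between quickness and stability that the abstract mentions.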

Pages: 395-407

Page count: 13
