Bayesian learning of speech duration models

Cited by: 15

Authors:

Chien, JT [1]

Huang, CH [1]

Affiliations:

[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 70101, Taiwan

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2003 / Vol. 11 / No. 06

Keywords:

adaptive duration model; conjugate prior; gamma distribution; quasi-Bayes estimate; sequential learning; speaking rate; speech recognition;

DOI:

10.1109/TSA.2003.818114

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This paper presents Bayesian speech duration modeling and learning for hidden Markov model (HMM) based speech recognition. We focus on sequential learning of HMM state durations using the quasi-Bayes (QB) estimate. The adapted duration models are robust to nonstationary speaking rates and noise conditions. Gaussian, Poisson, and gamma distributions are investigated as duration models, and the maximum a posteriori (MAP) estimate of the gamma duration model is developed. To exploit sequential learning, we adopt the Poisson duration model with a gamma prior density, which belongs to the conjugate prior family. When adaptation data are observed sequentially, the resulting gamma posterior density offers two advantages. First, it determines the optimal QB duration parameter, which can be merged into the HMMs for speech recognition. Second, it provides the mechanism for updating the gamma prior statistics for sequential learning. The EM algorithm is applied to carry out QB parameter estimation, and the overall HMM parameters can be adapted simultaneously. In the experiments, the proposed adaptive duration model improves speech recognition performance on Mandarin broadcast news and noisy connected digits. Batch learning and sequential learning are investigated for the MAP and QB duration models, respectively.
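The conjugate relationship mentioned in the abstract (a Poisson duration model with a gamma prior) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: it assumes durations are observed directly as frame counts, whereas the paper estimates them within HMMs via EM, and the class name `GammaPrior`, the shape/rate parameterization, and the sample values are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class GammaPrior:
    """Gamma(shape, rate) prior over a Poisson mean duration (hypothetical helper)."""
    shape: float
    rate: float

    def update(self, durations: Sequence[int]) -> "GammaPrior":
        # Standard Poisson-gamma conjugate update:
        # shape accumulates the observed durations, rate accumulates the count.
        return GammaPrior(self.shape + sum(durations), self.rate + len(durations))

    def mode(self) -> float:
        # Posterior mode of the Poisson mean; valid when shape >= 1.
        return (self.shape - 1.0) / self.rate


# Illustrative state durations in frames (made-up values, not from the paper).
prior = GammaPrior(shape=2.0, rate=1.0)
batch1, batch2 = [3, 4, 5], [6, 7]

# Sequential learning: the posterior after one batch becomes the next prior.
sequential = prior.update(batch1).update(batch2)

# Batch learning over all data gives the same posterior statistics.
batch = prior.update(batch1 + batch2)
```

In this simplified setting the sequential and batch posteriors coincide, which is the property that makes a conjugate prior convenient for updating prior statistics as adaptation data arrive.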


Pages: 558-567

Page count: 10

