Efficient training algorithms for HMM's using incremental estimation

Cited: 18
Authors
Gotoh, Y [1]
Hochberg, MM [1]
Silverman, HF [1]
Affiliation
[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1998 / Vol. 6 / No. 6
Funding
US National Science Foundation;
关键词
HMM training algorithm; incremental estimation; MAP estimation;
DOI
10.1109/89.725320
CLC Classification
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
Typically, parameter estimation for a hidden Markov model (HMM) is performed using an expectation-maximization (EM) algorithm with the maximum-likelihood (ML) criterion. The EM algorithm is an iterative scheme that is well defined and numerically stable, but convergence may require a large number of iterations. For speech recognition systems utilizing large amounts of training material, this results in long training times. This paper presents an incremental estimation approach to speed up the training of HMMs without any loss of recognition performance. The algorithm selects a subset of data from the training set, updates the model parameters based on the subset, and then iterates the process until convergence of the parameters. The advantage of this approach is a substantial increase in the number of iterations of the EM algorithm per training token, which leads to faster training. In order to achieve reliable estimation from a small fraction of the complete data set at each iteration, two training criteria are studied: ML and maximum a posteriori (MAP) estimation. Experimental results show that the incremental algorithms train substantially faster than the conventional (batch) method and suffer no loss of recognition performance. Furthermore, the incremental MAP-based training algorithm improves performance over the batch version.
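The subset-selection loop the abstract describes can be sketched as follows. This is a minimal illustration only: it substitutes a 1-D two-component Gaussian mixture for the paper's HMM forward-backward statistics, and all names and parameters here (`incremental_em`, `batch_frac`, the initialization scheme) are invented for the sketch, not taken from the paper.

```python
import math
import random

def incremental_em(data, n_iters=60, batch_frac=0.2, seed=0):
    """Subset-based incremental EM for a 1-D two-component Gaussian mixture.

    Each iteration runs an E-step and M-step on a random subset of the
    training data rather than the full set, so far more EM iterations are
    performed per training token than in batch EM.
    """
    rng = random.Random(seed)
    mu = [min(data), max(data)]          # spread the initial means apart
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    m = max(2, int(batch_frac * len(data)))
    for _ in range(n_iters):
        subset = rng.sample(data, m)     # select this iteration's data subset
        # E-step on the subset: posterior responsibility of each component
        resp = []
        for x in subset:
            p = [w[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            s = sum(p) or 1e-300
            resp.append([pk / s for pk in p])
        # M-step: re-estimate parameters from the subset's statistics
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            mu[k] = sum(r[k] * x for r, x in zip(resp, subset)) / nk
            var[k] = max(1e-4, sum(r[k] * (x - mu[k]) ** 2
                                   for r, x in zip(resp, subset)) / nk)
            w[k] = nk / m
    return mu, var, w

# Toy usage: two well-separated clusters around 0 and 5
rng = random.Random(1)
data = [rng.gauss(0.0, 0.5) for _ in range(150)] + \
       [rng.gauss(5.0, 0.5) for _ in range(150)]
mu, var, w = incremental_em(data)
```

Because each pass touches only `batch_frac` of the data, the per-iteration cost drops proportionally; the paper's MAP variant additionally regularizes these small-subset estimates with a prior, which this sketch omits.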
Pages: 539-548
Page count: 10
Related Papers
20 total
  • [1] Baldi, P.; Chauvin, Y. Smooth online learning algorithms for hidden Markov models. Neural Computation, 1994, 6(2): 307-318.
  • [2] Baldi, P. Advances in Neural Information Processing Systems, Vol. 5, 1993.
  • [3] Baum, L.E.; Petrie, T.; Soules, G.; Weiss, N. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics, 1970, 41(1): 164-171.
  • [4] Berger, J.O. Statistical Decision Theory and Bayesian Analysis, 2nd ed., 1985. DOI: 10.1007/978-1-4757-4286-2.
  • [5] Box, G.E. Bayesian Inference in Statistical Analysis, 2011.
  • [6] DeGroot, M. Optimal Statistical Decisions, 1970.
  • [7] Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 1977, 39(1): 1-38.
  • [8] Duda, R.O. Pattern Classification, 1973.
  • [9] Gauvain, J.-L.; Lee, C.-H. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing, 1994, 2(2): 291-298.
  • [10] Gotoh, Y. MAP estimation, unpublished.