A GENERALIZED HIDDEN MARKOV MODEL WITH STATE-CONDITIONED TREND FUNCTIONS OF TIME FOR THE SPEECH SIGNAL

Cited by: 64

Author:

DENG, L

Affiliation:

[1] Department of Electrical and Computer Engineering, University of Waterloo, Waterloo

Source:

SIGNAL PROCESSING | 1992 / Vol. 27 / No. 1

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

SPEECH SIGNAL; ACOUSTIC TRANSITION; HIDDEN MARKOV MODEL; STATE-DEPENDENT NON-STATIONARITY; TREND FUNCTION; TIME SERIES; EM ALGORITHM;

DOI:

10.1016/0165-1684(92)90112-A

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

The standard hidden Markov model (HMM) and the hidden filter model assume local or state-conditioned stationarity for the modeled signal. In this work we generalize these models and develop the 'trended HMM' to allow the local, as well as the global (via a Markov chain), non-stationarity to be represented in the model. The mathematical structure of the trended HMM can be described by a discrete-time Markov process with its states associated with distinct regression functions on time, or alternatively by a 'deterministic trend plus stationary residual' time series with its parameters governed by the evolution of a Markov chain. The EM algorithm is applied to obtain closed-form re-estimation formulas for the model parameters. Compared with the types of HMMs developed in the past, the trended HMM is a more faithful and more structured representation of many classes of speech sounds whose production involves strong articulatory dynamics. As such, it is expected to be a more suitable model for use in speech processing applications.
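The 'deterministic trend plus stationary residual' structure described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the paper's formulation: it assumes a linear trend in the time elapsed since the current state was entered and independent Gaussian residuals, neither of which the abstract specifies. The function names, parameters, and the fixed initial state are all hypothetical. The second function fits each state's trend by ordinary least squares with the state path treated as known, which sidesteps the EM re-estimation the abstract refers to.

```python
import random


def sample_trended_hmm(trans, trends, sigmas, n_steps, seed=0):
    """Sample (state, elapsed, observation) triples from a toy trended HMM.

    trans  : row-stochastic transition matrix (list of lists)
    trends : per-state (intercept, slope) of a linear trend in elapsed time
    sigmas : per-state standard deviation of the stationary residual
    """
    rng = random.Random(seed)
    state, elapsed = 0, 0  # assumption: the chain always starts in state 0
    samples = []
    for _ in range(n_steps):
        intercept, slope = trends[state]
        trend = intercept + slope * elapsed        # deterministic part
        residual = rng.gauss(0.0, sigmas[state])   # stationary part
        samples.append((state, elapsed, trend + residual))
        nxt = rng.choices(range(len(trans)), weights=trans[state])[0]
        # The trend clock restarts whenever the chain changes state.
        elapsed = elapsed + 1 if nxt == state else 0
        state = nxt
    return samples


def fit_linear_trends(samples, n_states):
    """Per-state least-squares (intercept, slope), given the state path.

    Assumes every state was visited at two or more distinct elapsed times.
    """
    fits = []
    for s in range(n_states):
        pts = [(tau, y) for st, tau, y in samples if st == s]
        n = len(pts)
        mean_t = sum(t for t, _ in pts) / n
        mean_y = sum(y for _, y in pts) / n
        sxx = sum((t - mean_t) ** 2 for t, _ in pts)
        sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
        slope = sxy / sxx
        fits.append((mean_y - slope * mean_t, slope))
    return fits


if __name__ == "__main__":
    trans = [[0.9, 0.1], [0.1, 0.9]]
    true_trends = [(1.0, 0.5), (-2.0, -0.3)]
    data = sample_trended_hmm(trans, true_trends, [0.1, 0.1], 5000, seed=1)
    print(fit_linear_trends(data, 2))
```

In the model itself the state sequence is hidden, so a fit of this kind would have to work from state posteriors rather than hard labels. The sketch only shows how each state carries its own regression function on time while the Markov chain governs which function is active.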


Pages: 65-78

Page count: 14
