A GENERALIZED HIDDEN MARKOV MODEL WITH STATE-CONDITIONED TREND FUNCTIONS OF TIME FOR THE SPEECH SIGNAL

Cited by: 64
Author
DENG, L
Affiliation
[1] Department of Electrical and Computer Engineering, University of Waterloo, Waterloo
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
SPEECH SIGNAL; ACOUSTIC TRANSITION; HIDDEN MARKOV MODEL; STATE-DEPENDENT NON-STATIONARITY; TREND FUNCTION; TIME SERIES; EM ALGORITHM
DOI
10.1016/0165-1684(92)90112-A
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline code
0808; 0809
Abstract
The standard hidden Markov model (HMM) and the hidden filter model assume local or state-conditioned stationarity for the modeled signal. In this work we generalize these models and develop the 'trended HMM' to allow the local, as well as the global (via a Markov chain), non-stationarity to be represented in the model. The mathematical structure of the trended HMM can be described by a discrete-time Markov process with its states associated with distinct regression functions on time, or alternatively by a 'deterministic trend plus stationary residual' time series with its parameters governed by the evolution of a Markov chain. The EM algorithm is applied to obtain closed-form re-estimation formulas for the model parameters. Compared with the types of HMMs developed in the past, the trended HMM is a more faithful and more structured representation of many classes of speech sounds whose production involves strong articulatory dynamics. As such, it is expected to be a more suitable model for use in speech processing applications.
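The "deterministic trend plus stationary residual" reading of the trended HMM can be illustrated with a small sketch. Each state i has its own trend, and an observation is modeled as o_t = f_i(t) + r_t, where f_i(t) = sum_r B_i(r) t^r. The sketch then runs the E-step (forward-backward) and a closed-form M-step, which is weighted least squares for the trend coefficients. It rests on assumptions that are not taken from the paper: scalar observations, a polynomial trend in the absolute frame index t, i.i.d. Gaussian residuals, and transition probabilities held fixed. Because the transitions are fixed, the loop is a generalized EM. The paper's exact time origin, trend family and re-estimation formulas may differ, and all function names are hypothetical.

```python
import math
import random


def trend(coeffs, t):
    """Hypothetical polynomial trend f_i(t) = sum_r coeffs[r] * t**r (absolute frame index t)."""
    return sum(b * t ** r for r, b in enumerate(coeffs))


def safe_log(p):
    return math.log(p) if p > 0 else -math.inf


def logsumexp(xs):
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_gauss(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def sample(pi, A, coeffs, var, T, rng):
    """Draw states q_t and o_t = f_{q_t}(t) + r_t, with r_t ~ N(0, var[q_t]) i.i.d. (assumed residual model)."""
    states, obs = [], []
    for t in range(T):
        weights = pi if t == 0 else A[states[-1]]
        q = rng.choices(range(len(pi)), weights=weights)[0]
        states.append(q)
        obs.append(trend(coeffs[q], t) + rng.gauss(0.0, math.sqrt(var[q])))
    return states, obs


def e_step(obs, pi, A, coeffs, var):
    """Log-domain forward-backward; returns (log-likelihood, gamma[t][i] = P(q_t = i | obs))."""
    N, T = len(pi), len(obs)
    lb = [[log_gauss(o, trend(coeffs[i], t), var[i]) for i in range(N)] for t, o in enumerate(obs)]
    la = [[safe_log(a) for a in row] for row in A]
    alpha = [[safe_log(pi[i]) + lb[0][i] for i in range(N)]]
    for t in range(1, T):
        alpha.append([logsumexp([alpha[-1][i] + la[i][j] for i in range(N)]) + lb[t][j]
                      for j in range(N)])
    beta = [[0.0] * N]  # beta[0] always holds the most recently computed time step
    for t in range(T - 2, -1, -1):
        beta.insert(0, [logsumexp([la[i][j] + lb[t + 1][j] + beta[0][j] for j in range(N)])
                        for i in range(N)])
    ll = logsumexp(alpha[-1])
    gamma = [[math.exp(alpha[t][i] + beta[t][i] - ll) for i in range(N)] for t in range(T)]
    return ll, gamma


def fit_trend(obs, weights, degree):
    """Weighted least squares: minimize sum_t w_t (o_t - f(t))^2 by solving the normal equations."""
    n = degree + 1
    M = [[sum(w * t ** (r + s) for t, w in enumerate(weights)) for s in range(n)] for r in range(n)]
    v = [sum(w * t ** r * o for t, (w, o) in enumerate(zip(weights, obs))) for r in range(n)]
    aug = [M[r] + [v[r]] for r in range(n)]
    for c in range(n):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def m_step(obs, gamma, coeffs, var_floor=1e-6):
    """Closed-form updates of each state's trend coefficients and residual variance."""
    new_coeffs, new_var = [], []
    for i in range(len(coeffs)):
        w = [g[i] for g in gamma]
        c = fit_trend(obs, w, len(coeffs[i]) - 1)
        sse = sum(wt * (o - trend(c, t)) ** 2 for t, (wt, o) in enumerate(zip(w, obs)))
        new_coeffs.append(c)
        new_var.append(max(sse / max(sum(w), 1e-12), var_floor))
    return new_coeffs, new_var


if __name__ == "__main__":
    rng = random.Random(0)
    pi, A = [1.0, 0.0], [[0.95, 0.05], [0.0, 1.0]]  # two-state left-to-right chain
    _, obs = sample(pi, A, [[0.0, 0.5], [30.0, -0.3]], [0.25, 0.25], 60, rng)
    coeffs, var = [[0.0, 0.0], [10.0, 0.0]], [1.0, 1.0]  # rough initial guess
    for it in range(10):
        ll, gamma = e_step(obs, pi, A, coeffs, var)
        coeffs, var = m_step(obs, gamma, coeffs)
        print(it, round(ll, 3), [[round(b, 2) for b in c] for c in coeffs])
```

Each state's trend update is an ordinary regression on time with per-frame weights given by the posterior state occupancies. This is where closed-form re-estimation comes from in a sketch like this one. The log-likelihood printed by the loop should not decrease from one iteration to the next.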
Pages: 65–78
Page count: 14