Speech recognition using hybrid hidden Markov model and NN classifier

Cited by: 1
Authors
Kundu A. [1]
Bayya A. [2]
Affiliations
[1] U.S. West Advanced Technologies, Boulder
[2] Rockwell International Corporation, 4311 Jamboree Rd., Newport Beach
Keywords
Baum-Welch (BW) algorithm; Hidden Markov model; Hybrid classifier; Modified Viterbi algorithm (MVA); Multilayer perceptrons; Neural nets; Segmental K-means algorithm
DOI
10.1007/BF02111210
Abstract
This paper discusses an integrated HMM/NN classifier for speech recognition. The proposed classifier combines the time-normalization property of the hidden Markov model (HMM) classifier with the superior discriminative ability of the neural net (NN) classifier. Speech signals are strongly time-varying. Although neural nets have been successful in many classification problems, in speech recognition they have so far been less successful than HMMs. The main reason is that most neural net structures lack a time-normalization capability (the time-delay neural net is a notable exception, but its structure is very complex). In the proposed hybrid HMM/NN classifier, a left-to-right HMM module first segments the observation sequence of every exemplar into a fixed number of states. All frames assigned to the same state are then replaced by a single average frame. Thus every exemplar, irrespective of its time-scale variation, is transformed into a fixed number of frames, i.e., a static pattern. A multilayer perceptron (MLP) neural net then classifies these time-normalized exemplars. Experimental results on telephone speech databases are presented to demonstrate the potential of this hybrid classifier. © 1998 Kluwer Academic Publishers.
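The segmentation-and-averaging step described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each HMM state is assumed to emit a single diagonal-covariance Gaussian, all states share one stay/advance transition score, and plain Viterbi forced alignment stands in for the paper's modified Viterbi algorithm. The model parameters are assumed to be already trained (for example with segmental K-means or Baum-Welch, both listed in the keywords), and the function names `viterbi_left_to_right` and `time_normalize` are hypothetical.

```python
import math


# Assumption: each state emits one diagonal-covariance Gaussian.
def log_gauss_diag(x, mean, var):
    """Log-density of frame x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def viterbi_left_to_right(frames, means, variances, log_stay, log_next):
    """Plain Viterbi forced alignment (not the paper's modified Viterbi).

    The path starts in state 0, ends in the last state, and at each frame
    either stays in its state or advances by one. Assumption: all states
    share the same log_stay / log_next transition scores.
    Returns the state index assigned to every frame.
    """
    T, N = len(frames), len(means)
    if T < N:
        raise ValueError("need at least one frame per state")
    NEG = float("-inf")
    score = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    score[0][0] = log_gauss_diag(frames[0], means[0], variances[0])
    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1][j] + log_stay
            move = score[t - 1][j - 1] + log_next if j > 0 else NEG
            best, prev = (stay, j) if stay >= move else (move, j - 1)
            if best == NEG:
                continue  # state j is unreachable at frame t
            score[t][j] = best + log_gauss_diag(frames[t], means[j], variances[j])
            back[t][j] = prev
    # Trace back from the final state at the last frame.
    path = [N - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def time_normalize(frames, path, n_states):
    """Replace the frames of each state by their average frame and
    concatenate the results into one static vector of n_states * dim values."""
    dim = len(frames[0])
    sums = [[0.0] * dim for _ in range(n_states)]
    counts = [0] * n_states
    for x, s in zip(frames, path):
        counts[s] += 1
        for d in range(dim):
            sums[s][d] += x[d]
    # A left-to-right forced alignment visits every state, so counts[s] > 0.
    static = []
    for s in range(n_states):
        static.extend(v / counts[s] for v in sums[s])
    return static


if __name__ == "__main__":
    means = [[0.0], [5.0], [10.0]]  # toy 3-state model, 1-D frames
    variances = [[1.0], [1.0], [1.0]]
    log_stay, log_next = math.log(0.6), math.log(0.4)
    long_ex = [[0.1], [-0.2], [5.1], [4.9], [5.0], [9.8], [10.2]]
    short_ex = [[0.0], [5.2], [10.1], [9.9]]
    for ex in (long_ex, short_ex):
        path = viterbi_left_to_right(ex, means, variances, log_stay, log_next)
        print(path, time_normalize(ex, path, len(means)))
```

In this toy run, the 7-frame and the 4-frame exemplar both become a 3-value vector, one averaged frame per state. A fixed-length vector like this is what would be passed to the MLP classifier, which is not shown here.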
Pages: 227-240
Number of pages: 14