Multonic Markov Word Models for Large Vocabulary Continuous Speech Recognition

Cited by: 2
Authors
Bahl, Lalit R. [1]
Bellegarda, Jerome R. [1]
de Souza, Peter V. [2]
Gopalakrishnan, P. S. [1]
Nahamoo, David [1]
Picheny, Michael A. [1]
Affiliations
[1] IBM Res Corp, TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] IBM Res Corp, TJ Watson Res Ctr, Cupertino, CA 95014 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1993, Vol. 1, No. 3
DOI
10.1109/89.232617
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A new class of hidden Markov models is proposed for the acoustic representation of words in an automatic speech recognition system. The models, built from combinations of acoustically based sub-word units called fenones, are derived automatically from one or more sample utterances of a word. Because they are more flexible than previously reported fenone-based word models, they lead to an improved capability of modeling variations in pronunciation. They are therefore particularly useful in the recognition of continuous speech. In addition, their construction is relatively simple, because it can be done using the well-known forward-backward algorithm for parameter estimation of hidden Markov models. Appropriate reestimation formulas are derived for this purpose. Experimental results obtained on a 5000-word vocabulary natural language continuous speech recognition task are presented to illustrate the enhanced power of discrimination of the new models.
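The abstract states that the models are built with the forward-backward algorithm for hidden Markov model parameter estimation. As a rough illustration of that algorithm only, here is a minimal sketch for a small discrete HMM. It is not the paper's multonic reestimation procedure: the function name `forward_backward`, the state-emitting formulation, and the toy parameters are all assumptions made for this example, and no scaling is applied, so it is suitable only for short sequences.

```python
def forward_backward(init, trans, emit, obs):
    """Return per-frame state posteriors (gamma) and the sequence likelihood.

    Hypothetical helper for illustration; assumes a state-emitting discrete HMM.
    init[i]     : P(state_0 = i)
    trans[i][j] : P(state_{t+1} = j | state_t = i)
    emit[i][k]  : P(label k | state i)
    obs         : non-empty list of observed label indices
    """
    n, T = len(init), len(obs)

    # Forward pass: alpha[t][i] = P(obs[0..t], state_t = i)
    alpha = [[0.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = init[i] * emit[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            reach = sum(alpha[t - 1][i] * trans[i][j] for i in range(n))
            alpha[t][j] = reach * emit[j][obs[t]]

    # Backward pass: beta[t][i] = P(obs[t+1..T-1] | state_t = i)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                for j in range(n)
            )

    likelihood = sum(alpha[T - 1])

    # Posterior state occupancy: gamma[t][i] = P(state_t = i | obs)
    gamma = [
        [alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
        for t in range(T)
    ]
    return gamma, likelihood


if __name__ == "__main__":
    # Toy two-state, two-label model (values chosen arbitrarily).
    init = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.4, 0.6]]
    emit = [[0.9, 0.1], [0.2, 0.8]]
    gamma, likelihood = forward_backward(init, trans, emit, [0, 1, 1, 0])
    for t, row in enumerate(gamma):
        print(t, [round(p, 4) for p in row])
    print("likelihood:", likelihood)
```

The posteriors in `gamma` are the quantities that reestimation formulas typically accumulate into updated emission and transition probabilities; the specific formulas for multonic models are derived in the paper itself and are not reproduced here.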
Pages: 334-344
Page count: 11