Maximum entropy direct models for speech recognition

Cited by: 27
Authors
Kuo, HKJ [1]
Gao, YQ [1]
Affiliation
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006 / Vol. 14 / Issue 03
Keywords
direct modeling; maximum entropy acoustic modeling; nongenerative modeling;
DOI
10.1109/TSA.2005.858064
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Traditional statistical models for speech recognition have mostly been based on a Bayesian framework using generative models such as hidden Markov models (HMMs). This paper focuses on a new framework for speech recognition using maximum entropy direct modeling, where the probability of a state or word sequence given an observation sequence is computed directly from the model. In contrast to HMMs, features can be asynchronous and overlapping. This model therefore allows for the potential combination of many different types of features, which need not be statistically independent of each other. In this paper, a specific kind of direct model, the maximum entropy Markov model (MEMM), is studied. Even with conventional acoustic features, the approach already shows promising results for phone-level decoding. The MEMM significantly outperforms traditional HMMs in word error rate when used as stand-alone acoustic models. Preliminary results combining the MEMM scores with HMM and language model scores show modest improvements over the best HMM speech recognizer.
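As a rough illustration of what "computed directly from the model" can mean, the sketch below implements a generic MEMM-style local distribution P(s | s_prev, o) as a normalized exponential of weighted features, and multiplies these per-frame probabilities to score a state sequence. It is not the paper's system: the two states, the three binary feature functions, the weights, and the names `memm_step` and `sequence_prob` are all hypothetical, chosen only to show that overlapping (non-independent) features can be combined without difficulty.

```python
import math

# Hypothetical toy setup: two states and scalar "observations".
STATES = ["a", "b"]

# Binary feature functions f(prev_state, state, obs). The first and third
# overlap on purpose: both can fire for the same observation.
FEATURES = [
    lambda prev, s, o: 1.0 if s == "a" and o > 0.5 else 0.0,
    lambda prev, s, o: 1.0 if prev == s else 0.0,
    lambda prev, s, o: 1.0 if s == "a" and o > 0.8 else 0.0,
]

# Hypothetical weights; in practice these would be estimated from data.
WEIGHTS = [1.0, 0.5, 0.7]


def memm_step(prev_state, obs, states, features, weights):
    """Return P(s | prev_state, obs) for every state s, as a dict."""
    scores = {
        s: sum(w * f(prev_state, s, obs) for f, w in zip(features, weights))
        for s in states
    }
    # Subtract the maximum score before exponentiating, for numerical stability.
    top = max(scores.values())
    exp_scores = {s: math.exp(v - top) for s, v in scores.items()}
    z = sum(exp_scores.values())  # normalizer over the states
    return {s: e / z for s, e in exp_scores.items()}


def sequence_prob(state_seq, obs_seq, states, features, weights, start="<s>"):
    """Return P(state_seq | obs_seq) as a product of local conditionals."""
    prob = 1.0
    prev = start
    for s, o in zip(state_seq, obs_seq):
        prob *= memm_step(prev, o, states, features, weights)[s]
        prev = s
    return prob


if __name__ == "__main__":
    print(memm_step("<s>", 0.9, STATES, FEATURES, WEIGHTS))
    print(sequence_prob(["a", "b"], [0.9, 0.1], STATES, FEATURES, WEIGHTS))
```

Because each step is normalized over the states, the probabilities of all state sequences for a fixed observation sequence sum to one, with no separate model of how the observations are generated.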
Pages: 873 / 881
Page count: 9