Maximum entropy direct models for speech recognition

Cited by: 27

Authors:

Kuo, HKJ [1]

Gao, YQ [1]

Affiliation:

[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006 / Vol. 14 / No. 03

Keywords:

direct modeling; maximum entropy acoustic modeling; nongenerative modeling

DOI:

10.1109/TSA.2005.858064

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification number:

070206 ; 082403 ;

Abstract:

Traditional statistical models for speech recognition have mostly been based on a Bayesian framework using generative models such as hidden Markov models (HMMs). This paper focuses on a new framework for speech recognition using maximum entropy direct modeling, where the probability of a state or word sequence given an observation sequence is computed directly from the model. In contrast to HMMs, features can be asynchronous and overlapping. This model therefore allows for the potential combination of many different types of features, which need not be statistically independent of each other. In this paper, a specific kind of direct model, the maximum entropy Markov model (MEMM), is studied. Even with conventional acoustic features, the approach already shows promising results for phone level decoding. The MEMM significantly outperforms traditional HMMs in word error rate when used as stand-alone acoustic models. Preliminary results combining the MEMM scores with HMM and language model scores show modest improvements over the best HMM speech recognizer.
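To make the idea of direct modeling concrete, the following is a minimal illustrative sketch of an MEMM-style model, not the authors' implementation. It assumes a toy two-state inventory, discrete observation symbols, and hand-picked weights; the names `STATES`, `WEIGHTS`, `features`, `next_state_probs`, and `viterbi` are all hypothetical. The sketch shows the next-state distribution computed directly as a normalized exponential of summed feature weights, with overlapping features on the previous state and the current observation, followed by a Viterbi search over state sequences.

```python
import math

# Hypothetical toy state inventory (real systems use phone or HMM-state labels).
STATES = ["a", "b"]

# Hypothetical feature weights, keyed by (feature_name, next_state).
# Real weights would be trained, e.g. by iterative scaling; these are hand-picked.
WEIGHTS = {
    ("prev=a", "a"): 1.0,
    ("prev=b", "b"): 1.0,
    ("obs=low", "a"): 2.0,
    ("obs=high", "b"): 2.0,
}


def features(prev_state, obs):
    """Return the active binary features for one frame; they may overlap."""
    return [f"prev={prev_state}", f"obs={obs}"]


def next_state_probs(prev_state, obs):
    """P(s | prev_state, obs): normalized exponential of summed weights."""
    scores = {
        s: sum(WEIGHTS.get((f, s), 0.0) for f in features(prev_state, obs))
        for s in STATES
    }
    z = sum(math.exp(v) for v in scores.values())
    return {s: math.exp(v) / z for s, v in scores.items()}


def viterbi(observations, start_state="a"):
    """Find the most probable state sequence given the observations."""
    # best[s] = (log probability of the best path ending in s, that path)
    best = {start_state: (0.0, [])}
    for obs in observations:
        new_best = {}
        for prev, (log_p, path) in best.items():
            probs = next_state_probs(prev, obs)
            for s in STATES:
                cand = log_p + math.log(probs[s])
                if s not in new_best or cand > new_best[s][0]:
                    new_best[s] = (cand, path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


if __name__ == "__main__":
    print(viterbi(["low", "low", "high", "high"]))
```

Because the model conditions on the observation rather than generating it, further features (for example, ones spanning several frames) could be added to `features` without any independence assumption among them; only the weights would need retraining.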


Pages: 873-881

Page count: 9
