Hidden Conditional Random Fields for Phone Recognition

Cited by: 20
Authors
Sung, Yun-Hsuan [1]
Jurafsky, Dan [1]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
Source
2009 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU 2009) | 2009
DOI
10.1109/ASRU.2009.5373329
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We apply Hidden Conditional Random Fields (HCRFs) to the task of TIMIT phone recognition. HCRFs are discriminatively trained sequence models that augment conditional random fields with hidden states that are capable of representing subphones and mixture components. We extend HCRFs, which had previously only been applied to phone classification with known boundaries, to recognize continuous phone sequences. We use an N-best inference algorithm in both learning (to approximate all competitor phone sequences) and decoding (to marginalize over hidden states). Our monophone HCRFs achieve 28.3% phone error rate, outperforming maximum likelihood trained HMMs by 3.6%, maximum mutual information trained HMMs by 2.5%, and minimum phone error trained HMMs by 2.2%. We show that this win is partially due to HCRFs' ability to simultaneously optimize discriminative language models and acoustic models, a powerful property that has important implications for speech recognition.
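The results above are reported as phone error rate (PER). The abstract does not spell out the scoring procedure, so the following is only a minimal sketch under the usual assumption that PER is the Levenshtein (edit) distance between the reference and hypothesized phone sequences divided by the reference length. The function name `phone_error_rate` and the toy phone labels are illustrative and do not come from the paper.

```python
def phone_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / len(reference).

    Assumption: standard edit-distance scoring with unit costs; the
    paper's exact scoring setup is not given in the abstract.
    """
    if not reference:
        raise ValueError("reference must be non-empty")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        curr = [i]  # deleting all i reference phones
        for j, hyp_phone in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_phone == hyp_phone else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Toy example (hypothetical labels): one substitution and one deletion
# against a five-phone reference.
reference = ["sil", "k", "ae", "t", "sil"]
hypothesis = ["sil", "k", "eh", "t"]
per = phone_error_rate(reference, hypothesis)
```

In this toy case the two errors over five reference phones give a PER of 0.4, which would be reported as 40%.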
Pages: 107-112
Page count: 6