LARGE MARGIN TRAINING OF SEMI-MARKOV MODEL FOR PHONETIC RECOGNITION

Cited by: 1
Authors
Kim, Sungwoong [1 ]
Yun, Sungrack [1 ]
Yoo, Chang D. [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Elect Engn, Taejon, South Korea
Source
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2010
Keywords
Hidden Markov model; semi-Markov model; structured support vector machine; phonetic recognition
DOI
10.1109/ICASSP.2010.5495329
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper considers large margin training of a semi-Markov model (SMM) for phonetic recognition. The SMM framework is better suited to phonetic recognition than the hidden Markov model (HMM) framework because it can simultaneously segment the uttered speech into phones and label the segment-based features. The SMM framework is used to define a discriminant function that is linear in a joint feature map, which attempts to capture the long-range statistical dependencies within a segment and between adjacent segments of variable length. The parameters of the discriminant function are estimated with a large margin learning criterion for structured prediction. The resulting parameter estimation problem, an optimization problem with many margin constraints, is solved with a stochastic subgradient descent algorithm. The proposed large margin SMM outperforms the large margin HMM on the TIMIT corpus.
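The pipeline the abstract describes (a discriminant function linear in a joint feature map over segments, loss-augmented decoding, and a stochastic subgradient update) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature map (per-label acoustic sum, segment duration, adjacent-label indicator), the frame-level Hamming loss, and all names (`seg_feat`, `joint_feat`, `decode`, `sgd_step`, `max_dur`, `eta`, `lam`).

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def seg_feat(X, s, e, lab, prev, K):
    """Assumed features of segment X[s:e] labeled `lab`, following label `prev` (-1 at start)."""
    d = len(X[0])
    f = [0.0] * (K * (d + 1) + K * K)
    off = lab * (d + 1)
    for t in range(s, e):
        for j in range(d):
            f[off + j] += X[t][j]                  # acoustic sum over the segment
    f[off + d] = float(e - s)                      # segment duration
    if prev >= 0:
        f[K * (d + 1) + prev * K + lab] = 1.0      # adjacent-segment label pair
    return f

def joint_feat(X, segs, K):
    """Joint feature map: sum of segment features over a segmentation [(start, end, label), ...]."""
    total = [0.0] * (K * (len(X[0]) + 1) + K * K)
    prev = -1
    for s, e, lab in segs:
        total = [a + b for a, b in zip(total, seg_feat(X, s, e, lab, prev, K))]
        prev = lab
    return total

def to_frames(segs):
    return [lab for s, e, lab in segs for _ in range(e - s)]

def decode(w, X, K, max_dur, y_frames=None):
    """Semi-Markov Viterbi over segmentations with durations <= max_dur.
    If y_frames is given, a frame-level Hamming loss is added (loss-augmented decoding)."""
    T = len(X)
    best = [[float("-inf")] * K for _ in range(T + 1)]
    back = {}
    for e in range(1, T + 1):
        for s in range(max(0, e - max_dur), e):
            for lab in range(K):
                loss = 0 if y_frames is None else sum(y != lab for y in y_frames[s:e])
                for p in ([-1] if s == 0 else range(K)):
                    base = 0.0 if s == 0 else best[s][p]
                    sc = base + dot(w, seg_feat(X, s, e, lab, p, K)) + loss
                    if sc > best[e][lab]:
                        best[e][lab] = sc
                        back[(e, lab)] = (s, p)
    e, lab, segs = T, max(range(K), key=lambda k: best[T][k]), []
    while e > 0:                                   # follow back-pointers to recover segments
        s, p = back[(e, lab)]
        segs.append((s, e, lab))
        e, lab = s, p
    return segs[::-1]

def sgd_step(w, X, y_segs, K, max_dur, eta, lam):
    """One stochastic subgradient step on the L2-regularized structured hinge loss."""
    y_frames = to_frames(y_segs)
    y_hat = decode(w, X, K, max_dur, y_frames)     # most violated segmentation
    loss = sum(a != b for a, b in zip(to_frames(y_hat), y_frames))
    phi_true, phi_hat = joint_feat(X, y_segs, K), joint_feat(X, y_hat, K)
    violated = loss + dot(w, phi_hat) - dot(w, phi_true) > 0
    return [wi - eta * (lam * wi + (ph - pt if violated else 0.0))
            for wi, ph, pt in zip(w, phi_hat, phi_true)]

# Toy usage: six 2-dimensional frames, two labels, true boundary at frame 3.
X = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
y_segs = [(0, 3, 0), (3, 6, 1)]
w = sgd_step([0.0] * 10, X, y_segs, K=2, max_dur=4, eta=0.1, lam=0.01)
pred = decode(w, X, K=2, max_dur=4)
```

Because the decoder scores whole segments rather than individual frames, segmentation and labeling are chosen jointly, which is the property the abstract contrasts with the HMM framework. A practical system would use far richer segment features and many training utterances; this sketch only shows the shape of the computation.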
Pages: 1910-1913
Number of pages: 4