HYBRID DNN-LATENT STRUCTURED SVM ACOUSTIC MODELS FOR CONTINUOUS SPEECH RECOGNITION

Cited by: 0

Authors:

Ravuri, Suman [1, 2]

Affiliations:

[1] Int Comp Sci Inst, Berkeley, CA 94704 USA

[2] Univ Calif Berkeley, Berkeley, CA 94720 USA

Source:

2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU) | 2015

Keywords:

Structured SVM; Deep Learning; Sequence-Discriminative Training; Large Margin; Acoustic Modeling;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this work, we propose Deep Neural Network (DNN)-Latent Structured Support Vector Machine (LSSVM) acoustic models as a replacement for the more standard sequence-discriminatively trained DNN-HMM hybrid acoustic models. Compared to existing methods, approaches based on margin maximization, as considered in this work, enjoy better theoretical justification. In addition to using a max-margin criterion, we extend the Structured SVM model to include latent variables that account for uncertainty in state alignments. Introducing latent structure allows for better sample complexity, often requiring 33% to 66% fewer utterances to converge compared to alternative criteria. On an 8-hour independent test set of conversational speech, the proposed method decreases word error rate by 9% relative to a cross-entropy trained hybrid system, while the best existing system decreases the word error rate by 6.5% relative.
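
For orientation only: the abstract does not state the training objective, so the following is a generic latent structured SVM objective in its commonly used form, not necessarily the exact criterion of this paper. All symbols are assumptions introduced here for illustration: x_i is the acoustic observation sequence of utterance i, y_i its reference transcription, h a latent state alignment, Φ a joint feature map (for example, built from DNN outputs), Δ a task loss between reference and hypothesis, and C a regularization constant.

```latex
% Generic latent structured SVM objective (illustrative sketch; symbols are assumptions).
% First max: loss-augmented inference over hypotheses and their alignments.
% Second max: best latent alignment consistent with the reference y_i.
\[
\min_{w}\;\; \frac{1}{2}\lVert w\rVert^{2}
  + C \sum_{i=1}^{N} \Big[
      \max_{(\hat{y},\hat{h})} \big( w^{\top}\Phi(x_i,\hat{y},\hat{h}) + \Delta(y_i,\hat{y}) \big)
      \;-\; \max_{h}\, w^{\top}\Phi(x_i,y_i,h)
    \Big]
\]
```

In this form the margin is enforced between the best-scoring alignment of the reference and every competing hypothesis, which is how a latent variable can absorb uncertainty in state alignments.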

Pages: 37-44

Page count: 8
