Discriminative Training Using Non-Uniform Criteria for Keyword Spotting on Spontaneous Speech

Cited by: 4
Authors
Weng, Chao [1]
Juang, Biing-Hwang [1]
Affiliations
[1] Georgia Inst Technol, Dept Elect & Comp Engn, Atlanta, GA 30332 USA
Keywords
Discriminative training (DT); minimum classification error (MCE); non-uniform criteria; keyword spotting; weighted finite-state transducer (WFST); hidden Markov models; automatic recognition; verification; optimization
DOI
10.1109/TASLP.2014.2381931
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this work, we formulate keyword spotting as a non-uniform error automatic speech recognition (ASR) problem and propose a model training methodology based on the non-uniform minimum classification error (MCE) approach. The main idea is to adapt the fundamental MCE criterion to reflect cost sensitivity: in an ASR task, errors on keywords are much more significant than errors on non-keywords. This cost sensitivity leads to an emphasis on keyword models during parameter optimization. We then present a system that uses the weighted finite-state transducer (WFST) framework to implement non-uniform MCE efficiently. To further improve non-uniform error cost minimization for keyword spotting, we formulate a technique called "adaptive boosted non-uniform MCE", which incorporates the idea of boosting. We validate the proposed framework on two challenging large-scale spontaneous conversational telephone speech (CTS) datasets in two different languages (English and Mandarin). Experimental results show that our framework achieves consistent and significant spotting performance gains over both the maximum likelihood estimation (MLE) baseline and conventional discriminatively trained systems with uniform error cost.
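The cost-sensitive idea in the abstract can be illustrated with a small sketch. This is not the authors' formulation or their WFST-based implementation: it assumes a textbook sigmoid-smoothed MCE loss, applies a per-sample error cost, and uses hypothetical names and values throughout (`keyword_cost`, `default_cost`, the `samples` layout, and the smoothing parameters `eta`, `gamma`, `theta`).

```python
import math

def misclassification_measure(correct_score, competing_scores, eta=1.0):
    """d = -g_correct + (1/eta) * log(mean_j exp(eta * g_j)), with eta > 0.

    Positive d means the competitors outscore the correct hypothesis.
    """
    m = max(competing_scores)  # subtract the max for numerical stability
    mean_exp = sum(math.exp(eta * (s - m)) for s in competing_scores) / len(competing_scores)
    return -correct_score + m + math.log(mean_exp) / eta

def sigmoid_loss(d, gamma=1.0, theta=0.0):
    """Smooth 0-1 loss used in MCE training (numerically stable sigmoid)."""
    x = gamma * d + theta
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def non_uniform_mce_loss(samples, keyword_cost=5.0, default_cost=1.0):
    """Average MCE loss where errors on keyword samples are weighted more heavily.

    The cost values are illustrative assumptions, not taken from the paper.
    """
    total = 0.0
    for s in samples:
        d = misclassification_measure(s["correct_score"], s["competing_scores"])
        cost = keyword_cost if s["is_keyword"] else default_cost
        total += cost * sigmoid_loss(d)
    return total / len(samples)

# Two samples with identical scores; only the keyword flag differs.
samples = [
    {"correct_score": 2.0, "competing_scores": [2.0], "is_keyword": True},
    {"correct_score": 2.0, "competing_scores": [2.0], "is_keyword": False},
]
loss = non_uniform_mce_loss(samples)
```

With identical scores, the keyword sample contributes `keyword_cost / default_cost` times as much loss as the non-keyword sample, so gradient-based optimization would put correspondingly more emphasis on the keyword models.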
Pages: 300-312
Page count: 13