State sequence pooling training of acoustic models for keyword spotting

Cited by: 3
Authors
Lopatka, Kuba [1]
Bocklet, Tobias [2, 3]
Affiliations
[1] Intel Corp, Santa Clara, CA 95051 USA
[2] Intel Labs, Santa Clara, CA USA
[3] TH Nurnberg, Nurnberg, Germany
Source
INTERSPEECH 2020 | 2020
Keywords
keyword spotting; machine learning; speech recognition;
DOI
10.21437/Interspeech.2020-2722
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
We propose a new training method to improve HMM-based keyword spotting. The loss function is based on a score computed with the keyword/filler model from the entire input sequence. It is equivalent to max/attention pooling but is based on prior acoustic knowledge. We also employ a multi-task learning setup by predicting both LVCSR and keyword posteriors. We compare our model to a baseline trained on frame-wise cross entropy, with and without per-class weighting. We employ a low-footprint TDNN for acoustic modeling. The proposed training yields significant and consistent improvement over the baseline in adverse noise conditions. The FRR on cafeteria noise is reduced from 13.07% to 5.28% at 9 dB SNR and from 37.44% to 6.78% at 5 dB SNR. We obtain these results with only 600 unique training keyword samples. The training method is independent of the frontend and acoustic model topology.
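The FRR figures quoted in the abstract can be restated as absolute and relative reductions. The short sketch below only does that arithmetic on the four numbers given above; the helper name `relative_reduction` is an illustrative assumption and does not come from the paper.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Return the reduction of an error rate as a percentage of the baseline."""
    return 100.0 * (baseline - proposed) / baseline


# FRR values (in percent) quoted in the abstract for cafeteria noise:
# (baseline, proposed training).
frr = {
    "9 dB SNR": (13.07, 5.28),
    "5 dB SNR": (37.44, 6.78),
}

for condition, (baseline, proposed) in frr.items():
    absolute = baseline - proposed  # percentage points
    relative = relative_reduction(baseline, proposed)
    print(f"{condition}: {absolute:.2f} points absolute, {relative:.1f}% relative")
```

Under these numbers, the reduction is roughly 60% relative at 9 dB SNR and roughly 82% relative at 5 dB SNR.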
Pages: 4338-4342
Page count: 5