State sequence pooling training of acoustic models for keyword spotting

Cited by: 3
Authors
Lopatka, Kuba [1]
Bocklet, Tobias [2, 3]
Affiliations
[1] Intel Corp, Santa Clara, CA 95051 USA
[2] Intel Labs, Santa Clara, CA USA
[3] TH Nurnberg, Nurnberg, Germany
Source
INTERSPEECH 2020 | 2020
Keywords
keyword spotting; machine learning; speech recognition;
DOI
10.21437/Interspeech.2020-2722
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
We propose a new training method to improve HMM-based keyword spotting. The loss function is based on a score computed with the keyword/filler model from the entire input sequence. It is equivalent to max/attention pooling but is based on prior acoustic knowledge. We also employ a multi-task learning setup by predicting both LVCSR and keyword posteriors. We compare our model to a baseline trained on frame-wise cross entropy, with and without per-class weighting. We employ a low-footprint TDNN for acoustic modeling. The proposed training yields significant and consistent improvement over the baseline in adverse noise conditions. The FRR on cafeteria noise is reduced from 13.07% to 5.28% at 9 dB SNR and from 37.44% to 6.78% at 5 dB SNR. We obtain these results with only 600 unique training keyword samples. The training method is independent of the frontend and acoustic model topology.
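The abstract describes a loss built on a keyword/filler score computed over the entire input sequence and compares it to max pooling. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a left-to-right keyword model with no transition probabilities, a single filler class in column 0, a Viterbi-style best-path score, a max over frames as the pooling step, and a logistic loss on the pooled score. The function names, the toy posteriors, and the choice of loss are all hypothetical.

```python
import numpy as np


def pooled_keyword_score(log_post):
    """Pool a keyword/filler score over a whole utterance.

    log_post: (T, S + 1) array of frame-wise log posteriors.
              Column 0 is the filler class (assumption); columns 1..S are
              the keyword states in left-to-right order (assumption).
    Returns (pooled_score, frame_scores), where frame_scores[t] is the best
    keyword-path log score ending in the last keyword state at frame t,
    minus the all-filler log score up to frame t.
    """
    n_frames, n_classes = log_post.shape
    n_states = n_classes - 1
    filler = 0.0                           # cumulative filler log score
    delta = np.full(n_states, -np.inf)     # best path score per keyword state
    frame_scores = np.empty(n_frames)
    for t in range(n_frames):
        # Predecessors: state 1 may be entered from the filler path,
        # state s from state s - 1; every state may also self-loop.
        prev = np.concatenate(([filler], delta[:-1]))
        delta = np.maximum(delta, prev) + log_post[t, 1:]
        filler = filler + log_post[t, 0]
        frame_scores[t] = delta[-1] - filler
    return frame_scores.max(), frame_scores


def pooled_logistic_loss(score, label):
    """Logistic loss on the pooled score; label is +1 (keyword) or -1."""
    return np.logaddexp(0.0, -label * score)


def make_log_post(frame_labels, n_classes, hi=0.8):
    """Toy posteriors: probability `hi` on the labeled class, rest uniform."""
    lo = (1.0 - hi) / (n_classes - 1)
    post = np.full((len(frame_labels), n_classes), lo)
    post[np.arange(len(frame_labels)), frame_labels] = hi
    return np.log(post)


# Toy utterances with two keyword states: one contains the keyword
# (filler, filler, state 1, state 1, state 2, state 2), one is all filler.
pos_score, _ = pooled_keyword_score(make_log_post([0, 0, 1, 1, 2, 2], 3))
neg_score, _ = pooled_keyword_score(make_log_post([0, 0, 0, 0, 0, 0], 3))
```

In this toy setup the keyword utterance receives a positive pooled score and the all-filler utterance a negative one, so the logistic loss is small when the label matches the sign of the score. A real system would compute the posteriors with the acoustic model and backpropagate through the pooled score; that part is outside the scope of this sketch.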
Pages: 4338-4342
Page count: 5
Related papers
50 in total
[21]   EdgeCRNN: an edge-computing oriented model of acoustic feature enhancement for keyword spotting [J].
Wei, Yungen ;
Gong, Zheng ;
Yang, Shunzhi ;
Ye, Kai ;
Wen, Yamin .
JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2022, 13 (03) :1525-1535
[22]   Artificial basilar membrane/hair cell integrated acoustic system for keyword spotting in noisy environments inspired by human cochlea [J].
Lee, Useung ;
Jeon, Eun-Seok ;
Hur, Shin ;
Han, Chang-Soo .
MEASUREMENT, 2025, 241
[23]   Keyword spotting for self-training of BLSTM NN based handwriting recognition systems [J].
Frinken, Volkmar ;
Fischer, Andreas ;
Baumgartner, Markus ;
Bunke, Horst .
PATTERN RECOGNITION, 2014, 47 (03) :1073-1082
[24]   Comparison of Two Microcontroller Boards for On-Device Model Training in a Keyword Spotting Task [J].
Llisterri Gimenez, Nil ;
Freitag, Felix ;
Lee, JunKyu ;
Vandierendonck, Hans .
2022 11TH MEDITERRANEAN CONFERENCE ON EMBEDDED COMPUTING (MECO), 2022, :159-162
[25]   Robust Small-Footprint Keyword Spotting Using Sequence-To-Sequence Model With Connectionist Temporal Classifier [J].
Xuan, Xiaoguang ;
Wang, Mingjiang ;
Zhang, Xin ;
Sun, Fengjiao .
2019 2ND IEEE INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND SIGNAL PROCESSING (ICICSP), 2019, :400-404
[26]   Discriminative Training Using Non-uniform Criteria for Keyword Spotting on Spontaneous Speech [J].
Weng, Chao ;
Juang, Biing-Hwang ;
Povey, Daniel .
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, :558-561
[27]   Non-Uniform Boosted MCE Training of Deep Neural Networks for Keyword Spotting [J].
Meng, Zhong ;
Juang, Biing-Hwang .
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, :770-774
[28]   Discriminative Training Using Non-Uniform Criteria for Keyword Spotting on Spontaneous Speech [J].
Weng, Chao ;
Juang, Biing-Hwang .
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (02) :300-312
[29]   Language Model Data Augmentation for Keyword Spotting in Low-Resourced Training Conditions [J].
Gorin, Arseniy ;
Lileikyte, Rasa ;
Huang, Guangpu ;
Lamel, Lori ;
Gauvain, Jean-Luc ;
Laurent, Antoine .
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, :775-779
[30]   Sub 8-Bit Quantization of Streaming Keyword Spotting Models for Embedded Chipsets [J].
Zeng, Lu ;
Parthasarathi, Sree Hari Krishnan ;
Liu, Yuzong ;
Escott, Alex ;
Cheekatmalla, Santosh ;
Strom, Nikko ;
Vitaladevuni, Shiv .
TEXT, SPEECH, AND DIALOGUE (TSD 2022), 2022, 13502 :364-376