State sequence pooling training of acoustic models for keyword spotting

Cited by: 3
Authors
Lopatka, Kuba [1]
Bocklet, Tobias [2, 3]
Affiliations
[1] Intel Corp, Santa Clara, CA 95051 USA
[2] Intel Labs, Santa Clara, CA USA
[3] TH Nurnberg, Nurnberg, Germany
Source
INTERSPEECH 2020 | 2020
Keywords
keyword spotting; machine learning; speech recognition
DOI
10.21437/Interspeech.2020-2722
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
We propose a new training method to improve HMM-based keyword spotting. The loss function is based on a score computed with the keyword/filler model from the entire input sequence. It is equivalent to max/attention pooling but is based on prior acoustic knowledge. We also employ a multi-task learning setup by predicting both LVCSR and keyword posteriors. We compare our model to a baseline trained on frame-wise cross entropy, with and without per-class weighting. We employ a low-footprint TDNN for acoustic modeling. The proposed training yields significant and consistent improvement over the baseline in adverse noise conditions. The FRR on cafeteria noise is reduced from 13.07% to 5.28% at 9 dB SNR and from 37.44% to 6.78% at 5 dB SNR. We obtain these results with only 600 unique training keyword samples. The training method is independent of the frontend and acoustic model topology.
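The abstract gives no formulas, so the following is only a rough sketch of what a sequence-level loss pooled over a keyword/filler score could look like, not the authors' implementation. It assumes a left-to-right keyword HMM entered from a single filler state, a per-frame score defined as the best keyword path minus the filler path, max pooling over time, and a logistic loss on the pooled score. All function names, array layouts, and the toy data are hypothetical.

```python
import numpy as np

def keyword_filler_scores(kw_logp, filler_logp):
    """Per-frame score: best keyword path ending in the last state minus filler path.

    kw_logp:     (T, S) log-posteriors of S keyword states (assumed layout).
    filler_logp: (T,)   log-posteriors of a single filler state (assumed).
    """
    T, S = kw_logp.shape
    r = 0.0                    # accumulated score of the filler-only path
    k = np.full(S, -np.inf)    # accumulated Viterbi score per keyword state
    scores = np.empty(T)
    for t in range(T):
        # Predecessors: state 0 may be entered from the filler, state s from s-1.
        prev = np.concatenate(([r], k[:-1]))
        k = np.maximum(k, prev) + kw_logp[t]   # self-loop or forward transition
        r = r + filler_logp[t]
        scores[t] = k[-1] - r
    return scores

def sequence_pooling_loss(kw_logp, filler_logp, is_keyword):
    """Logistic loss on the max-pooled sequence score (one utterance)."""
    z = keyword_filler_scores(kw_logp, filler_logp).max()  # max pooling over time
    # softplus(-z) for keyword utterances, softplus(z) for non-keyword ones
    return np.logaddexp(0.0, -z if is_keyword else z)

# Toy utterance: 6 frames, 3 keyword states, two frames per state.
T, S = 6, 3
kw = np.full((T, S), np.log(0.05))
for t in range(T):
    kw[t, t // 2] = np.log(0.9)
filler = np.full(T, np.log(0.05))
print(sequence_pooling_loss(kw, filler, is_keyword=True))
```

Because the loss depends only on the pooled score, the sketch needs no frame-level keyword alignment as a target, which is the property the abstract contrasts with frame-wise cross entropy.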
Pages: 4338-4342
Number of pages: 5