State sequence pooling training of acoustic models for keyword spotting

Cited by: 3
Authors
Lopatka, Kuba [1 ]
Bocklet, Tobias [2 ,3 ]
Affiliations
[1] Intel Corp, Santa Clara, CA 95051 USA
[2] Intel Labs, Santa Clara, CA USA
[3] TH Nurnberg, Nurnberg, Germany
Source
INTERSPEECH 2020 | 2020
Keywords
keyword spotting; machine learning; speech recognition
DOI
10.21437/Interspeech.2020-2722
Chinese Library Classification
R36 (Pathology); R76 (Otorhinolaryngology)
Subject classification codes
100104; 100213
Abstract
We propose a new training method to improve HMM-based keyword spotting. The loss function is based on a score computed with the keyword/filler model from the entire input sequence. It is equivalent to max/attention pooling but is based on prior acoustic knowledge. We also employ a multi-task learning setup by predicting both LVCSR and keyword posteriors. We compare our model to a baseline trained on frame-wise cross entropy, with and without per-class weighting. We employ a low-footprint TDNN for acoustic modeling. The proposed training yields significant and consistent improvement over the baseline in adverse noise conditions. The FRR on cafeteria noise is reduced from 13.07% to 5.28% at 9 dB SNR and from 37.44% to 6.78% at 5 dB SNR. We obtain these results with only 600 unique training keyword samples. The training method is independent of the frontend and acoustic model topology.
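The sequence-level loss described in the abstract can be illustrated with a small sketch. This is a hedged approximation, not the authors' implementation: the paper pools a score from the keyword/filler HMM over the whole input sequence, while the sketch below shows the max/attention-pooling analogue the abstract says it is equivalent to. Frame-level keyword logits (e.g. from a TDNN) are pooled into a single utterance score, and a binary cross entropy is applied to that score instead of frame-wise cross entropy. The function name and signature are illustrative only.

```python
import numpy as np

def sequence_pooling_loss(frame_logits, is_keyword, pooling="max"):
    """Sequence-level keyword-spotting loss (illustrative sketch).

    frame_logits : (T,) per-frame keyword logits from the acoustic model
    is_keyword   : 1 if the utterance contains the keyword, else 0
    pooling      : "max" for hard pooling, anything else for a
                   softmax-weighted (attention-style) pooling
    """
    frame_logits = np.asarray(frame_logits, dtype=float)
    if pooling == "max":
        # Hard pooling: the single strongest frame decides the score.
        score = np.max(frame_logits)
    else:
        # Soft pooling: softmax weights over frames, then a weighted sum.
        w = np.exp(frame_logits - np.max(frame_logits))
        w /= w.sum()
        score = np.dot(w, frame_logits)
    # Binary cross entropy on the pooled score.
    p = 1.0 / (1.0 + np.exp(-score))
    eps = 1e-12
    return -(is_keyword * np.log(p + eps)
             + (1 - is_keyword) * np.log(1 - p + eps))
```

Because the loss depends only on the pooled score, no frame-level alignment of the keyword is needed; the paper's variant replaces the generic pooling above with a score from the keyword/filler model, injecting prior acoustic knowledge into where the pooling "looks".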
Pages: 4338-4342 (5 pages)