State sequence pooling training of acoustic models for keyword spotting

Cited by: 3

Authors:

Lopatka, Kuba [1]

Bocklet, Tobias [2,3]

Affiliations:

[1] Intel Corp, Santa Clara, CA 95051 USA

[2] Intel Labs, Santa Clara, CA USA

[3] TH Nurnberg, Nurnberg, Germany

Source:

INTERSPEECH 2020 | 2020

Keywords:

keyword spotting; machine learning; speech recognition;

DOI:

10.21437/Interspeech.2020-2722

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Subject classification code:

100104 ; 100213 ;

Abstract:

We propose a new training method to improve hidden Markov model (HMM)-based keyword spotting. The loss function is based on a score that the keyword/filler model computes over the entire input sequence. This is equivalent to max or attention pooling, but it is based on prior acoustic knowledge. We also use a multi-task learning setup that predicts both large-vocabulary continuous speech recognition (LVCSR) and keyword posteriors. We compare our model to a baseline trained on frame-wise cross entropy, with and without per-class weighting. We use a low-footprint time-delay neural network (TDNN) for acoustic modeling. The proposed training yields significant and consistent improvements over the baseline in adverse noise conditions. In cafeteria noise, the false rejection rate (FRR) is reduced from 13.07% to 5.28% at 9 dB signal-to-noise ratio (SNR) and from 37.44% to 6.78% at 5 dB SNR. We obtain these results with only 600 unique training keyword samples. The training method is independent of the frontend and the acoustic model topology.
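The abstract describes the sequence-level score only at a high level. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes a left-to-right keyword HMM with self-loops, a single filler state, per-frame log-posteriors as input, transition probabilities ignored, and a sequence score defined as the best keyword-path score minus the filler score over the same frames, with the keyword allowed to start and end at any frame. The names `keyword_filler_score`, `softplus`, and `sequence_loss`, and the choice of a sigmoid/binary cross-entropy loss on the score, are hypothetical. Because the score is a maximum over alignments, only frames on the best path contribute to it, which is the sense in which it resembles max pooling; in an autodiff framework the same recursion would route gradients through those frames. In the multi-task setup the abstract mentions, such a per-utterance loss would presumably be combined with a frame-wise loss on LVCSR targets; that combination is not shown.

```python
import math
from typing import Sequence

NEG_INF = float("-inf")


def keyword_filler_score(kw_logp: Sequence[Sequence[float]],
                         filler_logp: Sequence[float]) -> float:
    """Best (keyword path - filler) log-score over all segments of the input.

    Hypothetical input layout (an assumption, not from the paper):
    kw_logp[t][k]  log-posterior of keyword HMM state k at frame t
    filler_logp[t] log-posterior of the filler model at frame t
    """
    if len(kw_logp) != len(filler_logp):
        raise ValueError("keyword and filler inputs must have the same number of frames")
    if not kw_logp:
        return NEG_INF
    num_states = len(kw_logp[0])
    # delta[k]: best accumulated score of a partial path in state k at the current frame.
    delta = [NEG_INF] * num_states
    best = NEG_INF
    for frame, filler in zip(kw_logp, filler_logp):
        new_delta = [NEG_INF] * num_states
        for k in range(num_states):
            stay = delta[k]  # self-loop in state k
            # State 0 may also start a fresh path at this frame (free start, score 0).
            enter = delta[k - 1] if k > 0 else 0.0
            # Per-frame contribution: keyword log-posterior minus filler log-posterior.
            new_delta[k] = max(stay, enter) + frame[k] - filler
        delta = new_delta
        # Free end: the keyword may finish at any frame once the last state is reached.
        best = max(best, delta[-1])
    return best


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sequence_loss(score: float, contains_keyword: bool) -> float:
    """Binary cross-entropy on sigmoid(score), one value per utterance (assumed loss form)."""
    return softplus(-score) if contains_keyword else softplus(score)


if __name__ == "__main__":
    # Toy input: 4 frames, a 2-state keyword model, constant filler log-posterior.
    kw = [[-1.0, -5.0], [-0.5, -4.0], [-4.0, -0.2], [-5.0, -3.0]]
    filler = [-1.0] * 4
    score = keyword_filler_score(kw, filler)
    print(round(score, 6))  # 1.3
    print(sequence_loss(score, True) < sequence_loss(score, False))  # True
```

A design note on this sketch: computing the score as a running sum of per-frame differences makes "keyword path minus filler over the same frames" a single Viterbi-style recursion, so one pass over the utterance yields the pooled score regardless of where the keyword occurs.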

Pages: 4338-4342

Number of pages: 5

