State sequence pooling training of acoustic models for keyword spotting

Cited by: 3

Authors:

Lopatka, Kuba [1]

Bocklet, Tobias [2, 3]

Affiliations:

[1] Intel Corp, Santa Clara, CA 95051 USA

[2] Intel Labs, Santa Clara, CA USA

[3] TH Nurnberg, Nurnberg, Germany

Source:

INTERSPEECH 2020 | 2020

Keywords:

keyword spotting; machine learning; speech recognition;

DOI:

10.21437/Interspeech.2020-2722

Chinese Library Classification (CLC) codes:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification codes:

100104 ; 100213 ;

Abstract:

We propose a new training method to improve HMM-based keyword spotting. The loss function is based on a score computed with the keyword/filler model from the entire input sequence. It is equivalent to max/attention pooling but is based on prior acoustic knowledge. We also employ a multi-task learning setup by predicting both LVCSR and keyword posteriors. We compare our model to a baseline trained on frame-wise cross entropy, with and without per-class weighting. We employ a low-footprint TDNN for acoustic modeling. The proposed training yields significant and consistent improvement over the baseline in adverse noise conditions. The FRR on cafeteria noise is reduced from 13.07% to 5.28% at 9 dB SNR and from 37.44% to 6.78% at 5 dB SNR. We obtain these results with only 600 unique training keyword samples. The training method is independent of the frontend and acoustic model topology.
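The abstract likens the proposed sequence-level loss to max pooling over the input sequence. As a rough illustration of that general idea only, the sketch below computes a generic max-pooling loss from per-frame keyword posteriors. It is not the authors' keyword/filler score: the function name `max_pool_sequence_loss`, its arguments, and the binary cross-entropy form are all assumptions made for this example.

```python
import numpy as np


def max_pool_sequence_loss(frame_posteriors, is_keyword, eps=1e-7):
    """Generic max-pooling sequence loss (hypothetical helper, not from the paper).

    frame_posteriors: per-frame keyword probabilities in [0, 1] for one utterance.
    is_keyword: True if the utterance contains the keyword, False otherwise.
    """
    p = np.asarray(frame_posteriors, dtype=float)
    # Pool the whole sequence into one score: the most confident frame.
    score = float(np.clip(p.max(), eps, 1.0 - eps))
    # Binary cross entropy on the pooled score rather than on every frame.
    if is_keyword:
        return float(-np.log(score))
    return float(-np.log(1.0 - score))


# One utterance with three frames of keyword posteriors.
loss_positive = max_pool_sequence_loss([0.1, 0.9, 0.2], is_keyword=True)
loss_negative = max_pool_sequence_loss([0.1, 0.9, 0.2], is_keyword=False)
```

Because only the pooled score enters the loss, a keyword utterance is penalized lightly once any single frame fires strongly, whereas frame-wise cross entropy would penalize every mislabeled frame. According to the abstract, the paper replaces this plain maximum with a score computed from the keyword/filler model.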

Pages: 4338-4342

Page count: 5
