State sequence pooling training of acoustic models for keyword spotting

Cited by: 3

Authors:

Lopatka, Kuba [1]

Bocklet, Tobias [2,3]

Affiliations:

[1] Intel Corp, Santa Clara, CA 95051 USA

[2] Intel Labs, Santa Clara, CA USA

[3] TH Nurnberg, Nurnberg, Germany

Source:

INTERSPEECH 2020 | 2020

Keywords:

keyword spotting; machine learning; speech recognition

DOI:

10.21437/Interspeech.2020-2722

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Subject classification codes:

100104; 100213

Abstract:

We propose a new training method to improve HMM-based keyword spotting. The loss function is based on a score computed with the keyword/filler model from the entire input sequence. It is equivalent to max/attention pooling but is based on prior acoustic knowledge. We also employ a multi-task learning setup by predicting both LVCSR and keyword posteriors. We compare our model to a baseline trained on frame-wise cross entropy, with and without per-class weighting. We employ a low-footprint TDNN for acoustic modeling. The proposed training yields significant and consistent improvement over the baseline in adverse noise conditions. The FRR on cafeteria noise is reduced from 13.07% to 5.28% at 9 dB SNR and from 37.44% to 6.78% at 5 dB SNR. We obtain these results with only 600 unique training keyword samples. The training method is independent of the frontend and acoustic model topology.
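The idea of a loss computed from a sequence-level keyword/filler score can be illustrated with a minimal sketch. This is not the authors' implementation: the left-to-right state topology, the running-maximum pooling over end frames, the logistic loss, and the names `keyword_filler_score` and `sequence_loss` are all assumptions made for illustration only.

```python
import numpy as np


def keyword_filler_score(log_post_kw, log_post_filler):
    """Hypothetical sequence-level keyword score (illustrative assumption).

    log_post_kw:     (T, S) log-posteriors of S left-to-right keyword states.
    log_post_filler: (T,)   log-posteriors of a single filler state.

    Returns the best keyword-path score relative to an all-filler path,
    pooled with a maximum over all possible keyword end frames.
    """
    T, S = log_post_kw.shape
    # filler_cum[t] = filler score accumulated over frames 0 .. t-1
    filler_cum = np.concatenate(([0.0], np.cumsum(log_post_filler)))
    # v[s] = best score of a path that occupies keyword state s at this frame
    v = np.full(S, -np.inf)
    best = -np.inf
    for t in range(T):
        prev = v.copy()
        # State 0 is entered from the filler prefix or kept via a self-loop.
        v[0] = max(filler_cum[t], prev[0]) + log_post_kw[t, 0]
        # Other states: self-loop or transition from the previous state.
        v[1:] = np.maximum(prev[1:], prev[:-1]) + log_post_kw[t, 1:]
        # Compare the path ending in the last keyword state with pure filler.
        best = max(best, v[-1] - filler_cum[t + 1])
    return best


def sequence_loss(score, label):
    """Logistic loss on the pooled score (assumed form, not from the paper).

    label is +1 for an utterance containing the keyword, -1 otherwise.
    """
    return np.logaddexp(0.0, -label * score)
```

In this sketch a single scalar per utterance drives the loss, so only the frames on the best path through the keyword states contribute, which is the sense in which such a score acts like max pooling over the input sequence.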

Pages: 4338-4342

Page count: 5

