State sequence pooling training of acoustic models for keyword spotting

Cited by: 3
Authors
Lopatka, Kuba [1]
Bocklet, Tobias [2, 3]
Affiliations
[1] Intel Corp, Santa Clara, CA 95051 USA
[2] Intel Labs, Santa Clara, CA USA
[3] TH Nurnberg, Nurnberg, Germany
Source
INTERSPEECH 2020 | 2020
Keywords
keyword spotting; machine learning; speech recognition
DOI
10.21437/Interspeech.2020-2722
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
We propose a new training method to improve HMM-based keyword spotting. The loss function is based on a score computed with the keyword/filler model from the entire input sequence. It is equivalent to max/attention pooling but is based on prior acoustic knowledge. We also employ a multi-task learning setup by predicting both LVCSR and keyword posteriors. We compare our model to a baseline trained on frame-wise cross entropy, with and without per-class weighting. We employ a low-footprint TDNN for acoustic modeling. The proposed training yields significant and consistent improvement over the baseline in adverse noise conditions. The FRR on cafeteria noise is reduced from 13.07% to 5.28% at 9 dB SNR and from 37.44% to 6.78% at 5 dB SNR. We obtain these results with only 600 unique training keyword samples. The training method is independent of the frontend and acoustic model topology.
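The two FRR figures quoted in the abstract can be restated as relative reductions. The short sketch below only redoes that arithmetic from the numbers given above; the function name `relative_reduction` is a hypothetical helper introduced for illustration and is not part of the paper.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Relative reduction, in percent, when going from baseline to proposed."""
    return (baseline - proposed) / baseline * 100.0


# FRR values (in percent) for cafeteria noise, as quoted in the abstract.
frr_by_snr = {
    "9 dB SNR": (13.07, 5.28),
    "5 dB SNR": (37.44, 6.78),
}

for condition, (baseline, proposed) in frr_by_snr.items():
    reduction = relative_reduction(baseline, proposed)
    print(f"{condition}: {baseline}% -> {proposed}% ({reduction:.1f}% relative reduction)")
# 9 dB SNR: 13.07% -> 5.28% (59.6% relative reduction)
# 5 dB SNR: 37.44% -> 6.78% (81.9% relative reduction)
```

Under this reading, the gain is larger at the lower SNR, which matches the abstract's statement that the improvement shows up in adverse noise conditions.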
Pages: 4338 - 4342
Page count: 5
Related papers
50 in total
  • [21] Keyword spotting for self-training of BLSTM NN based handwriting recognition systems
    Frinken, Volkmar
    Fischer, Andreas
    Baumgartner, Markus
    Bunke, Horst
    PATTERN RECOGNITION, 2014, 47 (03) : 1073 - 1082
  • [22] Artificial basilar membrane/hair cell integrated acoustic system for keyword spotting in noisy environments inspired by human cochlea
    Lee, Useung
    Jeon, Eun-Seok
    Hur, Shin
    Han, Chang-Soo
    MEASUREMENT, 2025, 241
  • [23] Robust Small-Footprint Keyword Spotting Using Sequence-To-Sequence Model With Connectionist Temporal Classifier
    Xuan, Xiaoguang
    Wang, Mingjiang
    Zhang, Xin
    Sun, Fengjiao
    2019 2ND IEEE INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND SIGNAL PROCESSING (ICICSP), 2019, : 400 - 404
  • [24] Comparison of Two Microcontroller Boards for On-Device Model Training in a Keyword Spotting Task
    Llisterri Gimenez, Nil
    Freitag, Felix
    Lee, JunKyu
    Vandierendonck, Hans
    2022 11TH MEDITERRANEAN CONFERENCE ON EMBEDDED COMPUTING (MECO), 2022, : 159 - 162
  • [25] Discriminative Training Using Non-uniform Criteria for Keyword Spotting on Spontaneous Speech
    Weng, Chao
    Juang, Biing-Hwang
    Povey, Daniel
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 558 - 561
  • [26] Language Model Data Augmentation for Keyword Spotting in Low-Resourced Training Conditions
    Gorin, Arseniy
    Lileikyte, Rasa
    Huang, Guangpu
    Lamel, Lori
    Gauvain, Jean-Luc
    Laurent, Antoine
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 775 - 779
  • [27] Non-Uniform Boosted MCE Training of Deep Neural Networks for Keyword Spotting
    Meng, Zhong
    Juang, Biing-Hwang
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 770 - 774
  • [28] Discriminative Training Using Non-Uniform Criteria for Keyword Spotting on Spontaneous Speech
    Weng, Chao
    Juang, Biing-Hwang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (02) : 300 - 312
  • [29] Sub 8-Bit Quantization of Streaming Keyword Spotting Models for Embedded Chipsets
    Zeng, Lu
    Parthasarathi, Sree Hari Krishnan
    Liu, Yuzong
    Escott, Alex
    Cheekatmalla, Santosh
    Strom, Nikko
    Vitaladevuni, Shiv
    TEXT, SPEECH, AND DIALOGUE (TSD 2022), 2022, 13502 : 364 - 376
  • [30] A novel phone-state matrix based vocabulary-independent keyword spotting method for spontaneous speech
    Gao, Peng
    Liang, JiaEn
    Ding, Peng
    Xu, Bo
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 425 - +