State sequence pooling training of acoustic models for keyword spotting

Cited by: 3
Authors
Lopatka, Kuba [1]
Bocklet, Tobias [2,3]
Affiliations
[1] Intel Corp, Santa Clara, CA 95051 USA
[2] Intel Labs, Santa Clara, CA USA
[3] TH Nurnberg, Nurnberg, Germany
Source
INTERSPEECH 2020 | 2020
Keywords
keyword spotting; machine learning; speech recognition
DOI
10.21437/Interspeech.2020-2722
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification code
100104; 100213
Abstract
We propose a new training method to improve HMM-based keyword spotting. The loss function is based on a score computed with the keyword/filler model over the entire input sequence. It is equivalent to max/attention pooling but is based on prior acoustic knowledge. We also employ a multi-task learning setup by predicting both LVCSR and keyword posteriors. We compare our model to a baseline trained with frame-wise cross entropy, with and without per-class weighting. We employ a low-footprint TDNN for acoustic modeling. The proposed training yields significant and consistent improvements over the baseline in adverse noise conditions. The false rejection rate (FRR) on cafeteria noise is reduced from 13.07% to 5.28% at 9 dB SNR and from 37.44% to 6.78% at 5 dB SNR. We obtain these results with only 600 unique training keyword samples. The training method is independent of the frontend and acoustic model topology.
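The abstract describes a loss computed from a keyword/filler score over the whole input sequence rather than frame by frame. The following is a minimal plain-Python sketch of that general idea, not the authors' implementation. It assumes that frame-level log posteriors per HMM state are available, that the keyword is a left-to-right chain of states, and that the sequence score is a Viterbi-style maximum over paths of per-frame "keyword state minus filler" log posterior ratios. All names (`keyword_states`, `filler_state`, `keyword_score`, `sequence_loss`, `frame_cross_entropy`) are hypothetical. In practice such a loss would be written in an autodiff framework so that gradients reach the acoustic model.

```python
import math
from typing import List, Optional, Sequence

NEG_INF = float("-inf")


def keyword_score(log_post: Sequence[Sequence[float]],
                  keyword_states: List[int],
                  filler_state: int) -> float:
    """Best keyword-vs-filler log posterior ratio over the whole sequence.

    Hypothetical sketch: each frame contributes log p(state) - log p(filler).
    A path visits the keyword states left to right (each for at least one
    frame, with self-loops) and may start and end at any frame, so the result
    is a max over time of the path score (a Viterbi-style max pooling).
    """
    n = len(keyword_states)
    prev = [NEG_INF] * n  # best partial-path score ending in state i at t-1
    best = NEG_INF
    for frame in log_post:
        llr = [frame[s] - frame[filler_state] for s in keyword_states]
        cur = [NEG_INF] * n
        for i in range(n):
            # Stay in state i, or enter it from state i-1; the first state
            # may also begin a new path at this frame (free start).
            enter = 0.0 if i == 0 else prev[i - 1]
            cur[i] = max(prev[i], enter) + llr[i]
        best = max(best, cur[-1])  # free end: keyword may finish at any frame
        prev = cur
    return best


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sequence_loss(score: float, is_keyword: bool) -> float:
    """Binary cross entropy on sigmoid(score), one label per utterance."""
    # -log(sigmoid(s)) = softplus(-s); -log(1 - sigmoid(s)) = softplus(s)
    return softplus(-score) if is_keyword else softplus(score)


def frame_cross_entropy(log_post: Sequence[Sequence[float]],
                        targets: Sequence[int],
                        class_weights: Optional[Sequence[float]] = None) -> float:
    """Frame-wise (optionally class-weighted) cross entropy, for contrast.

    Normalizing by the sum of weights is an assumption of this sketch.
    """
    total, norm = 0.0, 0.0
    for frame, c in zip(log_post, targets):
        w = 1.0 if class_weights is None else class_weights[c]
        total -= w * frame[c]
        norm += w
    return total / norm


if __name__ == "__main__":
    # Toy posteriors over [filler, kw-state-a, kw-state-b] for 4 frames.
    probs = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]
    log_post = [[math.log(p) for p in row] for row in probs]
    s = keyword_score(log_post, keyword_states=[1, 2], filler_state=0)
    print(round(s, 4))  # 4.1589, i.e. log(64)
    print(sequence_loss(s, is_keyword=True))
```

In this sketch, only the frames on the best keyword path influence the utterance-level score, which is what makes it behave like max pooling over time; the frame-wise cross-entropy function is included only to contrast with the per-frame baseline mentioned in the abstract.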
Pages: 4338-4342
Page count: 5
Related papers (50 in total)
  • [1] Acoustic Similarity Scores for Keyword Spotting
    Veiga, Arlindo
    Lopes, Carla
    Sa, Luis
    Perdigao, Fernando
    Computational Processing of the Portuguese Language, 2014, 8775: 48-58
  • [2] Language Independent and Unsupervised Acoustic Models for Speech Recognition and Keyword Spotting
    Knill, Kate M.
    Gales, Mark J. F.
    Ragni, Anton
    Rath, Shakti P.
    15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014), Vols 1-4, 2014: 16-20
  • [3] Streaming Small-Footprint Keyword Spotting Using Sequence-to-Sequence Models
    He, Yanzhang
    Prabhavalkar, Rohit
    Rao, Kanishka
    Li, Wei
    Bakhtin, Anton
    McGraw, Ian
    2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2017: 474-481
  • [4] Improved Keyword Spotting based on Keyword/Garbage Models
    Chen, Qiyu
    Zhang, Weibin
    Xu, Xiangmin
    Xing, Xiaofen
    2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2016
  • [5] Training Keyword Spotting Models on Non-IID Data with Federated Learning
    Hard, Andrew
    Partridge, Kurt
    Nguyen, Cameron
    Subrahmanya, Niranjan
    Shah, Aishanee
    Zhu, Pai
    Moreno, Ignacio Lopez
    Mathews, Rajiv
    INTERSPEECH 2020, 2020: 4343-4347
  • [6] Robust Keyword Spotting via Recycle-Pooling for Mobile Game
    An, Shounan
    Kim, Youngsoo
    Xu, Hu
    Lee, Jinwoo
    Lee, Myungwoo
    Oh, Insoo
    INTERSPEECH 2019, 2019: 3661-3662
  • [7] Mining Effective Negative Training Samples for Keyword Spotting
    Hou, Jingyong
    Shi, Yangyang
    Ostendorf, Mani
    Hwang, Mei-Yuh
    Xie, Lei
    2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2020: 7444-7448
  • [8] Verifying Deep Keyword Spotting Detection with Acoustic Word Embeddings
    Yuan, Yougen
    Lv, Zhiqiang
    Huang, Shen
    Xie, Lei
    2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019), 2019: 613-620
  • [9] Max-Pooling Loss Training of Long Short-Term Memory Networks for Small-Footprint Keyword Spotting
    Sun, Ming
    Raju, Anirudh
    Tucker, George
    Panchapagesan, Sankaran
    Fu, Gengshen
    Mandal, Arindam
    Matsoukas, Spyros
    Strom, Nikko
    Vitaladevuni, Shiv
    2016 IEEE Workshop on Spoken Language Technology (SLT 2016), 2016: 474-480
  • [10] Bayesian background models for keyword spotting in handwritten documents
    Kumar, Gaurav
    Govindaraju, Venu
    Pattern Recognition, 2017, 64: 84-91