VERIFYING DEEP KEYWORD SPOTTING DETECTION WITH ACOUSTIC WORD EMBEDDINGS

Cited by: 0
Authors
Yuan, Yougen [1, 2]
Lv, Zhiqiang [2]
Huang, Shen [2]
Xie, Lei [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[2] Tencent Res, Beijing, Peoples R China
Source
2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019
Funding
National Natural Science Foundation of China
Keywords
Query-by-example; keyword spotting; acoustic word embeddings; hinge loss; calibration scores;
DOI
10.1109/asru46091.2019.9003781
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, to improve keyword spotting (KWS) performance in a live broadcast scenario, we propose using a template matching method based on acoustic word embeddings (AWEs) as a second stage that verifies detections from the Deep KWS system. The AWEs are obtained from a deep bidirectional long short-term memory (BLSTM) network trained on a limited set of positive and negative keyword candidates; the network encodes variable-length keyword candidates into fixed-dimensional vectors with reasonable discriminative ability. AWE learning combines three specifically designed losses: the triplet and reversed triplet losses keep candidates of the same keyword closer together and candidates of different keywords farther apart, while the hinge loss sets a fixed threshold that separates all positive and negative keyword candidates. During keyword verification, calibration scores are used to reduce the bias among the different templates used for different keyword candidates. Experiments show that adding AWE-based keyword verification to Deep KWS yields a 5.6% relative accuracy improvement; the hinge loss brings an additional 5.5% relative gain, and the final accuracy rises to 0.775 when calibration scores are used.
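The abstract names the three training losses and the verification step but not their formulas. The following is a minimal sketch of one plausible reading, not the authors' implementation. Assumptions (none of them stated in the record): embeddings are compared with cosine distance; the margin and threshold values are arbitrary; the "reversed" triplet loss is read as letting the positive example take the anchor role; the three losses are summed with equal weights; and verification accepts a detection when the mean distance to the keyword's templates falls below the fixed threshold. All function names, including `verify_detection`, are hypothetical, and the calibration scores are not modeled.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Return 1 - cosine similarity (0 = same direction, 2 = opposite)."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.4) -> float:
    # Standard triplet hinge: the same-keyword pair must be closer than the
    # different-keyword pair by at least `margin`.
    return max(0.0, margin + cosine_distance(anchor, positive)
               - cosine_distance(anchor, negative))


def reversed_triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                          margin: float = 0.4) -> float:
    # ASSUMPTION: "reversed" is read as swapping anchor and positive, so the
    # negative must also be far from the positive example.
    return max(0.0, margin + cosine_distance(positive, anchor)
               - cosine_distance(positive, negative))


def hinge_loss(anchor: Vector, positive: Vector, negative: Vector,
               threshold: float = 0.5, margin: float = 0.1) -> float:
    # ASSUMPTION: one global distance threshold for all keywords. Same-keyword
    # pairs are pushed below threshold - margin, different-keyword pairs above
    # threshold + margin.
    pos_term = max(0.0, cosine_distance(anchor, positive) - (threshold - margin))
    neg_term = max(0.0, (threshold + margin) - cosine_distance(anchor, negative))
    return pos_term + neg_term


def combined_loss(anchor: Vector, positive: Vector, negative: Vector,
                  weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    # ASSUMPTION: a plain weighted sum of the three losses.
    w_t, w_r, w_h = weights
    return (w_t * triplet_loss(anchor, positive, negative)
            + w_r * reversed_triplet_loss(anchor, positive, negative)
            + w_h * hinge_loss(anchor, positive, negative))


def verify_detection(candidate: Vector, templates: Sequence[Vector],
                     threshold: float = 0.5) -> bool:
    # Hypothetical second stage: keep a first-stage detection only if the
    # candidate's mean distance to the keyword's templates is below the same
    # fixed threshold the hinge loss trains toward (no score calibration).
    if not templates:
        raise ValueError("at least one template is required")
    mean_dist = sum(cosine_distance(candidate, t) for t in templates) / len(templates)
    return mean_dist < threshold
```

With a well-separated triplet such as anchor `[1.0, 0.0]`, positive `[1.0, 0.0]` and negative `[0.0, 1.0]`, all three losses are zero; swapping the positive and negative makes every term positive. Reusing the hinge-loss threshold at verification time is the design point the sketch tries to illustrate: a single fixed threshold only works if training has pushed all positive and negative pairs to opposite sides of it.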
Pages: 613-620
Number of pages: 8
Related papers (50 in total)
  • [21] Keyword spotting in unconstrained handwritten Chinese documents using contextual word model
    Huang, Liang
    Yin, Fei
    Chen, Qing-Hu
    Liu, Cheng-Lin
    IMAGE AND VISION COMPUTING, 2013, 31 (12) : 958 - 968
  • [22] Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings
    Settle, Shane
    Levin, Keith
    Kamper, Herman
    Livescu, Karen
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017 : 2874 - 2878
  • [23] WHOLE-WORD SEGMENTAL SPEECH RECOGNITION WITH ACOUSTIC WORD EMBEDDINGS
    Shi, Bowen
    Settle, Shane
    Livescu, Karen
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021 : 164 - 171
  • [24] A Hybrid Deep Learning Approach to Keyword Spotting in Vietnamese Stele Images
    Scius-Bertrand A.
    Bui M.
    Fischer A.
    Informatica (Slovenia), 2023, 47 (03) : 361 - 372
  • [25] SMALL-FOOTPRINT KEYWORD SPOTTING USING DEEP NEURAL NETWORKS
    Chen, Guoguo
    Parada, Carolina
    Heigold, Georg
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [26] Evaluating Robustness to Noise and Compression of Deep Neural Networks for Keyword Spotting
    Pereira, Pedro H.
    Beccaro, Wesley
    Ramirez, Miguel A.
    IEEE ACCESS, 2023, 11 : 53224 - 53236
  • [27] Keyword Spotting for Industrial Control using Deep Learning on Edge Devices
    Hoelzke, Fabian
    Ahmed, Hameem
    Golatowski, Frank
    Timmermann, Dirk
    2021 IEEE INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING AND COMMUNICATIONS WORKSHOPS AND OTHER AFFILIATED EVENTS (PERCOM WORKSHOPS), 2021 : 167 - 172
  • [28] Deep Template Matching for Small-footprint and Configurable Keyword Spotting
    Zhang, Peng
    Zhang, Xueliang
    INTERSPEECH 2020, 2020 : 2572 - 2576
  • [29] LDA-Based Word Image Representation for Keyword Spotting on Historical Mongolian Documents
    Wei, Hongxi
    Gao, Guanglai
    Su, Xiangdong
    NEURAL INFORMATION PROCESSING, ICONIP 2016, PT IV, 2016, 9950 : 432 - 441
  • [30] LEARNED IN SPEECH RECOGNITION: CONTEXTUAL ACOUSTIC WORD EMBEDDINGS
    Palaskar, Shruti
    Raunak, Vikas
    Metze, Florian
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019 : 6530 - 6534