VERIFYING DEEP KEYWORD SPOTTING DETECTION WITH ACOUSTIC WORD EMBEDDINGS

Times cited: 0
Authors
Yuan, Yougen [1, 2]
Lv, Zhiqiang [2]
Huang, Shen [2]
Xie, Lei [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[2] Tencent Res, Beijing, Peoples R China
Source
2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019) | 2019
Funding
National Natural Science Foundation of China
Keywords
Query-by-example; keyword spotting; acoustic word embeddings; hinge loss; calibration scores
DOI
10.1109/asru46091.2019.9003781
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
To improve keyword spotting (KWS) performance in a live broadcast scenario, we propose using a template-matching method based on acoustic word embeddings (AWEs) as a second stage that verifies the detections of a Deep KWS system. The AWEs are produced by a deep bidirectional long short-term memory (BLSTM) network trained on a limited set of positive and negative keyword candidates; the network encodes variable-length keyword candidates into fixed-dimensional vectors with reasonable discriminative ability. The AWEs are learned with a combination of three specifically designed losses: the triplet and reversed triplet losses keep candidates of the same keyword closer together and candidates of different keywords farther apart, while the hinge loss sets a fixed threshold that separates all positive keyword candidates from all negative ones. During keyword verification, calibration scores are used to reduce the bias between the templates of different keyword candidates. Experiments show that adding AWE-based keyword verification to Deep KWS yields a 5.6% relative accuracy improvement; the hinge loss brings an additional 5.5% relative gain, and using calibration scores raises the final accuracy to 0.775.
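The abstract names the training losses and the calibrated verification score but does not give their formulas. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: embeddings are compared with cosine distance, `triplet_loss` is the standard triplet hinge loss, `threshold_hinge_loss` is a hypothetical fixed-threshold hinge with made-up `threshold` and `margin` values, and `calibrated_score` uses per-template z-normalization against impostor embeddings as a stand-in for the paper's calibration scores. The reversed triplet loss is not reproduced because its exact form is not stated here, and the BLSTM encoder is replaced by hand-written toy vectors.

```python
import math
from statistics import mean, pstdev


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cosine_distance(u, v):
    """Cosine distance in [0, 2]; smaller means more similar."""
    return 1.0 - cosine_similarity(u, v)


def triplet_loss(anchor, positive, negative, margin=0.4):
    """Standard triplet hinge loss: the same-keyword pair (anchor, positive)
    must be closer than the different-keyword pair (anchor, negative) by at
    least `margin`. The margin value is an illustrative assumption."""
    return max(0.0, margin + cosine_distance(anchor, positive)
               - cosine_distance(anchor, negative))


def threshold_hinge_loss(template, candidate, is_positive, threshold=0.5, margin=0.1):
    """Hypothetical fixed-threshold hinge loss: a positive candidate should
    score at least `margin` above `threshold`, a negative one at least
    `margin` below it. Both values are assumptions, not the paper's."""
    y = 1.0 if is_positive else -1.0
    score = cosine_similarity(template, candidate)
    return max(0.0, margin - y * (score - threshold))


def calibrated_score(template, candidate, impostors):
    """Per-template z-normalization, used here as a stand-in for the paper's
    calibration scores: the raw similarity is shifted and scaled by the
    template's similarity statistics over a set of impostor embeddings."""
    impostor_scores = [cosine_similarity(template, imp) for imp in impostors]
    mu = mean(impostor_scores)
    sigma = pstdev(impostor_scores) or 1.0  # avoid dividing by zero spread
    return (cosine_similarity(template, candidate) - mu) / sigma


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for BLSTM outputs (hypothetical values).
    template = [1.0, 0.0]
    same_keyword = [0.9, 0.1]
    other_keyword = [0.0, 1.0]
    impostors = [[0.0, 1.0], [1.0, 1.0]]

    print("triplet loss:", triplet_loss(template, same_keyword, other_keyword))
    print("hinge loss (positive):", threshold_hinge_loss(template, same_keyword, True))
    print("calibrated score:", calibrated_score(template, same_keyword, impostors))
```

Because each template's raw similarities are normalized against its own impostor statistics, a single decision threshold can be shared across keywords whose templates would otherwise score on different scales. That is the intuition behind calibrating scores, although the paper's exact procedure may differ.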
Pages: 613-620
Number of pages: 8
Related papers (50 in total)
  • [1] REPRESENTING WORD IMAGE USING VISUAL WORD EMBEDDINGS AND RNN FOR KEYWORD SPOTTING ON HISTORICAL DOCUMENT IMAGES
    Wei, Hongxi
    Zhang, Hui
    Gao, Guanglai
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 1368 - 1373
  • [2] Generalized Keyword Spotting using ASR embeddings
    Kirandevraj, R.
    Kurmi, Vinod K.
    Namboodiri, Vinay P.
    Jawahar, C. V.
    INTERSPEECH 2022, 2022, : 126 - 130
  • [3] Word Image Representation Based on Visual Embeddings and Spatial Constraints for Keyword Spotting on Historical Documents
    Wei, Hongxi
    Zhang, Hui
    Gao, Guanglai
    2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 3616 - 3621
  • [4] Acoustic Similarity Scores for Keyword Spotting
    Veiga, Arlindo
    Lopes, Carla
    Sa, Luis
    Perdigao, Fernando
    COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, 2014, 8775 : 48 - 58
  • [5] Acoustic word embeddings for ASR error detection
    Ghannay, Sahar
    Esteve, Yannick
    Camelin, Nathalie
    Deleglise, Paul
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1330 - 1334
  • [6] Integrating Visual Word Embeddings into Translation Language Model for Keyword Spotting on Historical Mongolian Document Images
    Wei, Hongxi
    Zhang, Hui
    Gao, Guanglai
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT II, 2018, 10736 : 616 - 625
  • [7] Deep Spoken Keyword Spotting: An Overview
    Lopez-Espejo, Ivan
    Tan, Zheng-Hua
    Hansen, John H. L.
    Jensen, Jesper
    IEEE ACCESS, 2022, 10 : 4169 - 4199
  • [8] Deep Features Representation of Word Image for Keyword Spotting in Historical Mongolian Document Images
    Wei, Hongxi
    Zhang, Jing
    Zhang, Hui
    2020 IEEE 32ND INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2020, : 413 - 417
  • [9] Open-Vocabulary Keyword Spotting With Audio And Text Embeddings
    Sacchi, Niccolo
    Nanchen, Alexandre
    Jaggi, Martin
    Cernak, Milos
    INTERSPEECH 2019, 2019, : 3362 - 3366
  • [10] Keyword Spotting in Historical Devanagari Manuscripts by Word Matching
    Sharada, B.
    Sushma, S. N.
    Bharathlal
    DATA ANALYTICS AND LEARNING, 2019, 43 : 65 - 76