Deep Spoken Keyword Spotting: An Overview

Cited by: 47
Authors
Lopez-Espejo, Ivan [1]
Tan, Zheng-Hua [1]
Hansen, John H. L. [2]
Jensen, Jesper [1, 3]
Affiliations
[1] Aalborg Univ, Dept Elect Syst, DK-9220 Aalborg, Denmark
[2] Univ Texas Dallas, Erik Jonsson Sch Engn & Comp Sci, Richardson, TX 75080 USA
[3] Oticon AS, DK-2765 Smorum, Denmark
Keywords
Hidden Markov models; Acoustics; Feature extraction; Decoding; Computational modeling; Viterbi algorithm; Virtual assistants; Keyword spotting; Deep learning; Acoustic model; Small footprint; Robustness; Adaptive noise cancellation; Small-footprint; Speech recognition; Speaker verification; Term detection; Robust; Representations; Enhancement; Challenge; Attention
DOI
10.1109/ACCESS.2021.3139508
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has enabled the rapid deployment of deep KWS on a myriad of small electronic devices for purposes such as activating voice assistants. Prospects suggest sustained growth in the social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for ways to improve KWS performance and reduce computational complexity. This context motivates this paper, in which we conduct a literature review of deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview is comprehensive, covering a thorough analysis of deep KWS systems (including speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, the performance of deep KWS systems, and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS.
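Of the system components listed above, posterior handling is the least self-explanatory: it turns the acoustic model's frame-level posteriors into a keyword decision. What follows is a minimal sketch, assuming the moving-average smoothing and geometric-mean confidence scheme of Chen, Parada and Heigold (ICASSP 2014). That scheme is only one of several posterior-handling approaches and is not presented here as the overview's own method. The window lengths, the convention that label 0 is the filler (non-keyword) class, and the helper names `smooth_posteriors`, `confidence` and `first_detection` are illustrative assumptions.

```python
import numpy as np


def smooth_posteriors(posteriors: np.ndarray, w_smooth: int = 30) -> np.ndarray:
    """Average each frame's posteriors over the last `w_smooth` frames.

    `posteriors` has shape (num_frames, num_labels); column 0 is assumed to be
    the filler (non-keyword) label and columns 1.. the keyword sub-units.
    """
    num_frames = posteriors.shape[0]
    smoothed = np.empty(posteriors.shape, dtype=float)
    for j in range(num_frames):
        start = max(0, j - w_smooth + 1)  # window shrinks near the stream start
        smoothed[j] = posteriors[start : j + 1].mean(axis=0)
    return smoothed


def confidence(smoothed: np.ndarray, w_max: int = 100) -> np.ndarray:
    """Per-frame score: geometric mean, over keyword labels, of each label's
    peak smoothed posterior within the last `w_max` frames."""
    num_frames, num_labels = smoothed.shape
    scores = np.empty(num_frames)
    for j in range(num_frames):
        start = max(0, j - w_max + 1)
        peaks = smoothed[start : j + 1, 1:].max(axis=0)  # skip filler column 0
        scores[j] = np.prod(peaks) ** (1.0 / (num_labels - 1))
    return scores


def first_detection(scores: np.ndarray, threshold: float = 0.5):
    """Index of the first frame whose score reaches `threshold`, or None."""
    hits = np.flatnonzero(scores >= threshold)
    return int(hits[0]) if hits.size else None


if __name__ == "__main__":
    # Toy stream: filler everywhere except two consecutive keyword sub-units.
    post = np.zeros((100, 3))
    post[:, 0] = 1.0
    post[50:60] = [0.0, 1.0, 0.0]
    post[60:70] = [0.0, 0.0, 1.0]
    scores = confidence(smooth_posteriors(post, w_smooth=5))
    print(first_detection(scores))  # 61 for this toy input
```

The geometric mean requires every keyword sub-unit to have fired recently, so a single strong sub-unit cannot trigger a detection on its own. The detection threshold trades false rejections against false alarms. For that reason, KWS systems are usually evaluated over a range of thresholds rather than at a single operating point.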
Pages: 4169-4199
Number of pages: 31