Deep Spoken Keyword Spotting: An Overview

Times cited: 47
Authors
Lopez-Espejo, Ivan [1]
Tan, Zheng-Hua [1]
Hansen, John H. L. [2]
Jensen, Jesper [1, 3]
Affiliations
[1] Aalborg Univ, Dept Elect Syst, DK-9220 Aalborg, Denmark
[2] Univ Texas Dallas, Erik Jonsson Sch Engn & Comp Sci, Richardson, TX 75080 USA
[3] Oticon AS, DK-2765 Smorum, Denmark
Keywords
Hidden Markov models; Acoustics; Feature extraction; Decoding; Computational modeling; Viterbi algorithm; Virtual assistants; Keyword spotting; deep learning; acoustic model; small footprint; robustness; ADAPTIVE NOISE CANCELLATION; SMALL-FOOTPRINT; SPEECH RECOGNITION; SPEAKER VERIFICATION; TERM DETECTION; ROBUST; REPRESENTATIONS; ENHANCEMENT; CHALLENGE; ATTENTION;
DOI
10.1109/ACCESS.2021.3139508
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes, such as the activation of voice assistants. Prospects suggest sustained growth in the social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for ways to improve KWS performance and reduce computational complexity. This context motivates this paper, in which we conduct a literature review of deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview is comprehensive, covering a thorough analysis of deep KWS systems (including speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, performance of deep KWS systems and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS.
Pages: 4169-4199
Page count: 31
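The abstract lists posterior handling as one stage of a deep KWS system. As a rough illustration only, the Python sketch below shows one widely cited posterior-handling scheme: moving-average smoothing of frame-level posteriors followed by a windowed geometric-mean confidence score, as popularized by Chen et al.'s 2014 "Deep KWS" system (as we understand it). It is not code from this overview. Assumptions: the input matrix holds only keyword-unit posteriors (the filler/background class has been dropped), window lengths are counted in frames, and the names `smooth_posteriors`, `keyword_confidence` and `first_detection` are hypothetical.

```python
from typing import List, Optional, Sequence

# Rows are frames, columns are keyword units (hypothetical layout; the
# filler/background class is assumed to have been dropped already).
Posteriors = Sequence[Sequence[float]]


def smooth_posteriors(posteriors: Posteriors, w_smooth: int) -> List[List[float]]:
    """Average each unit's posterior over the last w_smooth frames (fewer at the start)."""
    smoothed: List[List[float]] = []
    for j in range(len(posteriors)):
        window = posteriors[max(0, j - w_smooth + 1): j + 1]
        n_units = len(posteriors[j])
        smoothed.append([sum(frame[i] for frame in window) / len(window) for i in range(n_units)])
    return smoothed


def keyword_confidence(smoothed: Posteriors, j: int, w_max: int) -> float:
    """Geometric mean, over keyword units, of each unit's peak smoothed
    posterior within the last w_max frames ending at frame j."""
    window = smoothed[max(0, j - w_max + 1): j + 1]
    n_units = len(smoothed[j])
    product = 1.0
    for i in range(n_units):
        product *= max(frame[i] for frame in window)
    return product ** (1.0 / n_units)


def first_detection(posteriors: Posteriors, threshold: float,
                    w_smooth: int, w_max: int) -> Optional[int]:
    """Return the first frame whose confidence reaches threshold, or None."""
    smoothed = smooth_posteriors(posteriors, w_smooth)
    for j in range(len(smoothed)):
        if keyword_confidence(smoothed, j, w_max) >= threshold:
            return j
    return None


if __name__ == "__main__":
    # Toy two-unit keyword: unit 0 peaks first, then unit 1.
    toy = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
    print(first_detection(toy, threshold=0.8, w_smooth=2, w_max=4))  # prints 3
```

A larger `w_max` lets the confidence combine keyword units that peak at different times. For that reason, the toy example fires only once both units have appeared within the window. With `w_max=1`, the same final frame scores only about 0.3. The window lengths and threshold here are illustrative values, not settings taken from the paper.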