Deep Spoken Keyword Spotting: An Overview

Cited by: 47
Authors
Lopez-Espejo, Ivan [1]
Tan, Zheng-Hua [1]
Hansen, John H. L. [2]
Jensen, Jesper [1,3]
Affiliations
[1] Aalborg Univ, Dept Elect Syst, DK-9220 Aalborg, Denmark
[2] Univ Texas Dallas, Erik Jonsson Sch Engn & Comp Sci, Richardson, TX 75080 USA
[3] Oticon AS, DK-2765 Smorum, Denmark
Keywords
Hidden Markov models; Acoustics; Feature extraction; Decoding; Computational modeling; Viterbi algorithm; Virtual assistants; Keyword spotting; deep learning; acoustic model; small footprint; robustness; ADAPTIVE NOISE CANCELLATION; SMALL-FOOTPRINT; SPEECH RECOGNITION; SPEAKER VERIFICATION; TERM DETECTION; ROBUST; REPRESENTATIONS; ENHANCEMENT; CHALLENGE; ATTENTION
DOI
10.1109/ACCESS.2021.3139508
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes, such as the activation of voice assistants. Prospects suggest sustained growth in the social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for KWS performance improvement and computational complexity reduction. This context motivates this paper, in which we conduct a literature review of deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview is comprehensive, covering a thorough analysis of deep KWS systems (including speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, the performance of deep KWS systems, and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS.
Pages: 4169-4199
Page count: 31
Related papers
50 in total
  • [41] Efficient Keyword Spotting through Hardware-Aware Conditional Execution of Deep Neural Networks
    Giraldo, J. S. P.
    O'Connor, Chris
    Verhelst, Marian
    2019 IEEE/ACS 16TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA 2019), 2019,
  • [42] Effective Combination of DenseNet and BiLSTM for Keyword Spotting
    Zeng, Mengjun
    Xiao, Nanfeng
    IEEE ACCESS, 2019, 7 : 10767 - 10775
  • [43] Benchmarking Keyword Spotting Efficiency on Neuromorphic Hardware
    Blouw, Peter
    Choo, Xuan
    Hunsberger, Eric
    Eliasmith, Chris
    PROCEEDINGS OF THE 2019 7TH ANNUAL NEURO-INSPIRED COMPUTATIONAL ELEMENTS WORKSHOP (NICE 2019), 2020,
  • [44] Generalized Keyword Spotting using ASR embeddings
    Kirandevraj, R.
    Kurmi, Vinod K.
    Namboodiri, Vinay P.
    Jawahar, C. V.
    INTERSPEECH 2022, 2022, : 126 - 130
  • [45] Reduced Model Size Deep Convolutional Neural Networks for Small-Footprint Keyword Spotting
    Tsai, Tsung Han
    Lin, Xin Hui
    2021 28TH IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS, AND SYSTEMS (IEEE ICECS 2021), 2021,
  • [46] Keyword Spotting in Continuous Speech Using Spectral and Prosodic Information Fusion
    Pandey, Laxmi
    Hegde, Rajesh M.
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2019, 38 : 2767 - 2791
  • [47] A depthwise separable convolutional neural network for keyword spotting on an embedded system
    Sørensen, Peter Mølgaard
    Epp, Bastian
    May, Tobias
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2020, 2020 (01)
  • [49] Sub-mW Keyword Spotting on an MCU: Analog Binary Feature Extraction and Binary Neural Networks
    Cerutti, Gianmarco
    Cavigelli, Lukas
    Andri, Renzo
    Magno, Michele
    Farella, Elisabetta
    Benini, Luca
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2022, 69 (05) : 2002 - 2012
  • [50] METRIC LEARNING FOR KEYWORD SPOTTING
    Huh, Jaesung
    Lee, Minjae
    Heo, Heesoo
    Mun, Seongkyu
    Chung, Joon Son
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 133 - 140