Deep Spoken Keyword Spotting: An Overview

Cited by: 47
Authors
Lopez-Espejo, Ivan [1]
Tan, Zheng-Hua [1]
Hansen, John H. L. [2]
Jensen, Jesper [1, 3]
Affiliations
[1] Aalborg Univ, Dept Elect Syst, DK-9220 Aalborg, Denmark
[2] Univ Texas Dallas, Erik Jonsson Sch Engn & Comp Sci, Richardson, TX 75080 USA
[3] Oticon AS, DK-2765 Smorum, Denmark
Keywords
Hidden Markov models; Acoustics; Feature extraction; Decoding; Computational modeling; Viterbi algorithm; Virtual assistants; Keyword spotting; deep learning; acoustic model; small footprint; robustness; ADAPTIVE NOISE CANCELLATION; SMALL-FOOTPRINT; SPEECH RECOGNITION; SPEAKER VERIFICATION; TERM DETECTION; ROBUST; REPRESENTATIONS; ENHANCEMENT; CHALLENGE; ATTENTION
DOI
10.1109/ACCESS.2021.3139508
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes like the activation of voice assistants. Prospects suggest a sustained growth in terms of social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for KWS performance improvement and computational complexity reduction. This context motivates this paper, in which we conduct a literature review into deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview has a comprehensive nature by covering a thorough analysis of deep KWS systems (which includes speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, performance of deep KWS systems and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS.
Pages: 4169-4199
Number of pages: 31