Deep Spoken Keyword Spotting: An Overview

Cited by: 47

Authors:

Lopez-Espejo, Ivan [1]

Tan, Zheng-Hua [1]

Hansen, John H. L. [2]

Jensen, Jesper [1, 3]

Affiliations:

[1] Aalborg Univ, Dept Elect Syst, DK-9220 Aalborg, Denmark

[2] Univ Texas Dallas, Erik Jonsson Sch Engn & Comp Sci, Richardson, TX 75080 USA

[3] Oticon AS, DK-2765 Smorum, Denmark

Source:

IEEE ACCESS | 2022 / Vol. 10

Keywords:

Hidden Markov models; Acoustics; Feature extraction; Decoding; Computational modeling; Viterbi algorithm; Virtual assistants; Keyword spotting; deep learning; acoustic model; small footprint; robustness; ADAPTIVE NOISE CANCELLATION; SMALL-FOOTPRINT; SPEECH RECOGNITION; SPEAKER VERIFICATION; TERM DETECTION; ROBUST; REPRESENTATIONS; ENHANCEMENT; CHALLENGE; ATTENTION;

DOI:

10.1109/ACCESS.2021.3139508

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes like the activation of voice assistants. Prospects suggest a sustained growth in terms of social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for KWS performance improvement and computational complexity reduction. This context motivates this paper, in which we conduct a literature review into deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview has a comprehensive nature by covering a thorough analysis of deep KWS systems (which includes speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, performance of deep KWS systems and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS.


Pages: 4169-4199

Page count: 31
