Deep Spoken Keyword Spotting: An Overview

Cited by: 47

Authors:

Lopez-Espejo, Ivan ^{[1]}

Tan, Zheng-Hua ^{[1]}

Hansen, John H. L. ^{[2]}

Jensen, Jesper ^{[1,3]}

Affiliations:

[1] Aalborg Univ, Dept Elect Syst, DK-9220 Aalborg, Denmark

[2] Univ Texas Dallas, Erik Jonsson Sch Engn & Comp Sci, Richardson, TX 75080 USA

[3] Oticon AS, DK-2765 Smorum, Denmark

Source:

IEEE ACCESS | 2022 / Vol. 10

Keywords:

Hidden Markov models; Acoustics; Feature extraction; Decoding; Computational modeling; Viterbi algorithm; Virtual assistants; Keyword spotting; deep learning; acoustic model; small footprint; robustness; ADAPTIVE NOISE CANCELLATION; SMALL-FOOTPRINT; SPEECH RECOGNITION; SPEAKER VERIFICATION; TERM DETECTION; ROBUST; REPRESENTATIONS; ENHANCEMENT; CHALLENGE; ATTENTION;

DOI:

10.1109/ACCESS.2021.3139508

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes like the activation of voice assistants. Prospects suggest a sustained growth in terms of social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for KWS performance improvement and computational complexity reduction. This context motivates this paper, in which we conduct a literature review into deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview has a comprehensive nature by covering a thorough analysis of deep KWS systems (which includes speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, performance of deep KWS systems and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS.


Pages: 4169-4199

Page count: 31

