Deep Spoken Keyword Spotting: An Overview

Times cited: 47

Authors:

Lopez-Espejo, Ivan [1]

Tan, Zheng-Hua [1]

Hansen, John H. L. [2]

Jensen, Jesper [1, 3]

Affiliations:

[1] Aalborg Univ, Dept Elect Syst, DK-9220 Aalborg, Denmark

[2] Univ Texas Dallas, Erik Jonsson Sch Engn & Comp Sci, Richardson, TX 75080 USA

[3] Oticon AS, DK-2765 Smorum, Denmark

Source:

IEEE Access | 2022 / Vol. 10

Keywords:

Hidden Markov models; Acoustics; Feature extraction; Decoding; Computational modeling; Viterbi algorithm; Virtual assistants; Keyword spotting; deep learning; acoustic model; small footprint; robustness; ADAPTIVE NOISE CANCELLATION; SMALL-FOOTPRINT; SPEECH RECOGNITION; SPEAKER VERIFICATION; TERM DETECTION; ROBUST; REPRESENTATIONS; ENHANCEMENT; CHALLENGE; ATTENTION;

DOI:

10.1109/ACCESS.2021.3139508

Chinese Library Classification (CLC) code:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices serving purposes such as the activation of voice assistants. Prospects suggest sustained growth in the social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly seek to improve KWS performance and reduce computational complexity. This context motivates this paper, in which we conduct a literature review of deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview is comprehensive, covering a thorough analysis of deep KWS systems (including speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, the performance of deep KWS systems, and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS.
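The abstract lists posterior handling as one stage of a deep KWS system, after speech feature extraction and acoustic modeling. As a rough illustration only (not the specific method of this overview), the sketch below shows one common posterior handling scheme: per-frame class posteriors from an acoustic model are smoothed with a moving average, and a confidence score is computed as the geometric mean, over keyword classes, of the maximum smoothed posterior within a sliding window. Assumptions: class 0 is a filler/background class, the function names `smooth_posteriors`, `confidence` and `detect` are hypothetical, and the default window sizes (30 and 100 frames) and the threshold are illustrative values, not figures taken from this paper.

```python
from math import prod  # Python 3.8+


def smooth_posteriors(posteriors, w_smooth):
    """Moving-average smoothing of per-frame class posteriors.

    posteriors: list of frames, each a list of class probabilities.
    Assumption: class 0 is the filler/background class.
    """
    smoothed = []
    for j in range(len(posteriors)):
        start = max(0, j - w_smooth + 1)  # window ends at the current frame
        window = posteriors[start:j + 1]
        smoothed.append([sum(frame[i] for frame in window) / len(window)
                         for i in range(len(posteriors[j]))])
    return smoothed


def confidence(smoothed, j, w_max):
    """Geometric mean over keyword classes (1..n-1) of the maximum
    smoothed posterior inside a sliding window ending at frame j."""
    start = max(0, j - w_max + 1)
    n_classes = len(smoothed[j])
    maxima = [max(frame[i] for frame in smoothed[start:j + 1])
              for i in range(1, n_classes)]  # skip filler class 0
    return prod(maxima) ** (1.0 / len(maxima))


def detect(posteriors, threshold, w_smooth=30, w_max=100):
    """Return the first frame whose confidence reaches threshold, else None."""
    smoothed = smooth_posteriors(posteriors, w_smooth)
    for j in range(len(smoothed)):
        if confidence(smoothed, j, w_max) >= threshold:
            return j
    return None


if __name__ == "__main__":
    # Hypothetical toy posteriors over 3 classes: [filler, word 1, word 2].
    frames = ([[0.9, 0.05, 0.05]] * 5 +   # background
              [[0.1, 0.8, 0.1]] * 3 +     # first keyword unit
              [[0.1, 0.1, 0.8]] * 3)      # second keyword unit
    print(detect(frames, threshold=0.5, w_smooth=3, w_max=10))  # -> 8
```

Taking the geometric mean over keyword units means a detection fires only once every unit has been observed with reasonably high posterior inside the window, which is why the toy keyword is detected at frame 8, when the second unit appears, rather than earlier.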

Pages: 4169-4199

Page count: 31
