Deep Spoken Keyword Spotting: An Overview

Cited by: 63

Authors:

Lopez-Espejo, Ivan ^{[1]}

Tan, Zheng-Hua ^{[1]}

Hansen, John H. L. ^{[2]}

Jensen, Jesper ^{[1,3]}

Affiliations:

[1] Aalborg Univ, Dept Elect Syst, DK-9220 Aalborg, Denmark

[2] Univ Texas Dallas, Erik Jonsson Sch Engn & Comp Sci, Richardson, TX 75080 USA

[3] Oticon AS, DK-2765 Smorum, Denmark

Source:

IEEE ACCESS | 2022 / Vol. 10

Keywords:

Hidden Markov models; Acoustics; Feature extraction; Decoding; Computational modeling; Viterbi algorithm; Virtual assistants; Keyword spotting; deep learning; acoustic model; small footprint; robustness; ADAPTIVE NOISE CANCELLATION; SMALL-FOOTPRINT; SPEECH RECOGNITION; SPEAKER VERIFICATION; TERM DETECTION; ROBUST; REPRESENTATIONS; ENHANCEMENT; CHALLENGE; ATTENTION;

DOI:

10.1109/ACCESS.2021.3139508

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes like the activation of voice assistants. Prospects suggest a sustained growth in terms of social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for KWS performance improvement and computational complexity reduction. This context motivates this paper, in which we conduct a literature review into deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview has a comprehensive nature by covering a thorough analysis of deep KWS systems (which includes speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, performance of deep KWS systems and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS.

Citation

Pages: 4169-4199

Page count: 31
