END-TO-END STREAMING KEYWORD SPOTTING

Cited by: 0

Authors:

Alvarez, Raziel [1]

Park, Hyun-Jin [1]

Affiliations:

[1] Google Inc, Mountain View, CA 94043 USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

deep neural networks; keyword spotting; audio processing; embedded speech recognition; FEATURES;

DOI:

Not available

Chinese Library Classification:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

We present a keyword spotting system that, except for a front-end component for feature generation, is entirely contained in a deep neural network (DNN) model trained "end-to-end" to predict the presence of the keyword in a stream of audio. This work makes two main contributions. The first is an efficient memoized neural network topology that aims to make better use of the parameters and associated computations in the DNN by holding a memory of previous activations distributed over the depth of the DNN. The second is a method to train the DNN, end-to-end, to produce the keyword spotting score. This system significantly outperforms previous approaches in detection quality as well as in size and computation.

Citation

Pages: 6336-6340

Page count: 5

