END-TO-END STREAMING KEYWORD SPOTTING

Cited by: 0
Authors
Alvarez, Raziel [1]
Park, Hyun-Jin [1]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019
Keywords
deep neural networks; keyword spotting; audio processing; embedded speech recognition; FEATURES;
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We present a keyword spotting system that, except for a front-end component for feature generation, is entirely contained in a deep neural network (DNN) model trained "end-to-end" to predict the presence of the keyword in a stream of audio. This work makes two main contributions. The first is an efficient memoized neural network topology that aims to make better use of the parameters and associated computations in the DNN by holding a memory of previous activations distributed over the depth of the DNN. The second is a method to train the DNN, end-to-end, to produce the keyword spotting score. The system significantly outperforms previous approaches in detection quality as well as in size and computation.
Pages: 6336-6340
Number of pages: 5