AUTOMATIC GAIN CONTROL AND MULTI-STYLE TRAINING FOR ROBUST SMALL-FOOTPRINT KEYWORD SPOTTING WITH DEEP NEURAL NETWORKS

Cited by: 0
Authors
Prabhavalkar, Rohit [1]
Alvarez, Raziel [1]
Parada, Carolina [1]
Nakkiran, Preetum [2]
Sainath, Tara N. [1]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
[2] Univ Calif Berkeley, Dept EECS, Berkeley, CA 94720 USA
Source
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015
Keywords
keyword spotting; automatic gain control; multi-style training; small-footprint models; speech recognition
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We explore techniques to improve the robustness of small-footprint keyword spotting models based on deep neural networks (DNNs) in the presence of background noise and in far-field conditions. We find that system performance can be improved significantly, with relative improvements up to 75% in far-field conditions, by employing a combination of multi-style training and a proposed novel formulation of automatic gain control (AGC) that estimates the levels of both speech and background noise. Further, we find that these techniques allow us to achieve competitive performance, even when applied to DNNs with an order of magnitude fewer parameters than our baseline.
Pages: 4704-4708
Page count: 5