EFFICIENT TARGET ACTIVITY DETECTION BASED ON RECURRENT NEURAL NETWORKS

Times cited: 0
Authors
Gerber, Daniel [1]
Meier, Stefan [1]
Kellermann, Walter [1]
Affiliations
[1] Friedrich-Alexander University Erlangen-Nuremberg (FAU), Multimedia Communications and Signal Processing, Cauerstr. 7, D-91058 Erlangen, Germany
Source
2017 HANDS-FREE SPEECH COMMUNICATIONS AND MICROPHONE ARRAYS (HSCMA 2017) | 2017
Keywords
voice activity detection; target activity detection; recurrent neural networks; binaural listening devices; NOISE-REDUCTION; SPEECH
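DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract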
This paper addresses the problem of Target Activity Detection (TAD) for binaural listening devices. TAD denotes the problem of robustly detecting the activity of a target speaker in a harsh acoustic environment, which comprises interfering speakers and noise ('cocktail party scenario'). In previous work, it has been shown that employing a Feed-forward Neural Network (FNN) for detecting the target speaker activity is a promising approach to combine the advantages of different TAD features (used as network inputs). In this contribution, we exploit a larger context window for TAD and compare the performance of FNNs and Recurrent Neural Networks (RNNs) with an explicit focus on small network topologies as desirable for embedded acoustic signal processing systems. More specifically, the investigations include a comparison between three different types of RNNs, namely plain RNNs, Long Short-Term Memories, and Gated Recurrent Units. The results indicate that all versions of RNNs outperform FNNs for the task of TAD.
Pages: 46-50
Number of pages: 5
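As a rough illustration of why small topologies matter for the comparison described in the abstract, the following sketch counts trainable parameters for a single-hidden-layer FNN fed with a stacked context window and for plain RNN, GRU, and LSTM layers of the same width. It is not taken from the paper: the feature count (8), hidden size (16), and FNN context window (5 frames) are hypothetical values, the function name is illustrative, and the formulas assume one bias vector per gate, no LSTM peephole connections, and a single sigmoid output unit.

```python
# Number of gate-like weight blocks per recurrent cell type.
# Assumption: one bias vector per block, no LSTM peephole connections.
BLOCKS = {"rnn": 1, "gru": 3, "lstm": 4}


def count_params(kind, n_features, n_hidden, context=1):
    """Count trainable parameters of a one-hidden-layer detector.

    kind       -- "fnn", "rnn", "gru", or "lstm"
    n_features -- TAD features per frame (hypothetical value in the demo)
    n_hidden   -- hidden units in the single hidden layer
    context    -- frames stacked at the FNN input (ignored for recurrent types)
    """
    # Output layer: one sigmoid unit giving the target-activity probability.
    output = n_hidden + 1

    if kind == "fnn":
        # The FNN sees the whole context window as one long input vector.
        n_inputs = n_features * context
        hidden = n_inputs * n_hidden + n_hidden
    else:
        # Recurrent layers see one frame at a time plus their own state.
        per_block = n_hidden * (n_features + n_hidden) + n_hidden
        hidden = BLOCKS[kind] * per_block

    return hidden + output


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to illustrate the scaling.
    for kind in ("fnn", "rnn", "gru", "lstm"):
        print(kind, count_params(kind, n_features=8, n_hidden=16, context=5))
```

Under these assumptions the gated variants need roughly three to four times the parameters of a plain RNN of equal width, which is the kind of cost an embedded system has to weigh against detection performance.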