SAGRNN: Self-Attentive Gated RNN For Binaural Speaker Separation With Interaural Cue Preservation

Cited by: 20
Authors
Tan, Ke [1]
Xu, Buye [2]
Kumar, Anurag [2]
Nachmani, Eliya [3, 4]
Adi, Yossi [3, 4]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Facebook Real Labs, Redmond, WA 98052 USA
[3] Facebook AI Res, IL-6701203 Tel Aviv, Israel
[4] Tel Aviv Univ, IL-6997801 Tel Aviv, Israel
Keywords
Binaural speaker separation; self-attention; interaural cue preservation; time-domain; SPEECH SEPARATION; MODEL; TIME;
DOI
10.1109/LSP.2020.3043977
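Chinese Library Classification (CLC) number
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline classification code
0808; 0809;
Abstract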
Most existing deep-learning-based binaural speaker separation systems focus on producing a monaural estimate for each of the target speakers, and thus do not preserve the interaural cues, which are crucial for human listeners to perform sound localization and lateralization. In this study, we address talker-independent binaural speaker separation with interaural cues preserved in the estimated binaural signals. Specifically, we extend a newly developed gated recurrent neural network for monaural separation by additionally incorporating self-attention mechanisms and dense connectivity. We develop an end-to-end multiple-input multiple-output system, which directly maps from the binaural waveform of the mixture to those of the speech signals. The experimental results show that our proposed approach achieves significantly better separation performance than a recent binaural separation approach. In addition, our approach effectively preserves the interaural cues, which improves the accuracy of sound localization.
Pages: 26-30
Number of pages: 5