Robust binaural speech separation in adverse conditions based on deep neural network with modified spatial features and training target

Cited by: 15
Authors
Dadvar, Paria [1]
Geravanchizadeh, Masoud [1]
Affiliations
[1] Univ Tabriz, Fac Elect & Comp Engn, Tabriz 5166615813, Iran
Keywords
Binaural speech separation; Deep neural network; Soft missing data masking; Ideal ratio mask; Intelligibility improvement; Quality improvement; CLASSIFICATION; NOISE; ALGORITHM; INTELLIGIBILITY; RECOGNITION; SEGREGATION
DOI
10.1016/j.specom.2019.02.001
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, a robust binaural speech separation system based on a deep neural network (DNN) is introduced. The proposed system has three main processing stages. In the spectral processing stage, the multiresolution cochleagram (MRCG) feature is extracted from the beamformed signal. In the spatial processing stage, a novel reliable spatial feature, smITD + smILD, is obtained by soft missing data masking of binaural cues. In the final stage, a deep neural network takes the combined spectral and spatial features and estimates a newly defined ideal ratio mask (IRM) designed for noisy and reverberant conditions. The performance of the proposed system is evaluated and compared with two recent binaural speech separation systems as baselines in various noisy and reverberant conditions. Furthermore, the performance of each processing stage is explored and compared to that of state-of-the-art approaches. A multitalker spatially diffuse babble is used as the interferer at four signal-to-noise ratios (SNRs). Simulated rooms with four matched and four unmatched reverberation times (RTs) are considered in the experiments. It is shown that the proposed system outperforms the baseline systems in improving the intelligibility and quality of separated speech signals in reverberant and noisy conditions. The results confirm the efficiency of each system component, especially in highly reverberant scenarios.
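As a rough illustration of two quantities named in the abstract, the sketch below computes a conventional ideal ratio mask and an interaural level difference per time-frequency unit. It is not the paper's method: the abstract does not give the newly defined IRM or the soft-masked smILD feature, so the standard textbook forms are used instead. The function names, the exponent `beta`, and the small constant `eps` are assumptions made for this example.

```python
import numpy as np


def ideal_ratio_mask(speech_energy, noise_energy, beta=0.5, eps=1e-12):
    """Conventional IRM per time-frequency unit: (S / (S + N)) ** beta.

    Assumption: this is the standard definition, not the modified IRM
    proposed in the paper, which the abstract does not specify.
    """
    speech_energy = np.asarray(speech_energy, dtype=float)
    noise_energy = np.asarray(noise_energy, dtype=float)
    # eps guards against division by zero in silent units.
    return (speech_energy / (speech_energy + noise_energy + eps)) ** beta


def interaural_level_difference(left_energy, right_energy, eps=1e-12):
    """Plain ILD in dB between left-ear and right-ear energies.

    Assumption: no soft missing data masking is applied here.
    """
    left_energy = np.asarray(left_energy, dtype=float)
    right_energy = np.asarray(right_energy, dtype=float)
    return 10.0 * np.log10((left_energy + eps) / (right_energy + eps))


# Hypothetical energies for three time-frequency units.
speech = np.array([1.0, 1.0, 0.0])
noise = np.array([1.0, 0.0, 1.0])
mask = ideal_ratio_mask(speech, noise)
ild = interaural_level_difference([10.0, 1.0], [1.0, 1.0])
```

With these inputs the mask is close to 1 where only speech is present, 0 where only noise is present, and between the two otherwise. In a system of the kind described, a network would be trained to predict such a mask from the spectral and spatial features.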
Pages: 41-52
Page count: 12