DBNET: DOA-DRIVEN BEAMFORMING NETWORK FOR END-TO-END REVERBERANT SOUND SOURCE SEPARATION

Cited by: 19
Authors
Aroudi, Ali [1, 2]
Braun, Sebastian [1]
Affiliations
[1] Microsoft Corp, Redmond, WA 98052 USA
[2] Carl von Ossietzky Univ Oldenburg, Dept Med Phys & Acoust, Oldenburg, Germany
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
sound source separation; deep learning; beamforming; direction of arrival estimation
DOI
10.1109/ICASSP39728.2021.9414187
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Many deep learning techniques are available to perform source separation and reduce background noise. However, designing an end-to-end multi-channel source separation method using deep learning and conventional acoustic signal processing techniques remains challenging. In this paper, we propose a direction-of-arrival-driven beamforming network (DBnet) consisting of direction-of-arrival (DOA) estimation and beamforming layers for end-to-end source separation. We propose to train DBnet using loss functions that are based solely on the distances between the separated speech signals and the target speech signals, without the need for ground-truth DOAs of the speakers. To improve the source separation performance, we also propose end-to-end extensions of DBnet that incorporate post-masking networks. We evaluate the proposed DBnet and its extensions on a very challenging dataset, targeting realistic far-field sound source separation in reverberant and noisy environments. The experimental results show that the proposed extended DBnet using a convolutional-recurrent post-masking network outperforms state-of-the-art source separation methods.
Pages: 211-215
Page count: 5