Dynamically localizing multiple speakers based on the time-frequency domain

Cited by: 24
Authors
Hammer, Hodaya [1]
Chazan, Shlomo E. [1]
Goldberger, Jacob [1]
Gannot, Sharon [1]
Affiliations
[1] Fac Elect Engn, Ramat Gan, Israel
Keywords
DOA; UNET; Tracking; SOURCE LOCALIZATION; SPEECH; ENHANCEMENT; SEPARATION; NOISY;
DOI
10.1186/s13636-021-00203-w
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this study, we present a deep neural network-based online multi-speaker localization algorithm for a multi-microphone array. Following the W-disjoint orthogonality principle in the spectral domain, each time-frequency (TF) bin is dominated by a single speaker and hence by a single direction of arrival (DOA). A fully convolutional network is trained with instantaneous spatial features to estimate the DOA for each TF bin. The high-resolution classification enables the network to accurately and simultaneously localize and track multiple speakers, both static and dynamic. An extensive experimental study using simulated and real-life recordings in static and dynamic scenarios demonstrates that the proposed algorithm significantly outperforms both classic and recent deep-learning-based algorithms. Finally, as a byproduct, we show that the proposed method is also capable of separating moving speakers by applying the obtained TF masks.
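The last step of the abstract, turning per-bin DOA decisions into TF masks, can be illustrated with a minimal sketch. It is not the authors' implementation: the DOA map below is hand-written rather than produced by a network, DOA labels are assumed to be angles in degrees, the number of speakers is assumed known, and the function names (`dominant_doas`, `binary_masks`, `apply_mask`) are hypothetical.

```python
from collections import Counter


def dominant_doas(doa_map, num_speakers):
    """Return the num_speakers most frequent DOA labels in a TF map."""
    counts = Counter(label for row in doa_map for label in row)
    return [doa for doa, _ in counts.most_common(num_speakers)]


def binary_masks(doa_map, doas):
    """Build one binary TF mask per DOA: 1 where the bin carries that label."""
    return {
        doa: [[1 if label == doa else 0 for label in row] for row in doa_map]
        for doa in doas
    }


def apply_mask(spectrogram, mask):
    """Multiply a spectrogram by a mask, bin by bin."""
    return [
        [s * m for s, m in zip(s_row, m_row)]
        for s_row, m_row in zip(spectrogram, mask)
    ]


# Toy per-bin DOA decisions (rows: frequency bins, columns: time frames).
# In the paper these would come from the trained network; here they are
# made-up values used only for illustration.
doa_map = [
    [30, 30, 120, 120],
    [30, 120, 120, 30],
    [30, 30, 30, 75],
]

# Toy complex mixture spectrogram with the same shape as doa_map.
mixture = [[complex(f + 1, t) for t in range(4)] for f in range(3)]

doas = dominant_doas(doa_map, num_speakers=2)
masks = binary_masks(doa_map, doas)
separated = {doa: apply_mask(mixture, masks[doa]) for doa in doas}
```

Because each bin is attributed to exactly one DOA, the masks of different speakers never overlap; the isolated bin labeled 75 is treated as an outlier and falls into neither mask.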
Pages: 10