Robust Source Counting and DOA Estimation Using Spatial Pseudo-Spectrum and Convolutional Neural Network

Cited by: 58
Authors
Nguyen, Thi Ngoc Tho [1]
Gan, Woon-Seng [1]
Ranjan, Rishabh [1]
Jones, Douglas L. [2]
Affiliations
[1] Nanyang Technol Univ, Dept Elect & Elect Engn, Singapore 639798, Singapore
[2] Univ Illinois, Dept Elect & Comp Engn, Champaign, IL 61801 USA
Keywords
Direction-of-arrival estimation; Estimation; Two dimensional displays; Reverberation; Speech processing; Robustness; Convolutional neural networks; convolutional neural network; spatial pseudo-spectrum; multi-task learning; multiple sound sources; SOURCE LOCALIZATION; MULTIPLE
DOI
10.1109/TASLP.2020.3019646
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Many signal processing-based methods for sound source direction-of-arrival estimation produce a spatial pseudo-spectrum whose local maxima strongly indicate the source directions. Because noise levels, reverberation, and the number of overlapping sources vary, the spatial pseudo-spectra are noisy even after smoothing. In addition, the number of sources is often unknown. As a result, selecting the peaks from these spectra is susceptible to error. Convolutional neural networks have been successfully applied to many image processing problems in general and to direction-of-arrival estimation in particular. In addition, deep learning-based methods for direction-of-arrival estimation show good generalization to different environments. We propose to use a 2D convolutional neural network with multi-task learning to robustly estimate the number of sources and the directions of arrival from short-time spatial pseudo-spectra, which carry useful directional information from the audio input signals. This approach reduces the tendency of the neural network to learn unwanted associations between sound classes and directional information, and helps the network generalize to unseen sound classes. The simulation and experimental results show that the proposed methods outperform other direction-of-arrival estimation methods across different levels of noise and reverberation and different numbers of sources.
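The conventional peak-selection step that the abstract describes as error-prone can be illustrated with a minimal sketch. This is not the authors' method: it assumes a one-dimensional pseudo-spectrum sampled on a 1-degree azimuth grid and a known number of sources, and the names `pick_peaks` and `bump` as well as the synthetic spectrum are hypothetical.

```python
import numpy as np

def pick_peaks(spectrum, num_sources):
    """Return indices of the num_sources tallest circular local maxima."""
    s = np.asarray(spectrum, dtype=float)
    left = np.roll(s, 1)    # neighbour at index i-1 (azimuth wraps around)
    right = np.roll(s, -1)  # neighbour at index i+1
    peak_idx = np.flatnonzero((s > left) & (s >= right))
    # Rank the local maxima by height, tallest first, and keep the top ones.
    order = np.argsort(s[peak_idx])[::-1]
    return np.sort(peak_idx[order[:num_sources]])

# Hypothetical noise-free pseudo-spectrum with sources at 60 and 200 degrees.
az = np.arange(360)

def bump(center, width=10.0):
    d = np.abs(az - center)
    d = np.minimum(d, 360 - d)  # circular distance in degrees
    return np.exp(-0.5 * (d / width) ** 2)

clean = bump(60) + 0.8 * bump(200)
doas = pick_peaks(clean, num_sources=2)
print(doas)
```

On this noise-free example the picker returns indices 60 and 200. The sketch requires the number of sources as an input and treats every local maximum as a candidate, which is where noisy spectra and an unknown source count cause trouble for this kind of rule.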
Pages: 2626-2637
Page count: 12