Sound source localization based on SRP-PHAT spatial spectrum and deep neural network

Cited by: 0
Authors
Zhao X. [1 ]
Chen S. [2 ]
Zhou L. [3 ]
Chen Y. [3 ,4 ]
Affiliations
[1] School of Information and Communication Engineering, Nanjing Institute of Technology, Nanjing
[2] School of Mathematics and Information Technology, Jiangsu Second Normal University, Nanjing
[3] School of Information Science and Engineering, Southeast University, Nanjing
[4] Department of Psychiatry, Columbia University and NYSPI, New York
Funding
National Natural Science Foundation of China;
Keywords
Deep neural network; Microphone array; Sound source localization; Steered response power-phase transform (SRP-PHAT) spatial spectrum;
DOI
10.32604/CMC.2020.09848
Abstract
Microphone array-based sound source localization (SSL) is a challenging task in adverse acoustic scenarios. To address this, a novel SSL algorithm based on a deep neural network (DNN) that uses the steered response power-phase transform (SRP-PHAT) spatial spectrum as its input feature is presented in this paper. Because the SRP-PHAT spatial power spectrum contains spatial location information, it is adopted as the input feature for sound source localization, and a DNN is exploited to extract efficient location information from it owing to its ability to extract high-level features. The SRP-PHAT values at all steering positions within a frame are arranged into a vector, which serves as the DNN input. A DNN model that maps the SRP-PHAT spatial spectrum to the azimuth of the sound source is learned from the training signals, and the azimuth is then estimated from the testing signals with the trained model. Experimental results demonstrate that the proposed algorithm significantly improves localization performance whether or not the training and testing conditions match, and that it is more robust to noise and reverberation. © 2020 Tech Science Press. All rights reserved.
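As a rough illustration of the input feature described above, the following Python sketch computes an SRP-PHAT spatial-spectrum vector for a single multichannel frame over a grid of candidate azimuths. The array geometry (a 4-microphone circular array), sampling rate, FFT length, azimuth grid step, and the helper names gcc_phat and srp_phat_spectrum are illustrative assumptions, not the authors' configuration; the resulting vector, one SRP-PHAT value per steering azimuth, is the kind of feature the paper feeds to the DNN.

```python
# Minimal sketch of building an SRP-PHAT spatial-spectrum feature vector for one
# signal frame. Geometry, sampling rate, grid resolution, and helper names are
# illustrative assumptions, not the authors' exact setup.
import numpy as np

C = 343.0      # speed of sound (m/s), assumed
FS = 16000     # sampling rate (Hz), assumed
N_FFT = 1024   # frame length in samples, assumed

# Assumed geometry: 4-microphone circular array of radius 5 cm (x, y in metres).
MIC_POS = 0.05 * np.array([[1, 0], [0, 1], [-1, 0], [0, -1]])
AZIMUTHS = np.deg2rad(np.arange(0, 360, 5))  # candidate steering azimuths (grid step assumed)


def gcc_phat(x_i, x_j):
    """GCC-PHAT cross-correlation of two same-length frames (circular, via FFT)."""
    X_i = np.fft.rfft(x_i, n=N_FFT)
    X_j = np.fft.rfft(x_j, n=N_FFT)
    cross = X_i * np.conj(X_j)
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase only
    return np.fft.irfft(cross, n=N_FFT)      # correlation indexed by lag (samples)


def srp_phat_spectrum(frame):
    """SRP-PHAT value at every candidate azimuth for one frame of shape (n_mics, N_FFT).

    Returns a vector of length len(AZIMUTHS) that would serve as the DNN input feature.
    """
    n_mics = frame.shape[0]
    # Far-field unit direction vectors for every candidate azimuth, shape (n_az, 2).
    directions = np.stack([np.cos(AZIMUTHS), np.sin(AZIMUTHS)], axis=1)
    spectrum = np.zeros(len(AZIMUTHS))
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            cc = gcc_phat(frame[i], frame[j])
            # Expected TDOA (in samples) of mic i relative to mic j for each direction.
            tdoa = (MIC_POS[j] - MIC_POS[i]) @ directions.T / C * FS
            lags = np.round(tdoa).astype(int) % N_FFT   # circular lag index
            spectrum += cc[lags]                        # accumulate over mic pairs
    return spectrum


# Example: feature vector for a random 4-channel frame (placeholder for real microphone data).
# feature = srp_phat_spectrum(np.random.randn(4, N_FFT))
```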
Pages: 253-271
Number of pages: 18