Sound Localization Based on Phase Difference Enhancement Using Deep Neural Networks

Cited by: 47
Authors
Pak, Junhyeong [1]
Shin, Jong Won [1]
Affiliations
[1] Gwangju Inst Sci & Technol, Sch Elect Engn & Comp Sci, Gwangju 61005, South Korea
Keywords
Sound source localization; direction-of-arrival; deep neural networks; interchannel phase difference; MULTICHANNEL SPEECH ENHANCEMENT; OF-ARRIVAL ESTIMATION; SOURCE SEPARATION; ZERO-CROSSINGS; MIXTURE;
DOI
10.1109/TASLP.2019.2919378
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
The performance of most classical sound source localization algorithms degrades seriously in the presence of background noise or reverberation. Recently, deep neural networks (DNNs) have been successfully applied to sound source localization, mainly to classify the direction-of-arrival (DoA) into one of the candidate sectors. In this paper, we propose a DNN-based phase difference enhancement for DoA estimation, which turned out to be better than directly estimating the DoAs from the input interchannel phase differences (IPDs). The sinusoidal functions of the phase differences for "clean and dry" source signals are estimated from the sinusoidal functions of the IPDs for the input signals, which may include directional signals, diffuse noise, and reverberation. The resulting DoA is further refined to compensate for the estimation bias near the end-fire directions. From the enhanced IPDs, we can determine the DoA for each frequency bin, and then the DoAs for the current frame from the distribution of the per-frequency DoAs. Experimental results with various types and levels of background noise, reverberation times, numbers of sources, room impulse responses, and DoAs showed that the proposed method outperformed conventional approaches.
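The last steps described in the abstract (IPDs, their sinusoidal functions, a DoA per frequency bin, and a frame-level DoA from the distribution over frequencies) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a two-microphone far-field model, a sound speed of 343 m/s, a 5-degree histogram, and hypothetical function names, and it omits the DNN enhancement and the end-fire bias refinement entirely.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed value, not taken from the paper)


def ipd_features(x1, x2):
    """IPD and its cosine/sine for two complex STFT frames (one value per bin)."""
    ipd = np.angle(x1 * np.conj(x2))
    return ipd, np.cos(ipd), np.sin(ipd)


def ipd_from_sinusoids(cos_ipd, sin_ipd):
    """Recover the phase difference from (possibly enhanced) cos/sin values."""
    return np.arctan2(sin_ipd, cos_ipd)


def doa_per_bin(ipd, freqs, mic_distance):
    """Assumed far-field model: ipd = 2*pi*f*d*cos(theta)/c, solved for theta."""
    cos_theta = SPEED_OF_SOUND * ipd / (2.0 * np.pi * freqs * mic_distance)
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


def frame_doa(doas, bin_width=5.0):
    """Frame-level DoA: centre of the most populated histogram bin (degrees)."""
    edges = np.arange(0.0, 180.0 + bin_width, bin_width)
    counts, _ = np.histogram(doas, bins=edges)
    k = int(np.argmax(counts))
    return 0.5 * (edges[k] + edges[k + 1])


# Synthetic, noise-free example: one source at 62 degrees, 4 cm spacing.
# Frequencies stay below c / (2 * d) so the IPD does not wrap.
freqs = np.linspace(200.0, 4000.0, 64)
true_ipd = 2.0 * np.pi * freqs * 0.04 * np.cos(np.radians(62.0)) / SPEED_OF_SOUND
x1 = np.ones_like(freqs, dtype=complex)
x2 = np.exp(-1j * true_ipd)

_, cos_ipd, sin_ipd = ipd_features(x1, x2)
# In the proposed method a DNN would enhance cos_ipd / sin_ipd at this point.
ipd = ipd_from_sinusoids(cos_ipd, sin_ipd)
estimate = frame_doa(doa_per_bin(ipd, freqs, 0.04))
print(estimate)
```

Working with the cosine and sine rather than the raw phase avoids the discontinuity at plus or minus pi, which is one plausible reason for regressing sinusoidal functions instead of the IPD itself.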
Pages: 1335 - 1345
Number of pages: 11
Related papers
50 in total
  • [1] A Binaural Sound Localization System using Deep Convolutional Neural Networks
    Xu, Ying
    Afshar, Saeed
    Singh, Ram Kuber
    Wang, Runchun
    van Schaik, Andre
    Hamilton, Tara Julia
    2019 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2019
  • [2] SOUND SOURCE LOCALIZATION BASED ON DEEP NEURAL NETWORKS WITH DIRECTIONAL ACTIVATE FUNCTION EXPLOITING PHASE INFORMATION
    Takeda, Ryu
    Komatani, Kazunori
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 405 - 409
  • [3] Phase retrieval based on difference map and deep neural networks
    Li, Baopeng
    Ersoy, Okan K.
    Ma, Caiwen
    Pan, Zhibin
    Wen, Wansha
    Song, Zongxi
    Gao, Wei
    JOURNAL OF MODERN OPTICS, 2021, 68 (20) : 1108 - 1120
  • [4] DISCRIMINATIVE MULTIPLE SOUND SOURCE LOCALIZATION BASED ON DEEP NEURAL NETWORKS USING INDEPENDENT LOCATION MODEL
    Takeda, Ryu
    Komatani, Kazunori
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 603 - 609
  • [5] UNSUPERVISED ADAPTATION OF DEEP NEURAL NETWORKS FOR SOUND SOURCE LOCALIZATION USING ENTROPY MINIMIZATION
    Takeda, Ryu
    Komatani, Kazunori
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 2217 - 2221
  • [6] Sound source localization for auditory perception of a humanoid robot using deep neural networks
    G. Boztas
    Neural Computing and Applications, 2023, 35 : 6801 - 6811
  • [7] Sound source localization for auditory perception of a humanoid robot using deep neural networks
    Boztas, G.
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (09) : 6801 - 6811
  • [8] Phase-Aware Speech Enhancement Based on Deep Neural Networks
    Zheng, Naijun
    Zhang, Xiao-Lei
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (01) : 63 - 76
  • [9] Speech Enhancement With Deep Neural Networks Using MoG Based Labels
    Hammer, Hodaya
    Rath, Gilad
    Chazan, Shlomo E.
    Goldberger, Jacob
    Gannot, Sharon
    2018 IEEE INTERNATIONAL CONFERENCE ON THE SCIENCE OF ELECTRICAL ENGINEERING IN ISRAEL (ICSEE), 2018
  • [10] Sound localization using acoustic images on the phase difference spectrum
    Shimoyama, R
    Yamazaki, K
    SYSTEM SIMULATION AND SCIENTIFIC COMPUTING (SHANGHAI), VOLS I AND II, 2002, : 290 - 294