Configuration-Invariant Sound Localization Technique Using Azimuth-Frequency Representation and Convolutional Neural Networks

Cited by: 2
Authors
Chun, Chanjun [1]
Jeon, Kwang Myung [2]
Choi, Wooyeol [3]
Affiliations
[1] Korea Inst Civil Engn & Bldg Technol KICT, Future Infrastruct Res Ctr, Goyang 10223, South Korea
[2] IntFlow Co Ltd, Gwangju 61080, South Korea
[3] Chosun Univ, Dept Comp Engn, Gwangju 61452, South Korea
Funding
National Research Foundation of Singapore
Keywords
azimuth-frequency representation; configuration-invariant; convolutional neural network (CNN); sound localization
DOI
10.3390/s20133768
Chinese Library Classification (CLC) Number
O65 [Analytical Chemistry]
Discipline Codes
070302; 081704
Abstract
Deep neural networks (DNNs) have driven significant advances in speech processing, and numerous DNN architectures have been proposed for sound localization. When a DNN model is deployed for sound localization, it requires a fixed input size, which is generally determined by the number of microphones, the fast Fourier transform (FFT) size, and the frame size. If the number or configuration of the microphones changes, the size of the input features changes as well, so the DNN model must be retrained. In this paper, we propose a configuration-invariant sound localization technique using the azimuth-frequency representation and convolutional neural networks (CNNs). The proposed CNN model receives the azimuth-frequency representation, instead of time-frequency features, as its input. The proposed model was evaluated in environments whose microphone configurations differ from the one used for training. For the evaluation, a single sound source was simulated using the image method. The evaluations confirmed that the proposed technique achieved better localization performance than the conventional steered response power phase transform (SRP-PHAT) and multiple signal classification (MUSIC) methods.
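The abstract does not give the formula for the azimuth-frequency representation, so the following is only a sketch under an assumption: that such a representation can be built as a per-frequency SRP-PHAT map, i.e., PHAT-weighted delay-and-sum power evaluated on a grid of candidate azimuths for every FFT bin. Under that assumption the map has shape (number of azimuths) × (number of frequency bins), which does not depend on how many microphones there are or where they sit, and summing it over frequency yields the conventional SRP-PHAT spectrum named in the abstract as a baseline. The helpers `circular_array` and `simulate_plane_wave`, the 343 m/s speed of sound, and the anechoic far-field plane-wave source (used here in place of the image method) are hypothetical choices for illustration; the CNN itself is not shown.

```python
import numpy as np

# ASSUMPTION: the azimuth-frequency representation is modelled here as a
# per-frequency SRP-PHAT map; this is an illustration, not the paper's code.
SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def steering_delays(mic_xy, azimuths_deg, c=SPEED_OF_SOUND):
    """Far-field arrival delay (s) of each mic relative to the array origin.

    Returns shape (n_az, n_mics). A mic displaced toward the source receives
    the wavefront earlier, hence the negative sign.
    """
    az = np.deg2rad(np.asarray(azimuths_deg, dtype=float))
    directions = np.stack([np.cos(az), np.sin(az)], axis=1)  # (n_az, 2)
    return -(directions @ mic_xy.T) / c


def azimuth_frequency_map(stft, mic_xy, freqs, azimuths_deg):
    """Per-frequency SRP-PHAT map of shape (n_az, n_freqs).

    stft: complex array (n_mics, n_freqs, n_frames). The output shape depends
    only on the azimuth grid and the FFT bins, not on the microphone count.
    """
    freqs = np.asarray(freqs, dtype=float)
    n_mics = stft.shape[0]
    # PHAT weighting: discard magnitude, keep only phase.
    phat = stft / np.maximum(np.abs(stft), 1e-12)
    tau = steering_delays(mic_xy, azimuths_deg)  # (n_az, n_mics)
    # Phase shifts that undo each mic's delay for every candidate azimuth.
    steer = np.exp(2j * np.pi * tau[:, :, None] * freqs[None, None, :])
    # Delay-and-sum per (azimuth, frequency, frame), then average over frames.
    beam = np.einsum("amf,mft->aft", steer, phat)
    power = np.mean(np.abs(beam) ** 2, axis=2)
    return power / n_mics**2  # a perfectly aligned bin scores 1


def circular_array(n_mics, radius=0.05):
    """Hypothetical uniform circular array; positions in metres."""
    ang = 2 * np.pi * np.arange(n_mics) / n_mics
    return radius * np.stack([np.cos(ang), np.sin(ang)], axis=1)


def simulate_plane_wave(mic_xy, freqs, azimuth_deg, n_frames=20, seed=0):
    """Hypothetical anechoic far-field source in the STFT domain."""
    freqs = np.asarray(freqs, dtype=float)
    rng = np.random.default_rng(seed)
    shape = (len(freqs), n_frames)
    src = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    tau = steering_delays(mic_xy, [azimuth_deg])[0]  # (n_mics,)
    # Each mic observes the same spectrum, shifted by its own delay.
    return np.exp(-2j * np.pi * tau[:, None, None] * freqs[None, :, None]) * src[None]


freqs = np.linspace(0, 8000, 129)  # bin centres of a 256-point FFT at 16 kHz
azimuths = np.arange(0, 360, 5)
for n_mics in (4, 7):
    mics = circular_array(n_mics)
    af = azimuth_frequency_map(simulate_plane_wave(mics, freqs, 120), mics, freqs, azimuths)
    # Summing over frequency gives the SRP-PHAT spectrum; its peak is the estimate.
    print(n_mics, af.shape, azimuths[np.argmax(af.sum(axis=1))])
# Expected output:
# 4 (72, 129) 120
# 7 (72, 129) 120
```

Both the 4-microphone and the 7-microphone arrays produce a 72 × 129 map, which is the property that would let a single CNN accept inputs from either geometry without retraining for a new input size. A realistic evaluation would replace the anechoic plane wave with reverberant signals, for example from an image-method room simulation.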
Pages: 1-10
Page count: 10
Related Papers
  • [1] Multi-Channel Audio Source Separation Using Azimuth-Frequency Analysis and Convolutional Neural Network
    Moon, Jung Min
    Kim, Jun Ho
    Kim, Tae Woo
    Chun, Chan Jun
    Kim, Hong Kook
    2019 1ST INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (ICAIIC 2019), 2019, pp. 500-503
  • [2] SOUND SOURCE LOCALIZATION IN A MULTIPATH ENVIRONMENT USING CONVOLUTIONAL NEURAL NETWORKS
    Ferguson, Eric L.
    Williams, Stefan B.
    Jin, Craig T.
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, pp. 2386-2390
  • [3] A Binaural Sound Localization System using Deep Convolutional Neural Networks
    Xu, Ying
    Afshar, Saeed
    Singh, Ram Kuber
    Wang, Runchun
    van Schaik, Andre
    Hamilton, Tara Julia
    2019 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2019
  • [4] Sound Classification Using Convolutional Neural Networks
    Jaiswal, Kaustumbh
    Patel, Dhairya Kalpeshbhai
    2018 SEVENTH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING IN EMERGING MARKETS (CCEM), 2018, pp. 81-84
  • [5] Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
    Adavanne, Sharath
    Politis, Archontis
    Nikunen, Joonas
    Virtanen, Tuomas
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, vol. 13, no. 1, pp. 34-48
  • [6] The contribution of object identity and configuration to scene representation in convolutional neural networks
    Tang, Kevin
    Chin, Matthew
    Chun, Marvin
    Xu, Yaoda
    PLOS ONE, 2022, vol. 17, no. 6
  • [7] Sound Event Localization and Detection Using Convolutional Recurrent Neural Networks and Gated Linear Units
    Komatsu, Tatsuya
    Togami, Masahito
    Takahashi, Tsubasa
    28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, pp. 41-45
  • [8] Illumination Invariant Face Recognition Using Convolutional Neural Networks
    Ramaiah, N. Pattabhi
    Ijjina, Earnest Paul
    Mohan, C. Krishna
    2015 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, INFORMATICS, COMMUNICATION AND ENERGY SYSTEMS (SPICES), 2015
  • [9] Rotation invariant face detection using convolutional neural networks
    Tivive, Fok Hing Chi
    Bouzerdoum, Abdesselam
    NEURAL INFORMATION PROCESSING, PT 2, PROCEEDINGS, 2006, vol. 4233, pp. 260-269
  • [10] Seismic Event and Phase Detection Using Time-Frequency Representation and Convolutional Neural Networks
    Dokht, Ramin M. H.
    Kao, Honn
    Visser, Ryan
    Smith, Brindley
    SEISMOLOGICAL RESEARCH LETTERS, 2019, vol. 90, no. 2, pp. 481-490