Configuration-Invariant Sound Localization Technique Using Azimuth-Frequency Representation and Convolutional Neural Networks

Cited by: 2

Authors:

Chun, Chanjun [1]

Jeon, Kwang Myung [2]

Choi, Wooyeol [3]

Affiliations:

[1] Korea Inst Civil Engn & Bldg Technol KICT, Future Infrastruct Res Ctr, Goyang 10223, South Korea

[2] IntFlow Co Ltd, Gwangju 61080, South Korea

[3] Chosun Univ, Dept Comp Engn, Gwangju 61452, South Korea

Source:

SENSORS | 2020 / Vol. 20 / Issue 13

Funding:

National Research Foundation, Singapore;

Keywords:

azimuth-frequency representation; configuration-invariant; convolutional neural network (CNN); sound localization;

DOI:

10.3390/s20133768

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Discipline classification code:

070302 ; 081704 ;

Abstract:

Deep neural networks (DNNs) have achieved significant advancements in speech processing, and numerous types of DNN architectures have been proposed in the field of sound localization. When a DNN model is deployed for sound localization, a fixed input size is required. This is generally determined by the number of microphones, the fast Fourier transform size, and the frame size. If the number or configuration of the microphones changes, the DNN model must be retrained because the size of the input features changes. In this paper, we propose a configuration-invariant sound localization technique using the azimuth-frequency representation and convolutional neural networks (CNNs). The proposed CNN model receives the azimuth-frequency representation instead of time-frequency features as the input features. The proposed model was evaluated in environments whose microphone configurations differed from the one in which it was originally trained. For evaluation, a single sound source was simulated using the image method. The evaluations confirmed that the localization performance was superior to that of the conventional steered response power phase transform (SRP-PHAT) and multiple signal classification (MUSIC) methods.


Pages: 1 / 10

Page count: 10
