A survey of sound source localization with deep learning methods

Cited by: 187
Authors
Grumiaux, Pierre-Amaury [1]
Kitic, Srdan [2]
Girin, Laurent [3]
Guerin, Alexandre [2]
Affiliations
[1] Nantes Univ, LS2N, CNRS, Ecole Cent Nantes, 2 Chemin Houssiniere, F-44332 Nantes, France
[2] Orange Labs, 4 Rue Clos Courtel, F-35510 Cesson Sevigne, France
[3] Univ Grenoble Alpes, GIPSA Lab, Grenoble INP, 11 Rue Math, F-38400 St Martin Dheres, France
Keywords
ACOUSTIC SOURCE LOCALIZATION; CONVOLUTIONAL NEURAL-NETWORKS; OF-ARRIVAL ESTIMATION; SPEECH SOURCE LOCALIZATION; RELATIVE TRANSFER-FUNCTION; AUDIO SOURCE SEPARATION; DOA ESTIMATION; SPEAKER LOCALIZATION; EVENT LOCALIZATION; DATA AUGMENTATION;
DOI
10.1121/10.0011809
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This article is a survey of deep learning methods for single and multiple sound source localization, with a focus on sound source localization in indoor environments, where reverberation and diffuse noise are present. We provide an extensive topography of the neural network-based sound source localization literature in this context, organized according to the neural network architecture, the type of input features, the output strategy (classification or regression), the types of data used for model training and evaluation, and the model training strategy. Tables summarizing the literature survey are provided at the end of the paper, allowing a quick search of methods with a given set of target characteristics. © 2022 Acoustical Society of America
Pages: 107-151
Page count: 45