DIFFERENTIABLE TRACKING-BASED TRAINING OF DEEP LEARNING SOUND SOURCE LOCALIZERS

Cited by: 15
Authors
Adavanne, Sharath [1]
Politis, Archontis [1]
Virtanen, Tuomas [1]
Affiliations
[1] Tampere Univ, Audio Res Grp, Tampere, Finland
Source
2021 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA) | 2021
Keywords
sound source localization; deep-learning acoustic processing; multi-target tracking; CNN
DOI
10.1109/WASPAA52581.2021.9632773
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Data-based and learning-based sound source localization (SSL) has shown promising results in challenging conditions, and is commonly posed as a classification or a regression problem. Regression-based approaches have certain advantages over classification-based ones, such as continuous direction-of-arrival estimation of static and moving sources. However, multi-source scenarios require multiple regressors, and to date there is no clear strategy for training them that does not rely on auxiliary information such as simultaneous sound classification. We investigate end-to-end training of such methods with a technique recently proposed for video object detectors, adapted to the SSL setting. A differentiable network is constructed that can be plugged into the output of the localizer to solve the optimal assignment between predictions and references, directly optimizing the popular CLEAR-MOT tracking metrics. Results indicate large improvements over directly optimizing mean squared errors, in terms of localization error, detection metrics, and tracking capabilities.
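The assignment problem mentioned in the abstract can be illustrated with a small sketch. This is not the paper's differentiable network: it is a plain, non-differentiable exhaustive search, shown only to make concrete what "optimal assignment between predictions and references" means. The function names, the (azimuth, elevation) representation in degrees, and the use of total angular error as the cost are all assumptions made for illustration.

```python
import itertools
import math


def angular_distance_deg(a, b):
    """Great-circle angle in degrees between two (azimuth, elevation) pairs in degrees."""
    az1, el1 = map(math.radians, a)
    az2, el2 = map(math.radians, b)
    cos_angle = (math.sin(el1) * math.sin(el2)
                 + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def best_assignment(predictions, references):
    """Hypothetical helper: one-to-one pairing with the lowest total angular error.

    Tries every pairing, so it is only practical for a handful of sources.
    Returns (list of (prediction_index, reference_index), total_cost_in_degrees).
    """
    if len(references) > len(predictions):
        raise ValueError("this sketch assumes at least as many predictions as references")
    best_pairs, best_cost = [], math.inf
    for perm in itertools.permutations(range(len(predictions)), len(references)):
        cost = sum(angular_distance_deg(predictions[p], references[r])
                   for r, p in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, r) for r, p in enumerate(perm)]
    return best_pairs, best_cost


# Two regressor outputs and two reference directions, listed in different orders.
predictions = [(100.0, 0.0), (10.0, 0.0)]
references = [(12.0, 0.0), (95.0, 0.0)]
pairs, cost = best_assignment(predictions, references)
print(pairs, round(cost, 3))
```

In this example the second prediction is matched to the first reference and vice versa, which is the pairing a fixed output-to-reference ordering would miss. A hard search like this has no useful gradient with respect to the predictions, which is presumably why the abstract describes a differentiable network for the same step.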
Pages: 211-215
Number of pages: 5