Feature Aggregation in Joint Sound Classification and Localization Neural Networks

Times cited: 1
Authors
Healy, Brendan [1]
McNamee, Patrick [1]
Ahmadabadi, Zahra Nili [1]
Affiliation
[1] San Diego State Univ, Dept Mech Engn, San Diego, CA 92182 USA
Keywords
Joint sound signal classification and localization; multi-task deep learning; feature aggregation; event localization; architecture
DOI
10.1109/ACCESS.2024.3438947
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Current state-of-the-art sound source localization (SSL) deep learning networks lack feature aggregation within their architecture. Feature aggregation improves model performance by consolidating information from different feature scales, which makes the learned features more robust and invariant. We adapt feature aggregation sub-architectures from computer vision neural networks to a baseline SSL architecture, the Sound Event Localization and Detection network (SELDnet). The incorporated sub-architectures are the Path Aggregation Network (PANet), the Weighted Bi-directional Feature Pyramid Network (BiFPN), and a novel Scale Encoding Network (SEN). These sub-architectures were evaluated using two metrics for signal classification and two metrics for direction-of-arrival regression. The results show that models incorporating feature aggregation outperformed the baseline SELDnet in both sound signal classification and localization. Among the feature aggregators, PANet performed best, while the remaining methods performed comparably to one another. The results provide evidence that feature aggregation sub-architectures enhance the performance of sound detection neural networks, particularly in direction-of-arrival regression.
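The abstract describes multi-scale feature aggregation only at a high level. The sketch below is not the authors' modified SELDnet. It is a toy illustration, under stated assumptions, of the general idea behind PANet and BiFPN. A pyramid of 1-D feature maps, ordered fine to coarse, passes through a top-down path that carries coarse context into finer levels and then a bottom-up path that carries fine detail back to coarser levels. Each merge uses the fast normalized fusion rule from the original BiFPN design, O = Σ w_i·I_i / (ε + Σ w_j) with w_i clipped at zero. The helper names are hypothetical, and the resampling helpers (nearest-neighbour repetition, pairwise averaging) are stand-ins for the learned convolutions a real network would use. A full BiFPN would also use learnable weights and an extra same-level skip input at intermediate levels. The SEN is not illustrated.

```python
from typing import List

Feature = List[float]  # one channel of a 1-D feature map (e.g., along time)


def upsample2(x: Feature) -> Feature:
    """Nearest-neighbour upsampling by 2 (stand-in for learned upsampling)."""
    return [v for v in x for _ in range(2)]


def downsample2(x: Feature) -> Feature:
    """Average neighbouring pairs (stand-in for a strided convolution)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def fast_normalized_fusion(inputs: List[Feature], weights: List[float],
                           eps: float = 1e-4) -> Feature:
    """BiFPN-style fusion: sum_i w_i * I_i / (eps + sum_j w_j), with w_i = max(0, w_i)."""
    w = [max(0.0, wi) for wi in weights]
    denom = eps + sum(w)
    length = len(inputs[0])
    assert all(len(f) == length for f in inputs), "inputs must share a resolution"
    return [sum(wi * f[k] for wi, f in zip(w, inputs)) / denom for k in range(length)]


def aggregate(pyramid: List[Feature]) -> List[Feature]:
    """Hypothetical PANet-like pass: top-down, then bottom-up, over a fine-to-coarse pyramid."""
    n = len(pyramid)
    # Top-down path: fuse each level with the upsampled coarser result.
    td: List[Feature] = [[] for _ in range(n)]
    td[-1] = pyramid[-1]
    for i in range(n - 2, -1, -1):
        td[i] = fast_normalized_fusion([pyramid[i], upsample2(td[i + 1])], [1.0, 1.0])
    # Bottom-up path: fuse each level with the downsampled finer result.
    out: List[Feature] = [[] for _ in range(n)]
    out[0] = td[0]
    for i in range(1, n):
        out[i] = fast_normalized_fusion([td[i], downsample2(out[i - 1])], [1.0, 1.0])
    return out


if __name__ == "__main__":
    pyramid = [[1.0] * 8, [2.0] * 4, [4.0] * 2]  # toy 3-level pyramid
    for level in aggregate(pyramid):
        print([round(v, 3) for v in level])
```

Each output level keeps its original resolution but now mixes information from every other scale. That cross-scale mixing is the property the abstract credits for more robust features. Normalizing by the clipped weight sum keeps each fused output a bounded convex-style combination of its inputs, which is the stated motivation for fast normalized fusion over a softmax.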
Pages: 109157-109170
Page count: 14