Feature Aggregation in Joint Sound Classification and Localization Neural Networks

Cited by: 1
Authors
Healy, Brendan [1]
Mcnamee, Patrick [1]
Ahmadabadi, Zahra Nili [1]
Affiliations
[1] San Diego State Univ, Dept Mech Engn, San Diego, CA 92182 USA
Keywords
Joint sound signal classification and localization; multi-task deep learning; feature aggregation; event localization; architecture
DOI
10.1109/ACCESS.2024.3438947
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Current state-of-the-art sound source localization (SSL) deep learning networks lack feature aggregation within their architecture. Feature aggregation within neural network architectures enhances model performance by enabling the consolidation of information from different feature scales, thereby improving feature robustness and invariance. We adapt feature aggregation sub-architectures from computer vision neural networks onto a baseline neural network architecture for SSL, the Sound Event Localization and Detection network (SELDnet). The incorporated sub-architectures are the Path Aggregation Network (PANet), the Weighted Bi-directional Feature Pyramid Network (BiFPN), and a novel Scale Encoding Network (SEN). These sub-architectures were evaluated using two metrics for signal classification and two metrics for direction-of-arrival regression. The results show that models incorporating feature aggregation outperformed the baseline SELDnet in both sound signal classification and localization. Among the feature aggregators, PANet performed best, while BiFPN and SEN were comparable to each other. The results provide evidence that feature aggregation sub-architectures enhance the performance of sound detection neural networks, particularly in direction-of-arrival regression.
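As a rough illustration of what "consolidating information from different feature scales" can look like, the sketch below implements the weighted fusion step commonly associated with BiFPN: each input feature gets a learnable scalar weight, the weights are clamped to be non-negative, and the result is a normalized weighted sum. This is only a minimal sketch under stated assumptions, not the authors' implementation. The function name `fast_normalized_fusion` and its parameters are hypothetical, the features are assumed to have already been resized to a common shape, and plain Python lists stand in for tensors.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse equally sized feature vectors with normalized, non-negative weights.

    features: list of feature vectors (lists of floats), all the same length;
              assumed to come from different scales after resizing.
    weights:  one scalar per feature vector (learned in a real network).
    eps:      small constant that keeps the denominator away from zero.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature vector")

    # Clamp weights at zero (a ReLU) so every contribution is non-negative.
    clamped = [max(0.0, w) for w in weights]
    total = sum(clamped) + eps

    length = len(features[0])
    fused = [0.0] * length
    for feature, weight in zip(features, clamped):
        if len(feature) != length:
            raise ValueError("all feature vectors must share one length")
        for i, value in enumerate(feature):
            fused[i] += (weight / total) * value
    return fused


# Hypothetical usage: a fine-scale and a coarse-scale feature, equally weighted.
fine = [1.0, 2.0]
coarse = [3.0, 4.0]
fused = fast_normalized_fusion([fine, coarse], weights=[1.0, 1.0])
```

With equal weights the fused output is close to the element-wise mean of the inputs; a weight that training drives to zero or below removes that scale from the fused feature entirely.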
Pages: 109157-109170
Page count: 14