Feature Aggregation in Joint Sound Classification and Localization Neural Networks

Cited by: 1
Authors
Healy, Brendan [1]
Mcnamee, Patrick [1]
Ahmadabadi, Zahra Nili [1]
Affiliations
[1] San Diego State Univ, Dept Mech Engn, San Diego, CA 92182 USA
Keywords
Joint sound signal classification and localization; multi-task deep learning; feature aggregation; event localization; architecture
DOI
10.1109/ACCESS.2024.3438947
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Current state-of-the-art sound source localization (SSL) deep learning networks lack feature aggregation within their architecture. Feature aggregation enhances model performance by consolidating information from different feature scales, thereby improving feature robustness and invariance. We adapt feature aggregation sub-architectures from computer vision neural networks onto a baseline neural network architecture for SSL, the Sound Event Localization and Detection network (SELDnet). The incorporated sub-architectures are the Path Aggregation Network (PANet), the Weighted Bi-directional Feature Pyramid Network (BiFPN), and a novel Scale Encoding Network (SEN). These sub-architectures were evaluated using two metrics for signal classification and two metrics for direction-of-arrival regression. The results show that models incorporating feature aggregation outperformed the baseline SELDnet in both sound signal classification and localization. Among the feature aggregators, PANet performed best, while the other methods performed comparably to one another. The results provide evidence that feature aggregation sub-architectures enhance the performance of sound detection neural networks, particularly in direction-of-arrival regression.
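To make the idea of consolidating information across feature scales concrete, the following is a minimal sketch of a weighted fusion step in the style of BiFPN's fast normalized fusion: each input is scaled by a non-negative weight, and the weights are normalized by their sum plus a small epsilon. It is not the authors' implementation. The function names, the 1-D feature vectors, and the nearest-neighbour upsampling are assumptions chosen for illustration; in a real network the weights would be learned and the features would be multi-channel tensors.

```python
def upsample_nearest(coarse, factor):
    """Repeat each coarse-scale value `factor` times (nearest-neighbour upsampling)."""
    return [value for value in coarse for _ in range(factor)]


def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse equal-length feature vectors with non-negative, normalized weights.

    Hypothetical helper: weights are clipped at zero (ReLU) and divided by
    their sum plus `eps`, so each output is a weighted average of the inputs.
    """
    length = len(features[0])
    if any(len(feature) != length for feature in features):
        raise ValueError("all feature vectors must share the same length")
    clipped = [max(weight, 0.0) for weight in weights]
    total = sum(clipped) + eps
    return [
        sum(w * feature[i] for w, feature in zip(clipped, features)) / total
        for i in range(length)
    ]


# Illustrative values only: a fine-scale feature and a coarser one at half the resolution.
fine = [1.0, 2.0, 3.0, 4.0]
coarse = [10.0, 20.0]

# Bring the coarse feature to the fine resolution, then fuse with equal weights.
fused = fast_normalized_fusion([fine, upsample_nearest(coarse, 2)], weights=[1.0, 1.0])
print(fused)
```

With equal weights, each fused value is close to the mean of the fine value and the upsampled coarse value. A negative weight is clipped to zero, so that input drops out of the average.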
Pages: 109157-109170
Number of pages: 14