Feature Aggregation in Joint Sound Classification and Localization Neural Networks

Times cited: 1

Authors:

Healy, Brendan [1]

Mcnamee, Patrick [1]

Ahmadabadi, Zahra Nili [1]

Affiliation:

[1] San Diego State Univ, Dept Mech Engn, San Diego, CA 92182 USA

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Joint sound signal classification and localization; multi-task deep learning; feature aggregation; EVENT LOCALIZATION; ARCHITECTURE;

DOI:

10.1109/ACCESS.2024.3438947

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Current state-of-the-art sound source localization (SSL) deep learning networks lack feature aggregation within their architectures. Feature aggregation within neural network architectures enhances model performance by consolidating information from different feature scales, thereby improving feature robustness and invariance. We adapt feature aggregation sub-architectures from computer vision neural networks to a baseline neural network architecture for SSL, the Sound Event Localization and Detection network (SELDnet). The incorporated sub-architectures are the Path Aggregation Network (PANet), the Weighted Bi-directional Feature Pyramid Network (BiFPN), and a novel Scale Encoding Network (SEN). These sub-architectures were evaluated using two metrics for signal classification and two metrics for direction-of-arrival regression. The results show that models incorporating feature aggregation outperformed the baseline SELDnet in both sound signal classification and localization. Among the feature aggregators, PANet performed best, while the remaining methods performed comparably to one another. The results provide evidence that feature aggregation sub-architectures enhance the performance of sound detection neural networks, particularly in direction-of-arrival regression.
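The abstract lists BiFPN among the adapted aggregators. As a minimal, hedged sketch of what "weighted" aggregation means in BiFPN, the Python below implements the fast normalized fusion rule from the original BiFPN design (EfficientDet, Tan et al.), O = sum_i (w_i / (eps + sum_j w_j)) * I_i with w_i = ReLU(learned weight). It is not the authors' SELDnet code: the function names, the 1-D lists standing in for feature maps, and the nearest-neighbour upsampling used to bring a coarser scale to the finer resolution are all illustrative assumptions.

```python
from typing import List, Sequence


def relu(x: float) -> float:
    """Clamp a raw fusion weight to be non-negative."""
    return x if x > 0.0 else 0.0


def upsample_nearest(feature: Sequence[float], factor: int) -> List[float]:
    """Nearest-neighbour upsampling of a 1-D feature: repeat each value `factor` times."""
    return [v for v in feature for _ in range(factor)]


def fast_normalized_fusion(
    inputs: Sequence[Sequence[float]],
    raw_weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse same-shape features as O = sum_i (w_i / (eps + sum_j w_j)) * I_i.

    Hypothetical helper illustrating BiFPN-style weighted fusion; each w_i is
    ReLU(raw_weights[i]), so a negative raw weight switches that input off.
    """
    if len(inputs) != len(raw_weights):
        raise ValueError("need exactly one weight per input feature")
    length = len(inputs[0])
    if any(len(f) != length for f in inputs):
        raise ValueError("resize all inputs to the same shape before fusing")
    w = [relu(r) for r in raw_weights]
    denom = eps + sum(w)
    # Element-wise weighted sum, normalised by the total (clamped) weight.
    return [sum(wi * f[k] for wi, f in zip(w, inputs)) / denom for k in range(length)]


# Usage: fuse a fine-scale feature with a coarser one upsampled to match it.
fine = [1.0, 2.0, 3.0, 4.0]
coarse = [10.0, 20.0]
fused = fast_normalized_fusion([fine, upsample_nearest(coarse, 2)], [1.0, 1.0], eps=0.0)
print(fused)  # [5.5, 6.0, 11.5, 12.0]
```

Clamping with ReLU keeps every contribution non-negative, and dividing by the weight sum keeps the fused output on the scale of its inputs; EfficientDet proposed this as a cheaper alternative to softmax-normalized weights. The small eps (0.0001 in EfficientDet) guards against division by zero when every weight is clamped to zero.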


Pages: 109157-109170

Number of pages: 14

Number of references: 103

