Robust DOA Estimation Using Multi-Scale Fusion Network with Attention Mask

Cited by: 1
Authors
Yan, Yuting [1 ]
Huang, Qinghua [1 ]
Affiliations
[1] Shanghai Univ, Sch Commun & Informat Engn, Shanghai 200444, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 11
Keywords
complex-valued neural network; direction-of-arrival; reverberant; multi-scale; attention; SPHERICAL MICROPHONE ARRAY; OF-ARRIVAL ESTIMATION; NEURAL-NETWORK; ACOUSTIC ANALYSIS; DIRECTION; LOCALIZATION; ALGORITHM; FRAMEWORK;
DOI
10.3390/app14114488
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
To overcome the limitations of traditional methods in reverberant and noisy environments, a robust multi-scale fusion neural network with an attention mask is designed to improve direction-of-arrival (DOA) estimation accuracy for acoustic sources. It combines deep learning with complex-valued operations to handle the interference of reverberation and noise in speech signals. Working in the complex domain exploits the properties of complex-valued signals, so that their inherent features are captured and their rich information is preserved. An attention mask module generates distinct masks for selective focusing and masking based on the input. The multi-scale fusion block then captures multi-scale spatial features by stacking complex-valued convolutional layers with small kernels, and reduces module complexity through special branching operations. Experimental results demonstrate that the model achieves significant improvements over other methods for speaker localization in reverberant and noisy environments. It provides a new solution for DOA estimation of acoustic sources in different scenarios, with significant theoretical and practical implications.
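The abstract's idea of stacking complex-valued convolutional layers with small kernels can be illustrated with a minimal NumPy sketch. This is not the authors' network: the function name `complex_conv1d`, the 1-D setting, the kernel size of 3, and the random data are all assumptions made for illustration, and true convolution is used rather than the cross-correlation common in deep learning frameworks. It shows how a complex convolution decomposes into four real convolutions, and that two stacked 3-tap layers (with no nonlinearity in between) cover the same 5-sample receptive field as one 5-tap layer.

```python
import numpy as np

def complex_conv1d(x, w):
    """Full 1-D convolution of complex x with complex w, built from real convolutions.

    Uses (a + ib) * (c + id) = (ac - bd) + i(ad + bc).
    Hypothetical helper for illustration only.
    """
    xr, xi = x.real, x.imag
    wr, wi = w.real, w.imag
    real = np.convolve(xr, wr) - np.convolve(xi, wi)
    imag = np.convolve(xr, wi) + np.convolve(xi, wr)
    return real + 1j * imag

rng = np.random.default_rng(0)
x = rng.standard_normal(16) + 1j * rng.standard_normal(16)   # toy complex input
w1 = rng.standard_normal(3) + 1j * rng.standard_normal(3)    # first 3-tap kernel
w2 = rng.standard_normal(3) + 1j * rng.standard_normal(3)    # second 3-tap kernel

# Two stacked small-kernel layers (linear, no activation in between).
stacked = complex_conv1d(complex_conv1d(x, w1), w2)

# The equivalent single kernel spans 3 + 3 - 1 = 5 taps.
w_eff = complex_conv1d(w1, w2)
single = complex_conv1d(x, w_eff)
```

In this linear toy case `stacked` and `single` agree to numerical precision. In a real network, nonlinearities between layers break the exact equivalence, but the receptive-field argument still holds: small kernels reach a larger context with fewer parameters per layer.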
Pages: 15