A Novel Noise-Aware Deep Learning Model for Underwater Acoustic Denoising

Cited by: 26
Authors
Zhou, Aolong [1, 2]
Zhang, Wen [3]
Li, Xiaoyong [1, 2]
Xu, Guojun [2]
Zhang, Bingbing [2]
Ma, Yanxin [2]
Song, Junqiang [1, 2]
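Affiliations
[1] National University of Defense Technology, College of Computer Science and Technology, Changsha 410000, People's Republic of China
[2] National University of Defense Technology, College of Meteorology and Oceanography, Changsha 410000, People's Republic of China
[3] Academy of Military Sciences, Strategic Evaluation and Consultation Center, Beijing 100000, People's Republic of China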
Source
IEEE Transactions on Geoscience and Remote Sensing | 2023 / Vol. 61
Funding
National Natural Science Foundation of China
Keywords
Noise reduction; Noise measurement; Time-frequency analysis; Signal to noise ratio; Marine vehicles; Training; Time-domain analysis; Fullband-subband attention (FSA) network; noise-aware; noise-aware deep learning model with fullband-subband attention network (NAFSA-Net); underwater acoustic denoising; speech enhancement
DOI
10.1109/TGRS.2023.3254652
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry]
Subject classification code
0708; 070902
Abstract
Underwater acoustic signal denoising aims to recover valuable ship target signals from noisy audio by suppressing underwater background noise. Traditional statistics-based denoising techniques are difficult to apply effectively in complex underwater environments, especially at extremely low signal-to-noise ratios (SNRs). To address these problems, we propose a noise-aware deep learning model with a fullband-subband attention network (NAFSA-Net) for underwater acoustic signal denoising. NAFSA-Net uses an encoder to extract a feature representation of the input audio. A noise subnet and a target subnet then estimate the noise component and the target component simultaneously. Each subnet stacks several fullband-subband attention (FSA) blocks to capture both global dependencies and fine-grained local dependencies among features. Furthermore, we introduce an interaction module that passes auxiliary information from the noise subnet to the target subnet. Finally, we propose an improved weighted scale-invariant signal-to-noise ratio (SI-SNR) loss function to optimize training. Experimental results show that NAFSA-Net substantially outperforms traditional methods and competitive DNN-based solutions in denoising underwater signals with very low SNRs. More importantly, it performs equally well on two unseen datasets, which indicates that NAFSA-Net can be a more robust choice for real-world underwater acoustic denoising systems.
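For readers unfamiliar with the SI-SNR metric named in the abstract, the sketch below computes the standard scale-invariant SNR and its negation as a training loss. It is only the commonly used baseline definition, not the improved weighted variant proposed in the paper, whose details are not given in this record. The function names `si_snr` and `si_snr_loss` and the stabilizer `eps` are illustrative assumptions.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Standard SI-SNR in dB between two equal-length sample sequences.

    Assumption: this is the textbook definition, not the paper's
    improved weighted loss.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)

    # Remove the mean so a constant offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the target to get the scaled target part.
    dot = sum(a * b for a, b in zip(e, t))
    target_energy = sum(b * b for b in t) + eps
    scale = dot / target_energy
    s_target = [scale * b for b in t]

    # Whatever is left after the projection is treated as residual noise.
    e_noise = [a - p for a, p in zip(e, s_target)]

    num = sum(p * p for p in s_target)
    den = sum(q * q for q in e_noise) + eps
    return 10.0 * math.log10(num / den + eps)


def si_snr_loss(estimate, target):
    """Negative SI-SNR, so that minimizing the loss maximizes SI-SNR."""
    return -si_snr(estimate, target)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is the property that makes the metric attractive at very low SNRs where output gain is hard to control.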
Pages: 13