REDUNDANT CONVOLUTIONAL NETWORK WITH ATTENTION MECHANISM FOR MONAURAL SPEECH ENHANCEMENT

Cited by: 0
Authors
Lan, Tian [1, 2]
Lyu, Yilan [1]
Hui, Guoqiang [1]
Mokhosi, Refuoe [1]
Li, Sen [1]
Liu, Qiao [1]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Informat & Software Engn, Chengdu, Sichuan, Peoples R China
[2] CETC Big Data Res Inst Co Ltd, Guiyang, Peoples R China
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020
Funding
National Science Foundation (USA);
Keywords
Squeeze-and-Excitation; convolutional encoder-decoder; speech enhancement; fully convolutional network; attention mechanism; NEURAL-NETWORK;
DOI
10.1109/icassp40776.2020.9053277
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
The redundant convolutional encoder-decoder network has proven useful in speech enhancement tasks. It captures localized time-frequency details of speech signals through both its fully convolutional structure and the feature selection capability that results from the encoder-decoder mechanism. However, it does not explicitly consider the signal filtering mechanism, which we regard as important for speech enhancement models. In this study, we introduce an attention mechanism into the convolutional encoder-decoder model. This mechanism adaptively filters channel-wise feature responses by explicitly modeling attention (to speech versus noise signals) across channels. Experimental results show that the proposed attention model is effective at extracting speech signals from background noise, and that its advantage over other state-of-the-art models is largest in unseen noise conditions.
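The channel-wise filtering described in the abstract, together with the Squeeze-and-Excitation keyword, suggests a gating step of the following general shape. This is only a minimal NumPy sketch of a generic squeeze-and-excitation block, not the authors' implementation: the function name `squeeze_excite`, the tensor layout (channels, time, frequency), the reduction ratio `r`, and the random weights are all assumptions made for illustration, since this record gives no architectural details.

```python
import numpy as np

def squeeze_excite(features, w1, b1, w2, b2):
    """Reweight channels of a (channels, time, freq) feature map.

    Hypothetical helper: shapes and layer sizes are illustrative assumptions.
    Returns the rescaled features and the per-channel gates.
    """
    # Squeeze: summarize each channel by its global average.
    z = features.mean(axis=(1, 2))                      # shape (C,)
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    h = np.maximum(0.0, w1 @ z + b1)                    # shape (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))        # shape (C,), values in (0, 1)
    # Scale: multiply every channel by its gate.
    return features * gates[:, None, None], gates

# Toy input: 8 channels, 5 time frames, 16 frequency bins (assumed sizes).
rng = np.random.default_rng(0)
C, T, F, r = 8, 5, 16, 4
x = rng.standard_normal((C, T, F))
w1 = rng.standard_normal((C // r, C)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)

y, gates = squeeze_excite(x, w1, b1, w2, b2)
```

In a trained model the weights would be learned, so that channels responding mostly to noise receive gates near 0 and channels carrying speech receive gates near 1; here the weights are random and only the mechanics are shown.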
Pages: 6654-6658
Number of pages: 5