SPEECH DENOISING IN THE WAVEFORM DOMAIN WITH SELF-ATTENTION

Cited by: 53
Authors
Kong, Zhifeng [1, 2]
Ping, Wei [2 ]
Dantrey, Ambrish [2 ]
Catanzaro, Bryan [2 ]
Affiliations
[1] UCSD, La Jolla, CA 92093 USA
[2] NVIDIA, Santa Clara, CA USA
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
Speech denoising; speech enhancement; raw waveform; U-Net; self-attention;
DOI
10.1109/ICASSP43922.2022.9746169
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this work, we present CleanUNet, a causal speech denoising model that operates on the raw waveform. The proposed model is based on an encoder-decoder architecture combined with several self-attention blocks that refine its bottleneck representations, which is crucial for obtaining good results. The model is optimized through a set of losses defined over both the waveform and multi-resolution spectrograms. The proposed method outperforms state-of-the-art models in denoised speech quality according to various objective and subjective evaluation metrics.
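
The abstract states that training combines a waveform loss with losses over multi-resolution spectrograms but does not give the formula. The sketch below shows one common way such a loss is built, under assumptions that are not taken from this record: an L1 term on the waveform, plus a spectral-convergence term and a log-magnitude L1 term averaged over several STFT resolutions. The function names, the (n_fft, hop) pairs, and the equal weighting are all hypothetical, and NumPy stands in for a differentiable framework.

```python
import numpy as np


def stft_magnitude(x, n_fft, hop):
    """Magnitude STFT with a Hann window; trailing samples that do not fill a frame are dropped."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=-1))


def multi_resolution_stft_loss(
    clean,
    denoised,
    resolutions=((512, 128), (1024, 256), (2048, 512)),  # assumed (n_fft, hop) pairs
    eps=1e-7,
):
    """Average of spectral-convergence and log-magnitude terms over several STFT resolutions."""
    total = 0.0
    for n_fft, hop in resolutions:
        s_clean = stft_magnitude(clean, n_fft, hop)
        s_den = stft_magnitude(denoised, n_fft, hop)
        # Spectral convergence: relative Frobenius-norm error between magnitudes.
        sc = np.linalg.norm(s_clean - s_den) / (np.linalg.norm(s_clean) + eps)
        # Mean absolute difference of log magnitudes.
        log_mag = np.mean(np.abs(np.log(s_clean + eps) - np.log(s_den + eps)))
        total += sc + log_mag
    return total / len(resolutions)


def denoising_loss(clean, denoised, **kwargs):
    """Waveform L1 term plus the multi-resolution spectrogram term (equal weights assumed)."""
    waveform_l1 = np.mean(np.abs(clean - denoised))
    return waveform_l1 + multi_resolution_stft_loss(clean, denoised, **kwargs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0
    clean = np.sin(2 * np.pi * 440.0 * t)
    noisy = clean + 0.1 * rng.standard_normal(t.shape)
    print("loss(clean, clean):", denoising_loss(clean, clean))
    print("loss(clean, noisy):", denoising_loss(clean, noisy))
```

Using several window sizes trades time resolution against frequency resolution, so no single STFT setting dominates the objective. In actual training the same computation would be written with differentiable operations so that gradients reach the model.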
Pages: 7867-7871
Page count: 5