SPEECH DENOISING IN THE WAVEFORM DOMAIN WITH SELF-ATTENTION

Cited by: 53

Authors:

Kong, Zhifeng [1,2]

Ping, Wei [2]

Dantrey, Ambrish [2]

Catanzaro, Bryan [2]

Affiliations:

[1] UCSD, La Jolla, CA 92093 USA

[2] NVIDIA, Santa Clara, CA USA

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

Speech denoising; speech enhancement; raw waveform; U-Net; self-attention

DOI:

10.1109/ICASSP43922.2022.9746169

Chinese Library Classification:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

In this work, we present CleanUNet, a causal speech denoising model on the raw waveform. The proposed model is based on an encoder-decoder architecture combined with several self-attention blocks to refine its bottleneck representations, which is crucial to obtain good results. The model is optimized through a set of losses defined over both waveform and multi-resolution spectrograms. The proposed method outperforms the state-of-the-art models in terms of denoised speech quality from various objective and subjective evaluation metrics.


Pages: 7867-7871

Page count: 5
