Gated Residual Networks With Dilated Convolutions for Monaural Speech Enhancement

Cited by: 137
Authors
Tan, Ke [1]
Chen, Jitong [1,2]
Wang, DeLiang [1,3]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Baidu Res, Silicon Valley AI Lab, Sunnyvale, CA 94089 USA
[3] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
Dilated convolutions; residual learning; gated linear units; sequence-to-sequence mapping; speech enhancement;
DOI
10.1109/TASLP.2018.2876171
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206; 082403
Abstract
For supervised speech enhancement, contextual information is important for accurate mask estimation or spectral mapping. However, commonly used deep neural networks (DNNs) are limited in capturing temporal contexts. To leverage long-term contexts for tracking a target speaker, we treat speech enhancement as a sequence-to-sequence mapping, and present a novel convolutional neural network (CNN) architecture for monaural speech enhancement. The key idea is to systematically aggregate contexts through dilated convolutions, which significantly expand receptive fields. The CNN model additionally incorporates gating mechanisms and residual learning. Our experimental results suggest that the proposed model generalizes well to untrained noises and untrained speakers. It consistently outperforms a DNN, a unidirectional long short-term memory (LSTM) model, and a bidirectional LSTM model in terms of objective speech intelligibility and quality metrics. Moreover, the proposed model has far fewer parameters than DNN and LSTM models.
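To illustrate the kind of building block the abstract describes (a residual block that combines a dilated convolution with a gating mechanism), a minimal PyTorch sketch follows. The 1-D layout, channel sizes, kernel size, dilation schedule, and the exact placement of the gate and skip connection are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (assumed layout): a gated residual block built from dilated
# 1-D convolutions and a sigmoid gate (gated linear unit), in the spirit of the
# architecture summarized in the abstract. All sizes here are illustrative.
import torch
import torch.nn as nn


class GatedResidualBlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        padding = (kernel_size - 1) // 2 * dilation  # keep the number of frames unchanged
        # Two parallel dilated convolutions: one produces features, one produces the gate.
        self.feature_conv = nn.Conv1d(channels, channels, kernel_size,
                                      padding=padding, dilation=dilation)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size,
                                   padding=padding, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames). Features are modulated by a sigmoid gate,
        # then added back to the input through a residual connection.
        gated = self.feature_conv(x) * torch.sigmoid(self.gate_conv(x))
        return x + gated


# Stacking blocks with exponentially growing dilation rates (1, 2, 4, ...) expands
# the receptive field over time frames without adding parameters per layer.
if __name__ == "__main__":
    net = nn.Sequential(*[GatedResidualBlock(64, dilation=2 ** i) for i in range(4)])
    frames = torch.randn(1, 64, 100)   # one utterance: 64 features x 100 frames
    print(net(frames).shape)           # torch.Size([1, 64, 100])
```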
Pages: 189-198
Page count: 10
Related papers
50 records in total
  • [1] Monaural speech enhancement with dilated convolutions
    Pirhosseinloo, Shadi
    Brumberg, Jonathan S.
    INTERSPEECH 2019, 2019, : 3143 - 3147
  • [2] GATED RESIDUAL NETWORKS WITH DILATED CONVOLUTIONS FOR SUPERVISED SPEECH SEPARATION
    Tan, Ke
    Chen, Jitong
    Wang, DeLiang
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 21 - 25
  • [3] Learning Complex Spectral Mapping With Gated Convolutional Recurrent Networks for Monaural Speech Enhancement
    Tan, Ke
    Wang, DeLiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 380 - 390
  • [4] Monaural Speech Enhancement Based on Attention-Gate Dilated Convolution Network
    Zhang Tianqi
    Bai Haojun
    Ye Shaopeng
    Liu Jianxing
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2022, 44 (09) : 3277 - 3288
  • [5] Convolutional gated recurrent unit networks based real-time monaural speech enhancement
    Vanambathina, Sunny Dayal
    Anumola, Vaishnavi
    Tejasree, Ponnapalli
    Divya, R.
    Manaswini, B.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (29) : 45717 - 45732
  • [6] TWO-STAGE SPEECH ENHANCEMENT USING GATED CONVOLUTIONS
    Thieling, Lars
    Jax, Peter
    2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022), 2022,
  • [7] PERCEPTUAL IMPROVEMENT OF DEEP NEURAL NETWORKS FOR MONAURAL SPEECH ENHANCEMENT
    Han, Wei
    Zhang, Xiongwei
    Sun, Meng
    Shi, Wenhua
    Chen, Xushan
    Hu, Yonggang
    2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2016,
  • [8] Joint Ideal Ratio Mask and Generative Adversarial Networks for Monaural Speech Enhancement
    Yuan, Jing
    Bao, Changchun
    PROCEEDINGS OF 2018 14TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2018, : 276 - 280
  • [9] Loss Functions for Deep Monaural Speech Enhancement
    Freiwald, Jan
    Schoenherr, Lea
    Schymura, Christopher
    Zeiler, Steffen
    Kolossa, Dorothea
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,