Gated Residual Networks With Dilated Convolutions for Monaural Speech Enhancement

Cited by: 152
Authors
Tan, Ke [1 ]
Chen, Jitong [1 ,2 ]
Wang, DeLiang [1 ,3 ]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Baidu Res, Silicon Valley AI Lab, Sunnyvale, CA 94089 USA
[3] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
Dilated convolutions; residual learning; gated linear units; sequence-to-sequence mapping; speech enhancement;
DOI
10.1109/TASLP.2018.2876171
CLC Classification
O42 [Acoustics];
Subject Classification Codes
070206 ; 082403 ;
Abstract
For supervised speech enhancement, contextual information is important for accurate mask estimation or spectral mapping. However, commonly used deep neural networks (DNNs) are limited in capturing temporal contexts. To leverage long-term contexts for tracking a target speaker, we treat speech enhancement as a sequence-to-sequence mapping, and present a novel convolutional neural network (CNN) architecture for monaural speech enhancement. The key idea is to systematically aggregate contexts through dilated convolutions, which significantly expand receptive fields. The CNN model additionally incorporates gating mechanisms and residual learning. Our experimental results suggest that the proposed model generalizes well to untrained noises and untrained speakers. It consistently outperforms a DNN, a unidirectional long short-term memory (LSTM) model, and a bidirectional LSTM model in terms of objective speech intelligibility and quality metrics. Moreover, the proposed model has far fewer parameters than DNN and LSTM models.
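The three architectural ingredients the abstract names (dilated convolutions, gated linear units, residual learning) can be wired together in a few lines. The toy NumPy sketch below is illustrative only, assuming single-channel signals and scalar kernels rather than the paper's actual multi-channel CNN; function names are hypothetical:

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Causal dilated 1-D convolution: y[t] = sum_k w[k] * x[t - k*dilation],
    zero-padded on the left so the output length matches the input."""
    k_size = len(w)
    pad = (k_size - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(w[k] * xp[t + pad - k * dilation] for k in range(k_size))
        for t in range(len(x))
    ])

def gated_residual_block(x, w_lin, w_gate, dilation):
    """Gated linear unit with a residual connection:
    y = x + conv(x; w_lin) * sigmoid(conv(x; w_gate))."""
    lin = dilated_conv1d(x, w_lin, dilation)
    gate = 1.0 / (1.0 + np.exp(-dilated_conv1d(x, w_gate, dilation)))
    return x + lin * gate

# Stacking such blocks with dilations 1, 2, 4, ... grows the receptive
# field exponentially with depth, which is how the model aggregates
# long-term temporal context without a correspondingly deep stack.
```

The sigmoid gate modulates how much of each convolution output passes through, while the residual path keeps gradients flowing through deep stacks; this is the general gated-residual pattern the abstract describes, not the paper's exact layer configuration.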
Pages: 189-198
Page count: 10
Related Papers
50 items in total
[31]   Phoneme-dependent NMF for speech enhancement in monaural mixtures [J].
Raj, Bhiksha ;
Singh, Rita ;
Virtanen, Tuomas .
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, :1224-+
[32]   Monaural Speech Enhancement using Deep Neural Network with Cross-Speech Dataset [J].
Jamal, Norezmi ;
Fuad, Norfaiza ;
Shanta, Shahnoor ;
Sha'abani, Mohd Nurul Al-Hafiz .
2021 IEEE INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING APPLICATIONS, IEEE ICSIPA 2021, 2021, :44-49
[33]   MONAURAL SPEECH ENHANCEMENT USING DEEP NEURAL NETWORKS BY MAXIMIZING A SHORT-TIME OBJECTIVE INTELLIGIBILITY MEASURE [J].
Kolbaek, Morten ;
Tan, Zheng-Hua ;
Jensen, Jesper .
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, :5059-5063
[34]   Improved Monaural Speech Enhancement via Low-Complexity Fully Connected Neural Networks: A Performance Analysis [J].
Kar, Asutosh ;
Sivapatham, Shoba ;
Reddy, Himavanth .
CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2025, 44 (05) :3258-3287
[35]   Efficient Gated Convolutional Recurrent Neural Networks for Real-Time Speech Enhancement [J].
Fazal-E-Wahab ;
Ye, Zhongfu ;
Saleem, Nasir ;
Ali, Hamza ;
Ali, Imad .
INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2024, 9 (01) :66-74
[36]   Channel and temporal-frequency attention UNet for monaural speech enhancement [J].
Xu, Shiyun ;
Zhang, Zehua ;
Wang, Mingjiang .
EURASIP Journal on Audio, Speech, and Music Processing, 2023
[37]   MONAURAL SPEECH ENHANCEMENT ON DRONE VIA ADAPTER BASED TRANSFER LEARNING [J].
Chen, Xingyu ;
Bi, Hanwen ;
Lai, Wei-Ting ;
Ma, Fei .
2024 18TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT, IWAENC 2024, 2024, :85-89
[39]   On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement [J].
Kolbaek, Morten ;
Tan, Zheng-Hua ;
Jensen, Soren Holdt ;
Jensen, Jesper .
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 :825-838
[40]   A Composite Predictive-Generative Approach to Monaural Universal Speech Enhancement [J].
Zhang, Jie ;
Yan, Haoyin ;
Li, Xiaofei .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2025, 33 :2312-2325