Gated Residual Networks With Dilated Convolutions for Monaural Speech Enhancement

Cited by: 152
Authors
Tan, Ke [1]
Chen, Jitong [1, 2]
Wang, DeLiang [1, 3]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Baidu Res, Silicon Valley AI Lab, Sunnyvale, CA 94089 USA
[3] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
Dilated convolutions; residual learning; gated linear units; sequence-to-sequence mapping; speech enhancement
DOI
10.1109/TASLP.2018.2876171
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
For supervised speech enhancement, contextual information is important for accurate mask estimation or spectral mapping. However, commonly used deep neural networks (DNNs) are limited in capturing temporal contexts. To leverage long-term contexts for tracking a target speaker, we treat speech enhancement as a sequence-to-sequence mapping, and present a novel convolutional neural network (CNN) architecture for monaural speech enhancement. The key idea is to systematically aggregate contexts through dilated convolutions, which significantly expand receptive fields. The CNN model additionally incorporates gating mechanisms and residual learning. Our experimental results suggest that the proposed model generalizes well to untrained noises and untrained speakers. It consistently outperforms a DNN, a unidirectional long short-term memory (LSTM) model, and a bidirectional LSTM model in terms of objective speech intelligibility and quality metrics. Moreover, the proposed model has far fewer parameters than DNN and LSTM models.
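The three ingredients named in the abstract (dilated convolutions, gating, and residual learning) can be illustrated with a small pure-Python sketch. This is not the authors' architecture: the kernel size of 3, the dilation schedule 1, 2, 4, 8, and the function names `receptive_field`, `dilated_conv1d`, and `gated_residual_block` are all assumptions chosen for illustration. It only shows why stacking dilated layers widens the temporal context quickly, and what a GLU-style gate with an identity skip connection looks like on a single 1-D sequence.

```python
import math


def receptive_field(kernel_size, dilations):
    """Receptive field (in frames) of stacked stride-1 dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(x, weights, dilation):
    """Zero-padded, same-length 1-D dilated convolution (cross-correlation form)."""
    centre = (len(weights) - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t + (j - centre) * dilation
            if 0 <= idx < len(x):  # samples outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_residual_block(x, w_linear, w_gate, dilation):
    """y = x + conv(x) * sigmoid(conv_gate(x)): a GLU-style gate plus identity skip."""
    a = dilated_conv1d(x, w_linear, dilation)
    g = dilated_conv1d(x, w_gate, dilation)
    return [xi + ai * sigmoid(gi) for xi, ai, gi in zip(x, a, g)]


# Assumed schedule: exponentially growing dilations versus no dilation.
print(receptive_field(3, [1, 2, 4, 8]))  # 1 + 2 * (1 + 2 + 4 + 8) = 31
print(receptive_field(3, [1, 1, 1, 1]))  # 1 + 2 * 4 = 9
```

With the same four layers and the same number of weights, the dilated stack in this sketch covers 31 frames instead of 9, which is the sense in which dilation expands the receptive field without adding parameters.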
Pages: 189-198
Page count: 10