Gated Residual Networks With Dilated Convolutions for Monaural Speech Enhancement

Cited by: 137
Authors
Tan, Ke [1]
Chen, Jitong [1,2]
Wang, DeLiang [1,3]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Baidu Res, Silicon Valley AI Lab, Sunnyvale, CA 94089 USA
[3] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
Dilated convolutions; residual learning; gated linear units; sequence-to-sequence mapping; speech enhancement;
DOI
10.1109/TASLP.2018.2876171
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206; 082403
Abstract
For supervised speech enhancement, contextual information is important for accurate mask estimation or spectral mapping. However, commonly used deep neural networks (DNNs) are limited in capturing temporal contexts. To leverage long-term contexts for tracking a target speaker, we treat speech enhancement as a sequence-to-sequence mapping, and present a novel convolutional neural network (CNN) architecture for monaural speech enhancement. The key idea is to systematically aggregate contexts through dilated convolutions, which significantly expand receptive fields. The CNN model additionally incorporates gating mechanisms and residual learning. Our experimental results suggest that the proposed model generalizes well to untrained noises and untrained speakers. It consistently outperforms a DNN, a unidirectional long short-term memory (LSTM) model, and a bidirectional LSTM model in terms of objective speech intelligibility and quality metrics. Moreover, the proposed model has far fewer parameters than DNN and LSTM models.
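To illustrate the kind of building block the abstract describes (a residual block that combines a dilated convolution with a gating mechanism), a minimal PyTorch sketch follows. The 1-D layout, channel sizes, kernel size, dilation schedule, and the exact placement of the gate and skip connection are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (assumed layout): a gated residual block built from dilated
# 1-D convolutions and a sigmoid gate (gated linear unit), in the spirit of the
# architecture summarized in the abstract. All sizes here are illustrative.
import torch
import torch.nn as nn


class GatedResidualBlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        padding = (kernel_size - 1) // 2 * dilation  # keep the number of frames unchanged
        # Two parallel dilated convolutions: one produces features, one produces the gate.
        self.feature_conv = nn.Conv1d(channels, channels, kernel_size,
                                      padding=padding, dilation=dilation)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size,
                                   padding=padding, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames). Features are modulated by a sigmoid gate,
        # then added back to the input through a residual connection.
        gated = self.feature_conv(x) * torch.sigmoid(self.gate_conv(x))
        return x + gated


# Stacking blocks with exponentially growing dilation rates (1, 2, 4, ...) expands
# the receptive field over time frames without adding parameters per layer.
if __name__ == "__main__":
    net = nn.Sequential(*[GatedResidualBlock(64, dilation=2 ** i) for i in range(4)])
    frames = torch.randn(1, 64, 100)   # one utterance: 64 features x 100 frames
    print(net(frames).shape)           # torch.Size([1, 64, 100])
```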
Pages: 189-198
Page count: 10
Related papers
50 records in total
  • [1] Monaural speech enhancement with dilated convolutions
    Pirhosseinloo, Shadi
    Brumberg, Jonathan S.
    INTERSPEECH 2019, 2019, : 3143 - 3147
  • [2] GATED RESIDUAL NETWORKS WITH DILATED CONVOLUTIONS FOR SUPERVISED SPEECH SEPARATION
    Tan, Ke
    Chen, Jitong
    Wang, DeLiang
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 21 - 25
  • [3] Learning Complex Spectral Mapping With Gated Convolutional Recurrent Networks for Monaural Speech Enhancement
    Tan, Ke
    Wang, DeLiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 380 - 390
  • [4] Monaural Speech Enhancement Based on Attention-Gate Dilated Convolution Network
    Zhang Tianqi
    Bai Haojun
    Ye Shaopeng
    Liu Jianxing
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2022, 44 (09) : 3277 - 3288
  • [5] Convolutional gated recurrent unit networks based real-time monaural speech enhancement
    Vanambathina, Sunny Dayal
    Anumola, Vaishnavi
    Tejasree, Ponnapalli
    Divya, R.
    Manaswini, B.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (29) : 45717 - 45732
  • [6] TWO-STAGE SPEECH ENHANCEMENT USING GATED CONVOLUTIONS
    Thieling, Lars
    Jax, Peter
    2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022), 2022,
  • [7] PERCEPTUAL IMPROVEMENT OF DEEP NEURAL NETWORKS FOR MONAURAL SPEECH ENHANCEMENT
    Han, Wei
    Zhang, Xiongwei
    Sun, Meng
    Shi, Wenhua
    Chen, Xushan
    Hu, Yonggang
    2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2016,
  • [8] Joint Ideal Ratio Mask and Generative Adversarial Networks for Monaural Speech Enhancement
    Yuan, Jing
    Bao, Changchun
    PROCEEDINGS OF 2018 14TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2018, : 276 - 280
  • [9] Loss Functions for Deep Monaural Speech Enhancement
    Freiwald, Jan
    Schoenherr, Lea
    Schymura, Christopher
    Zeiler, Steffen
    Kolossa, Dorothea
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,