Residual Unet with Attention Mechanism for Time-Frequency Domain Speech Enhancement

Cited by: 0
Authors
Chen, Hanyu [1]
Peng, Xiwei [1]
Jiang, Qiqi [1]
Guo, Yujie [1]
Affiliations
[1] Beijing Inst Technol, Sch Automat, Beijing 100081, Peoples R China
Source
2022 41ST CHINESE CONTROL CONFERENCE (CCC) | 2022
Keywords
Speech enhancement; Unet; residual unit; attention gating
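DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract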
Eliminating the negative effects of background environmental noise is an interesting and challenging task in audio processing. In recent years, denoising techniques based on neural networks (NN) have achieved good performance. In particular, structures based on a convolutional encoder and decoder have been shown to achieve good enhancement results. Building on this, the paper proposes a residual Unet structure combined with an attention mechanism. The proposed structure reduces the impact of vanishing gradients on network training and narrows the semantic gap between encoder and decoder features that arises from the Unet shortcut connections. Experimental results show that, compared with the DNN baseline and a Unet network, the quality of the enhanced speech is significantly improved.
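The two ingredients named in the abstract, a residual unit and an attention gate on the shortcut connection, can be illustrated with a toy sketch. This is not the authors' network: it works on plain 1-D lists, and the function names, the scalar weights (`w_skip`, `w_gate`, `psi`, `bias`), and the example values are all assumptions chosen for illustration. A real model would use learned convolutions over time-frequency feature maps.

```python
import math

def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]

def sigmoid(v):
    """Element-wise logistic sigmoid on a list of floats."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def residual_unit(x, transform):
    """Identity shortcut: output = x + transform(x).

    The identity path gives gradients a direct route backwards,
    which is how residual units ease vanishing gradients.
    """
    fx = transform(x)
    return [a + b for a, b in zip(x, fx)]

def attention_gate(skip, gate, w_skip=1.0, w_gate=1.0, psi=1.0, bias=0.0):
    """Additive attention gate with hypothetical scalar weights.

    alpha = sigmoid(psi * relu(w_skip * skip + w_gate * gate + bias))
    The encoder (skip) features are rescaled by alpha before they
    would be merged with the decoder features.
    """
    pre = relu([w_skip * s + w_gate * g + bias for s, g in zip(skip, gate)])
    alpha = sigmoid([psi * p for p in pre])
    gated = [a * s for a, s in zip(alpha, skip)]
    return gated, alpha

# Toy encoder and decoder features (illustrative values only).
encoder_feat = [0.5, -1.0, 2.0]
decoder_feat = [1.0, -3.0, 0.0]

# Residual unit whose inner transform is a scaled ReLU (an assumption).
res_out = residual_unit(encoder_feat, lambda v: [0.1 * a for a in relu(v)])

# Gate the shortcut features using the decoder features as the gating signal.
gated, alpha = attention_gate(res_out, decoder_feat)
```

In this toy setting `alpha` is a per-element weight between 0 and 1, so the gate can only attenuate the shortcut features, never amplify them. With a positive scalar `psi` the weights cannot fall below 0.5; a learned gate does not have that restriction.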
Pages: 7007-7011
Page count: 5