Residual Unet with Attention Mechanism for Time-Frequency Domain Speech Enhancement

被引:0
|
作者
Chen, Hanyu [1 ]
Peng, Xiwei [1 ]
Jiang, Qiqi [1 ]
Guo, Yujie [1 ]
机构
[1] Beijing Inst Technol, Sch Automat, Beijing 100081, Peoples R China
关键词
Speech enhancement; Unet; residual unit; attention gating;
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Eliminating the negative effects of background environmental noise is an interesting and challenging task in audio processing. In recent years, denoising technology based on neural networks (NN) has achieved good performance. In particular, the structure based on the convolutional encoder and decoder has been proven to achieve good enhancement effects. On this basis, this paper proposes a residual unet structure combined with the attention mechanism. Effectively reduce the impact of gradient disappearance on network training, and improve the semantic gap between encoder output and decoder output due to unet shortcut connections. The experimental results show that compared with the DNN baseline and unet network, the enhanced voice quality has been significantly improved.
引用
收藏
页码:7007 / 7011
页数:5
相关论文
共 50 条
  • [1] TIME-FREQUENCY ATTENTION FOR MONAURAL SPEECH ENHANCEMENT
    Zhang, Qiquan
    Song, Qi
    Ni, Zhaoheng
    Nicolson, Aaron
    Li, Haizhou
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7852 - 7856
  • [2] Neural speech enhancement in the time-frequency domain
    Volkmer, M
    2003 IEEE XIII WORKSHOP ON NEURAL NETWORKS FOR SIGNAL PROCESSING - NNSP'03, 2003, : 617 - 626
  • [3] A Time-Frequency Attention Module for Neural Speech Enhancement
    Zhang, Qiquan
    Qian, Xinyuan
    Ni, Zhaoheng
    Nicolson, Aaron
    Ambikairajah, Eliathamby
    Li, Haizhou
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 462 - 475
  • [4] Integrated speech enhancement and coding in the time-frequency domain
    Drygajlo, A
    Carnero, B
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1183 - 1186
  • [5] Joint Time-Frequency and Time Domain Learning for Speech Enhancement
    Tang, Chuanxin
    Luo, Chong
    Zhao, Zhiyuan
    Xie, Wenxuan
    Zeng, Wenjun
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 3816 - 3822
  • [6] Speech preprocessing and enhancement based on joint time domain and time-frequency domain analysis
    Zhang, Wenbo
    Xie, Xuefeng
    Du, Yanling
    Huang, Dongmei
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2024, 155 (06): : 3580 - 3588
  • [7] Frequency Gating: Improved Convolutional Neural Networks for Speech Enhancement in the Time-Frequency Domain
    Oostermeijer, Koen
    Wang, Qing
    Du, Jun
    2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 465 - 470
  • [8] Noisy-reverberant Speech Enhancement Using DenseUNet with Time-frequency Attention
    Zhao, Yan
    Wang, DeLiang
    INTERSPEECH 2020, 2020, : 3261 - 3265
  • [9] CASCADED TIME plus TIME-FREQUENCY UNET FOR SPEECH ENHANCEMENT: JOINTLY ADDRESSING CLIPPING, CODEC DISTORTIONS, AND GAPS
    Nair, Arun Asokan
    Koishida, Kazuhito
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7153 - 7157
  • [10] Channel and temporal-frequency attention UNet for monaural speech enhancement
    Shiyun Xu
    Zehua Zhang
    Mingjiang Wang
    EURASIP Journal on Audio, Speech, and Music Processing, 2023