Residual Unet with Attention Mechanism for Time-Frequency Domain Speech Enhancement

Authors
Chen, Hanyu [1]
Peng, Xiwei [1]
Jiang, Qiqi [1]
Guo, Yujie [1]
Affiliation
[1] Beijing Inst Technol, Sch Automat, Beijing 100081, Peoples R China
Keywords
Speech enhancement; Unet; residual unit; attention gating
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Eliminating the negative effects of background environmental noise is an interesting and challenging task in audio processing. In recent years, denoising technology based on neural networks (NN) has achieved good performance. In particular, structures based on a convolutional encoder and decoder have been shown to achieve good enhancement. Building on this, this paper proposes a residual Unet structure combined with an attention mechanism. The proposed structure effectively reduces the impact of vanishing gradients on network training and narrows the semantic gap between encoder output and decoder output caused by the Unet shortcut connections. Experimental results show that, compared with the DNN baseline and the Unet network, the quality of the enhanced speech is significantly improved.
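The abstract names two building blocks, a residual unit and attention gating on the Unet shortcut connections, without giving their exact form. The following is only a minimal sketch of the generic versions of both ideas, not the paper's architecture: it assumes an identity-shortcut residual unit and an additive attention gate applied to one time-frequency bin. All names (`residual_unit`, `attention_gate`, `W_x`, `W_g`, `psi`) are hypothetical and chosen for illustration.

```python
import math

def relu(v):
    # Element-wise rectifier on a list of floats.
    return [max(0.0, a) for a in v]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def matvec(W, v):
    # Plain matrix-vector product; W is a list of rows.
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def residual_unit(x, W):
    # Assumed form: y = x + relu(W x). The identity shortcut gives the
    # gradient a direct path, which is the usual remedy for vanishing gradients.
    return [a + b for a, b in zip(x, relu(matvec(W, x)))]

def attention_gate(skip, gate, W_x, W_g, psi):
    # Assumed additive gate: alpha = sigmoid(psi . relu(W_x skip + W_g gate)).
    # The encoder skip features are scaled by alpha in (0, 1) before they
    # are passed on to the decoder.
    mixed = [a + b for a, b in zip(matvec(W_x, skip), matvec(W_g, gate))]
    hidden = relu(mixed)
    alpha = sigmoid(sum(p * h for p, h in zip(psi, hidden)))
    return alpha, [alpha * s for s in skip]

if __name__ == "__main__":
    skip = [1.0, -2.0]   # encoder features for one time-frequency bin
    gate = [0.5, 0.5]    # decoder (gating) features for the same bin
    eye = [[1.0, 0.0], [0.0, 1.0]]
    alpha, gated = attention_gate(skip, gate, eye, eye, [1.0, 1.0])
    print(alpha, gated)
    print(residual_unit(skip, eye))
```

In a real network the weights would be learned convolutions and the gate would produce one coefficient per time-frequency bin; the scalar case here only shows how the skip features are reweighted before reaching the decoder.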
Pages: 7007 - 7011
Number of pages: 5
Related papers
50 in total
  • [31] Time domain speech enhancement with CNN and time-attention transformer
    Saleem, Nasir
    Gunawan, Teddy Surya
    Dhahbi, Sami
    Bourouis, Sami
    DIGITAL SIGNAL PROCESSING, 2024, 147
  • [32] Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising
    Williamson, Donald S.
    Wang, DeLiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (07) : 1492 - 1501
  • [33] Modeling speech signals in the time-frequency domain using GARCH
    Cohen, I
    SIGNAL PROCESSING, 2004, 84 (12) : 2453 - 2459
  • [34] An approach to digital watermarking of speech signals in the time-frequency domain
    Stankovic, Srdjan
    Orovic, Irena
    Zaric, Nikola
    Ioana, Cornel
    PROCEEDINGS ELMAR-2006, 2006 : 127 - 130
  • [35] HYBRID TIME-FREQUENCY DOMAIN ARTICULATORY SPEECH SYNTHESIZER.
    Sondhi, Man Mohan
    Schroeter, Juergen
    IEEE Transactions on Acoustics, Speech, and Signal Processing, 1987, ASSP-35 (07) : 955 - 967
  • [36] Time-frequency network combining batch attention and spatial attention for speech bandwidth extension
    Xu, Chundong
    Tan, Guowu
    Ying, Dongwen
    APPLIED ACOUSTICS, 2023, 211
  • [37] A Time-Frequency Domain Formant Frequency Estimation Scheme for Noisy Speech Signals
    Fattah, S. A.
    Zhu, W-P.
    Ahmad, M. O.
    ISCAS: 2009 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-5, 2009 : 1201 - 1204
  • [38] Emotion Recognition Based on Data Enhancement in Time-Frequency Domain
    Li, Qianqian
    Ren, Fuji
    Shen, Xiaoyan
    Kang, Xin
    INTERNATIONAL SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND ROBOTICS 2020, 2020, 11574
  • [39] Rolling Bearing Fault Diagnosis Based on Time-Frequency Compression Fusion and Residual Time-Frequency Mixed Attention Network
    Sun, Guodong
    Yang, Xiong
    Xiong, Chenyun
    Hu, Ye
    Liu, Moyun
    APPLIED SCIENCES-BASEL, 2022, 12 (10)
  • [40] Stacked U-Net with Time-Frequency Attention and Deep Connection Net for Single Channel Speech Enhancement
    Parisae, Veeraswamy
    Bhavanam, S. Nagakishore
    INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS, 2024