Channel and temporal-frequency attention UNet for monaural speech enhancement

Cited by: 0
Authors
Shiyun Xu
Zehua Zhang
Mingjiang Wang
Affiliation
[1] Harbin Institute of Technology, Key Laboratory for Key Technologies of IoT Terminals
Source
EURASIP Journal on Audio, Speech, and Music Processing, Vol. 2023
Keywords
Speech enhancement; Neural network; Denoising; Dereverberation
DOI
Not available
Abstract
The presence of noise and reverberation significantly impedes speech clarity and intelligibility. To mitigate these effects, numerous deep learning-based network models have been proposed for speech enhancement tasks aimed at improving speech quality. In this study, we propose a monaural speech enhancement model called the channel and temporal-frequency attention UNet (CTFUNet). CTFUNet takes the noisy spectrum as input and produces a complex ideal ratio mask (cIRM) as output. To improve the enhancement performance of CTFUNet, we employ multi-scale temporal-frequency processing to extract features from the input speech spectrum. We also utilize multi-conv head channel attention and residual channel attention to capture temporal-frequency and channel features. Moreover, we introduce a channel temporal-frequency skip connection to alleviate information loss between down-sampling and up-sampling. On the blind test set of the first Deep Noise Suppression (DNS) challenge, the proposed CTFUNet achieves better denoising performance than the challenge champion models as well as more recent models. Furthermore, our model outperforms recent models such as Uformer and MTFAA in both denoising and dereverberation.
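
The sketch below is a minimal, hypothetical illustration (not the authors' released code) of two ideas named in the abstract: a squeeze-and-excitation style residual channel attention block, and the application of a complex ideal ratio mask (cIRM) to a noisy spectrum. It assumes a PyTorch pipeline in which real and imaginary spectra are stored as a (batch, 2, time, frequency) tensor; all class and function names here are illustrative assumptions, not identifiers from the paper.

# Hypothetical sketch (not the paper's implementation): residual channel
# attention and cIRM application, assuming (batch, channels, time, freq) tensors.
import torch
import torch.nn as nn


class ResidualChannelAttention(nn.Module):
    """Reweight feature channels with learned gates and add a residual path."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze the time-frequency plane
        self.fc = nn.Sequential(                     # excitation: per-channel weights
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.fc(self.pool(x))                    # (B, C, 1, 1) channel weights
        return x + x * w                             # residual connection


def apply_cirm(noisy_spec: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Apply a complex ideal ratio mask via complex multiplication.

    noisy_spec, mask: (B, 2, T, F) tensors holding real/imaginary parts.
    """
    nr, ni = noisy_spec[:, 0], noisy_spec[:, 1]
    mr, mi = mask[:, 0], mask[:, 1]
    enh_r = nr * mr - ni * mi                        # real part of the product
    enh_i = nr * mi + ni * mr                        # imaginary part of the product
    return torch.stack([enh_r, enh_i], dim=1)


if __name__ == "__main__":
    feats = torch.randn(1, 16, 100, 257)             # dummy encoder features
    att = ResidualChannelAttention(16)
    print(att(feats).shape)                          # torch.Size([1, 16, 100, 257])

    noisy = torch.randn(1, 2, 100, 257)              # dummy noisy spectrum
    cirm = torch.tanh(torch.randn(1, 2, 100, 257))   # dummy bounded mask
    print(apply_cirm(noisy, cirm).shape)             # torch.Size([1, 2, 100, 257])

The tanh bound on the dummy mask mirrors the common practice of compressing the cIRM to a finite range before estimation; the actual compression used by CTFUNet is described in the paper.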
Related papers
Showing items [31]-[40] of 50
  • [31] Adversarial Dictionary Learning for Monaural Speech Enhancement
    Ji, Yunyun
    Xu, Longting
    Zhu, Wei-Ping
    INTERSPEECH 2020, 2020: 4034-4038
  • [32] GAN-in-GAN for Monaural Speech Enhancement
    Duan, Yicun
    Ren, Jianfeng
    Yu, Heng
    Jiang, Xudong
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30: 853-857
  • [33] Single channel speech enhancement using temporal masking
    Gunawan, TS
    Ambikairajah, E
    2004 9TH IEEE SINGAPORE INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS (ICCS), 2004: 250-254
  • [34] DDP-Unet: A mapping neural network for single-channel speech enhancement
    Chen, Haoxiang
    Xu, Yanyan
    Ke, Dengfeng
    Su, Kaile
    COMPUTER SPEECH AND LANGUAGE, 2025, 93
  • [35] A Multi-scale Subconvolutional U-Net with Time-Frequency Attention Mechanism for Single Channel Speech Enhancement
    Yechuri, Sivaramakrishna
    Komati, Thirupathi Rao
    Yellapragada, Rama Krishna
    Vanambathina, Sunnydaya
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2024, 43 (09): 5682-5710
  • [36] Single channel speech enhancement using time-frequency attention mechanism based nested U-net model
    Prathipati, Anil Kumar
    Chakravarthy, A. S. N.
    ENGINEERING RESEARCH EXPRESS, 2024, 6 (03)
  • [37] A COMPUTATIONALLY-EFFICIENT SINGLE-CHANNEL SPEECH ENHANCEMENT ALGORITHM FOR MONAURAL HEARING AIDS
    Ayllon, David
    Gil-Pita, Roberto
    Utrilla-Manso, Manuel
    Rosa-Zurera, Manuel
    2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014: 2050-2054
  • [38] On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement
    Kolbaek, Morten
    Tan, Zheng-Hua
    Jensen, Soren Holdt
    Jensen, Jesper
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28: 825-838
  • [39] Masking Estimation with Phase Restoration of Clean Speech for Monaural Speech Enhancement
    Wang, Xianyun
    Bao, Changchun
    INTERSPEECH 2019, 2019: 3188-3192
  • [40] A Time-Frequency Attention Module for Neural Speech Enhancement
    Zhang, Qiquan
    Qian, Xinyuan
    Ni, Zhaoheng
    Nicolson, Aaron
    Ambikairajah, Eliathamby
    Li, Haizhou
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31: 462-475