Channel and temporal-frequency attention UNet for monaural speech enhancement

Cited by: 0
Authors
Shiyun Xu
Zehua Zhang
Mingjiang Wang
Affiliations
[1] Harbin Institute of Technology, Key Laboratory for Key Technologies of IoT Terminals
Source
EURASIP Journal on Audio, Speech, and Music Processing, Volume 2023
Keywords
Speech enhancement; Neural network; Denoising; Dereverberation;
DOI
Not available
Abstract
The presence of noise and reverberation significantly impedes speech clarity and intelligibility. To mitigate these effects, numerous deep learning-based network models have been proposed for speech enhancement tasks aimed at improving speech quality. In this study, we propose a monaural speech enhancement model called the channel and temporal-frequency attention UNet (CTFUNet). CTFUNet takes the noisy spectrum as input and produces a complex ideal ratio mask (cIRM) as output. To improve the speech enhancement performance of CTFUNet, we employ multi-scale temporal-frequency processing to extract input speech spectrum features. We also utilize multi-conv head channel attention and residual channel attention to capture temporal-frequency and channel features. Moreover, we introduce the channel temporal-frequency skip connection to alleviate information loss between down-sampling and up-sampling. On the blind test set of the first deep noise suppression challenge, our proposed CTFUNet achieves better denoising performance than both the champion models and more recent models. Furthermore, our model outperforms recent models such as Uformer and MTFAA in both denoising and dereverberation performance.
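As context for the cIRM output mentioned in the abstract, the following is a minimal sketch (not the authors' code) of how a complex ideal ratio mask is applied to a noisy spectrum once the network has predicted it; the function and parameter names are illustrative assumptions, and the mask is assumed to be predicted as separate real and imaginary components.

```python
import numpy as np

def apply_cirm(noisy_stft, mask_real, mask_imag):
    """Apply a complex ideal ratio mask (cIRM) by complex multiplication.

    noisy_stft: complex STFT of the noisy speech, shape (freq, time).
    mask_real / mask_imag: real and imaginary mask components predicted
    by an enhancement network (hypothetical names), same shape as noisy_stft.
    Returns the enhanced complex spectrum.
    """
    mask = mask_real + 1j * mask_imag
    return mask * noisy_stft  # element-wise complex multiplication

# Toy check: an all-ones real mask with zero imaginary part
# leaves the noisy spectrum unchanged.
y = np.array([[1.0 + 2.0j, -0.5j]])
enhanced = apply_cirm(y, np.ones_like(y.real), np.zeros_like(y.real))
```

Because the mask is complex, it can rescale and rotate each time-frequency bin, correcting both magnitude and phase, which is why cIRM targets are common in dereverberation as well as denoising.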
Related papers
50 records in total
  • [1] Channel and temporal-frequency attention UNet for monaural speech enhancement
    Xu, Shiyun
    Zhang, Zehua
    Wang, Mingjiang
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2023, 2023 (01)
  • [2] Two-stage UNet with channel and temporal-frequency attention for multi-channel speech enhancement
    Xu, Shiyun
    Cao, Yinghan
    Zhang, Zehua
    Wang, Mingjiang
    SPEECH COMMUNICATION, 2025, 166
  • [3] TIME-FREQUENCY ATTENTION FOR MONAURAL SPEECH ENHANCEMENT
    Zhang, Qiquan
    Song, Qi
    Ni, Zhaoheng
    Nicolson, Aaron
    Li, Haizhou
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7852 - 7856
  • [4] REDUNDANT CONVOLUTIONAL NETWORK WITH ATTENTION MECHANISM FOR MONAURAL SPEECH ENHANCEMENT
    Lan, Tian
    Lyu, Yilan
    Hui, Guoqiang
    Mokhosi, Refuoe
    Li, Sen
    Liu, Qiao
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6654 - 6658
  • [5] Residual Unet with Attention Mechanism for Time-Frequency Domain Speech Enhancement
    Chen, Hanyu
    Peng, Xiwei
    Jiang, Qiqi
    Guo, Yujie
    2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022, : 7007 - 7011
  • [6] Temporal Convolutional Network with Frequency Dimension Adaptive Attention for Speech Enhancement
    Zhang, Qiquan
    Song, Qi
    Nicolson, Aaron
    Lan, Tian
    Li, Haizhou
    INTERSPEECH 2021, 2021, : 166 - 170
  • [7] A Recursive Network with Dynamic Attention for Monaural Speech Enhancement
    Li, Andong
    Zheng, Chengshi
    Fan, Cunhang
    Peng, Renhua
    Li, Xiaodong
    INTERSPEECH 2020, 2020, : 2422 - 2426
  • [8] Embedding Encoder-Decoder With Attention Mechanism for Monaural Speech Enhancement
    Lan, Tian
    Ye, Wenzheng
    Lyu, Yilan
    Zhang, Junyi
    Liu, Qiao
    IEEE ACCESS, 2020, 8 : 96677 - 96685
  • [9] Monaural Speech Dereverberation Using Temporal Convolutional Networks With Self Attention
    Zhao, Yan
    Wang, DeLiang
    Xu, Buye
    Zhang, Tao
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1598 - 1607
  • [10] MONAURAL SPEECH ENHANCEMENT WITH COMPLEX CONVOLUTIONAL BLOCK ATTENTION MODULE AND JOINT TIME FREQUENCY LOSSES
    Zhao, Shengkui
    Nguyen, Trung Hieu
    Ma, Bin
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6648 - 6652