Channel and temporal-frequency attention UNet for monaural speech enhancement

Citations: 0
Authors
Shiyun Xu
Zehua Zhang
Mingjiang Wang
Affiliations
[1] Harbin Institute of Technology, Key Laboratory for Key Technologies of IoT Terminals
Source
EURASIP Journal on Audio, Speech, and Music Processing, Vol. 2023
Keywords
Speech enhancement; Neural network; Denoising; Dereverberation
DOI
Not available
Abstract
The presence of noise and reverberation significantly impedes speech clarity and intelligibility. To mitigate these effects, numerous deep learning-based network models have been proposed for speech enhancement tasks aimed at improving speech quality. In this study, we propose a monaural speech enhancement model called the channel and temporal-frequency attention UNet (CTFUNet). CTFUNet takes the noisy spectrum as input and produces a complex ideal ratio mask (cIRM) as output. To improve its enhancement performance, we employ multi-scale temporal-frequency processing to extract features from the input speech spectrum. We also utilize multi-conv-head channel attention and residual channel attention to capture temporal-frequency and channel features. Moreover, we introduce a channel temporal-frequency skip connection to alleviate information loss between down-sampling and up-sampling. On the blind test set of the first Deep Noise Suppression Challenge, CTFUNet outperforms the challenge champion models as well as more recent models in denoising. Furthermore, it surpasses recent models such as Uformer and MTFAA in both denoising and dereverberation performance.
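The abstract describes a mask-based pipeline: the network predicts a complex ideal ratio mask (cIRM) that is multiplied with the noisy spectrum to obtain the enhanced spectrum. The sketch below illustrates only that masking relationship, not the paper's network; the function names, shapes, and the epsilon stabilizer are illustrative assumptions.

```python
import numpy as np

def ideal_cirm(clean_stft: np.ndarray, noisy_stft: np.ndarray,
               eps: float = 1e-8) -> np.ndarray:
    """cIRM definition: M = S / Y, computed per time-frequency bin.
    eps guards against division by a zero-valued noisy bin (an assumption
    for this toy sketch, not from the paper)."""
    return clean_stft / (noisy_stft + eps)

def apply_mask(noisy_stft: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Enhancement step: complex multiply the mask with the noisy spectrum."""
    return mask * noisy_stft

# Toy check: applying the ideal mask to the noisy spectrum recovers the
# clean spectrum (257 frequency bins x 100 frames, complex-valued).
rng = np.random.default_rng(0)
clean = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))
noise = 0.3 * (rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100)))
noisy = clean + noise

mask = ideal_cirm(clean, noisy)
enhanced = apply_mask(noisy, mask)
print(np.allclose(enhanced, clean, atol=1e-4))
```

In practice the network predicts the real and imaginary mask components from the noisy input alone; the identity above is only how the training target is defined.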
Related papers
50 items in total
  • [41] Auditory Mask Estimation by RPCA for Monaural Speech Enhancement
    Shi, Wenhua
    Zhang, Xiongwei
    Zou, Xia
    Han, Wei
    Min, Gang
    2017 16TH IEEE/ACIS INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS 2017), 2017, : 179 - 184
  • [42] SpecMNet: Spectrum mend network for monaural speech enhancement
    Fan, Cunhang
    Zhang, Hongmei
    Yi, Jiangyan
    Lv, Zhao
    Tao, Jianhua
    Li, Taihao
    Pei, Guanxiong
    Wu, Xiaopei
    Li, Sheng
    APPLIED ACOUSTICS, 2022, 194
  • [43] A time-frequency fusion model for multi-channel speech enhancement
    Zeng, Xiao
    Xu, Shiyun
    Wang, Mingjiang
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01)
  • [44] FullSubNet+: CHANNEL ATTENTION FULLSUBNET WITH COMPLEX SPECTROGRAMS FOR SPEECH ENHANCEMENT
    Chen, Jun
    Wang, Zilin
    Tuo, Deyi
    Wu, Zhiyong
    Kang, Shiyin
    Meng, Helen
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7857 - 7861
  • [45] Real-time single-channel speech enhancement based on causal attention mechanism
    Fan, Junyi
    Yang, Jibin
    Zhang, Xiongwei
    Yao, Yao
    APPLIED ACOUSTICS, 2022, 201
  • [46] Sixty Years of Frequency-Domain Monaural Speech Enhancement: From Traditional to Deep Learning Methods
    Zheng, Chengshi
    Zhang, Huiyong
    Liu, Wenzhe
    Luo, Xiaoxue
    Li, Andong
    Li, Xiaodong
    Moore, Brian C. J.
    TRENDS IN HEARING, 2023, 27
  • [47] Adaptive Temporal-Frequency Network for Time-Series Forecasting
    Yang, Zhangjing
    Yan, Wei-Wu
    Huang, Xiaolin
    Mei, Lin
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (04) : 1576 - 1587
  • [48] DUAL-BRANCH ATTENTION-IN-ATTENTION TRANSFORMER FOR SINGLE-CHANNEL SPEECH ENHANCEMENT
    Yu, Guochen
    Li, Andong
    Zheng, Chengshi
    Guo, Yinuo
    Wang, Yutian
    Wang, Hui
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7847 - 7851
  • [49] Gated Residual Networks With Dilated Convolutions for Monaural Speech Enhancement
    Tan, Ke
    Chen, Jitong
    Wang, DeLiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (01) : 189 - 198
  • [50] PERCEPTUAL IMPROVEMENT OF DEEP NEURAL NETWORKS FOR MONAURAL SPEECH ENHANCEMENT
    Han, Wei
    Zhang, Xiongwei
    Sun, Meng
    Shi, Wenhua
    Chen, Xushan
    Hu, Yonggang
    2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2016,