Channel and temporal-frequency attention UNet for monaural speech enhancement

Cited by: 0
Authors
Shiyun Xu
Zehua Zhang
Mingjiang Wang
Affiliations
[1] Harbin Institute of Technology, Key Laboratory for Key Technologies of IoT Terminals
Source
EURASIP Journal on Audio, Speech, and Music Processing, 2023
Keywords
Speech enhancement; Neural network; Denoising; Dereverberation
DOI
Not available
Abstract
The presence of noise and reverberation significantly impedes speech clarity and intelligibility. To mitigate these effects, numerous deep learning-based network models have been proposed for speech enhancement tasks aimed at improving speech quality. In this study, we propose a monaural speech enhancement model called the channel and temporal-frequency attention UNet (CTFUNet). CTFUNet takes the noisy spectrum as input and produces a complex ideal ratio mask (cIRM) as output. To improve the speech enhancement performance of CTFUNet, we employ multi-scale temporal-frequency processing to extract input speech spectrum features. We also utilize multi-conv head channel attention and residual channel attention to capture temporal-frequency and channel features. Moreover, we introduce the channel temporal-frequency skip connection to alleviate information loss between down-sampling and up-sampling. On the blind test set of the first deep noise suppression challenge, our proposed CTFUNet achieves better denoising performance than both the champion models and more recent models. Furthermore, our model outperforms recent models such as Uformer and MTFAA in both denoising and dereverberation performance.
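The abstract describes a mask-based pipeline: the network predicts a complex ideal ratio mask (cIRM) from the noisy spectrum, and the enhanced spectrum is obtained by complex multiplication of the mask with the noisy STFT. A minimal sketch of that final masking step, with random arrays standing in for the STFT and the network's predicted mask components (the function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def apply_cirm(noisy_spec: np.ndarray,
               mask_real: np.ndarray,
               mask_imag: np.ndarray) -> np.ndarray:
    """Apply a complex ratio mask to a noisy complex STFT spectrum.

    noisy_spec: complex array of shape (freq, time)
    mask_real, mask_imag: real arrays of the same shape, as a network
    like CTFUNet would predict (stubbed here with random data).
    """
    mask = mask_real + 1j * mask_imag
    # Enhancement is element-wise complex multiplication in the STFT domain,
    # which can scale magnitude and rotate phase simultaneously.
    return mask * noisy_spec

# Toy usage: 257 frequency bins (512-point FFT) by 100 frames.
rng = np.random.default_rng(0)
noisy = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))
m_r = rng.standard_normal((257, 100))
m_i = rng.standard_normal((257, 100))
enhanced = apply_cirm(noisy, m_r, m_i)
```

Unlike a real-valued magnitude mask, the complex mask modifies phase as well, which is what allows cIRM-based models to address reverberation in addition to additive noise.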
Related papers
50 items
  • [21] MULTI-SCALE TEMPORAL FREQUENCY CONVOLUTIONAL NETWORK WITH AXIAL ATTENTION FOR SPEECH ENHANCEMENT
    Zhang, Guochang
    Yu, Libiao
    Wang, Chunliang
    Wei, Jianqiang
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 9122 - 9126
  • [22] Joint waveform and magnitude processing for monaural speech enhancement
    Xiang, Xiaoxiao
    Zhang, Xiaojuan
    APPLIED ACOUSTICS, 2022, 200
  • [23] FRCRN: BOOSTING FEATURE REPRESENTATION USING FREQUENCY RECURRENCE FOR MONAURAL SPEECH ENHANCEMENT
    Zhao, Shengkui
    Ma, Bin
    Watcharasupat, Karn N.
    Gan, Woon-Seng
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 9281 - 9285
  • [24] CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement
    Abdulatif, Sherif
    Cao, Ruizhe
    Yang, Bin
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2477 - 2493
  • [25] DBT-Net: Dual-Branch Federative Magnitude and Phase Estimation With Attention-in-Attention Transformer for Monaural Speech Enhancement
    Yu, Guochen
    Li, Andong
    Wang, Hui
    Wang, Yutian
    Ke, Yuxuan
    Zheng, Chengshi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 2629 - 2644
  • [26] Double Adversarial Network based Monaural Speech Enhancement for Robust Speech Recognition
    Du, Zhihao
    Han, Jiqing
    Zhang, Xueliang
    INTERSPEECH 2020, 2020, : 309 - 313
  • [27] Spatio-Temporal Features Representation Using Recurrent Capsules for Monaural Speech Enhancement
    Ali, Jawad
    Saleem, Nasir
    Bourouis, Sami
    Alabdulkreem, Eatedal
    El Mannai, Hela
    Dhahbi, Sami
    IEEE ACCESS, 2024, 12 : 21287 - 21303
  • [28] Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis
    Ochieng, Peter
    Artificial Intelligence Review, 2023, 56 : 3651 - 3703
  • [29] A Nested U-Net With Self-Attention and Dense Connectivity for Monaural Speech Enhancement
    Xiang, Xiaoxiao
    Zhang, Xiaojuan
    Chen, Haozhe
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 105 - 109
  • [30] Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis
    Ochieng, Peter
    ARTIFICIAL INTELLIGENCE REVIEW, 2023, 56 (SUPPL3) : S3651 - S3703