Channel and temporal-frequency attention UNet for monaural speech enhancement

Cited by: 0
Authors
Shiyun Xu
Zehua Zhang
Mingjiang Wang
Affiliation
[1] Harbin Institute of Technology, Key Laboratory for Key Technologies of IoT Terminals
Source
EURASIP Journal on Audio, Speech, and Music Processing, Vol. 2023
Keywords
Speech enhancement; Neural network; Denoising; Dereverberation
DOI
Not available
Abstract
The presence of noise and reverberation significantly impedes speech clarity and intelligibility. To mitigate these effects, numerous deep learning-based network models have been proposed for speech enhancement tasks aimed at improving speech quality. In this study, we propose a monaural speech enhancement model called the channel and temporal-frequency attention UNet (CTFUNet). CTFUNet takes the noisy spectrum as input and produces a complex ideal ratio mask (cIRM) as output. To improve the enhancement performance of CTFUNet, we employ multi-scale temporal-frequency processing to extract features from the input speech spectrum. We also utilize multi-conv head channel attention and residual channel attention to capture temporal-frequency and channel features. Moreover, we introduce a channel temporal-frequency skip connection to alleviate information loss between down-sampling and up-sampling. On the blind test set of the first Deep Noise Suppression (DNS) challenge, the proposed CTFUNet achieves better denoising performance than the challenge champion models as well as more recent models. Furthermore, our model outperforms recent models such as Uformer and MTFAA in both denoising and dereverberation.
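
The sketch below is a minimal, hypothetical illustration (not the authors' released code) of two ideas named in the abstract: a squeeze-and-excitation style residual channel attention block, and the application of a complex ideal ratio mask (cIRM) to a noisy spectrum. It assumes a PyTorch pipeline in which real and imaginary spectra are stored as a (batch, 2, time, frequency) tensor; all class and function names here are illustrative assumptions, not identifiers from the paper.

# Hypothetical sketch (not the paper's implementation): residual channel
# attention and cIRM application, assuming (batch, channels, time, freq) tensors.
import torch
import torch.nn as nn


class ResidualChannelAttention(nn.Module):
    """Reweight feature channels with learned gates and add a residual path."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze the time-frequency plane
        self.fc = nn.Sequential(                     # excitation: per-channel weights
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.fc(self.pool(x))                    # (B, C, 1, 1) channel weights
        return x + x * w                             # residual connection


def apply_cirm(noisy_spec: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Apply a complex ideal ratio mask via complex multiplication.

    noisy_spec, mask: (B, 2, T, F) tensors holding real/imaginary parts.
    """
    nr, ni = noisy_spec[:, 0], noisy_spec[:, 1]
    mr, mi = mask[:, 0], mask[:, 1]
    enh_r = nr * mr - ni * mi                        # real part of the product
    enh_i = nr * mi + ni * mr                        # imaginary part of the product
    return torch.stack([enh_r, enh_i], dim=1)


if __name__ == "__main__":
    feats = torch.randn(1, 16, 100, 257)             # dummy encoder features
    att = ResidualChannelAttention(16)
    print(att(feats).shape)                          # torch.Size([1, 16, 100, 257])

    noisy = torch.randn(1, 2, 100, 257)              # dummy noisy spectrum
    cirm = torch.tanh(torch.randn(1, 2, 100, 257))   # dummy bounded mask
    print(apply_cirm(noisy, cirm).shape)             # torch.Size([1, 2, 100, 257])

The tanh bound on the dummy mask mirrors the common practice of compressing the cIRM to a finite range before estimation; the actual compression used by CTFUNet is described in the paper.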
Related papers
Showing items [31]-[40] of 50
  • [31] Adversarial Dictionary Learning for Monaural Speech Enhancement
    Ji, Yunyun
    Xu, Longting
    Zhu, Wei-Ping
    INTERSPEECH 2020, 2020: 4034-4038
  • [32] GAN-in-GAN for Monaural Speech Enhancement
    Duan, Yicun
    Ren, Jianfeng
    Yu, Heng
    Jiang, Xudong
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30: 853-857
  • [33] Single channel speech enhancement using temporal masking
    Gunawan, TS
    Ambikairajah, E
    2004 9TH IEEE SINGAPORE INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS (ICCS), 2004: 250-254
  • [34] DDP-Unet: A mapping neural network for single-channel speech enhancement
    Chen, Haoxiang
    Xu, Yanyan
    Ke, Dengfeng
    Su, Kaile
    COMPUTER SPEECH AND LANGUAGE, 2025, 93
  • [35] A Multi-scale Subconvolutional U-Net with Time-Frequency Attention Mechanism for Single Channel Speech Enhancement
    Yechuri, Sivaramakrishna
    Komati, Thirupathi Rao
    Yellapragada, Rama Krishna
    Vanambathina, Sunnydaya
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2024, 43 (09): 5682-5710
  • [36] Single channel speech enhancement using time-frequency attention mechanism based nested U-net model
    Prathipati, Anil Kumar
    Chakravarthy, A. S. N.
    ENGINEERING RESEARCH EXPRESS, 2024, 6 (03)
  • [37] A COMPUTATIONALLY-EFFICIENT SINGLE-CHANNEL SPEECH ENHANCEMENT ALGORITHM FOR MONAURAL HEARING AIDS
    Ayllon, David
    Gil-Pita, Roberto
    Utrilla-Manso, Manuel
    Rosa-Zurera, Manuel
    2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014: 2050-2054
  • [38] On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement
    Kolbaek, Morten
    Tan, Zheng-Hua
    Jensen, Soren Holdt
    Jensen, Jesper
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28: 825-838
  • [39] Masking Estimation with Phase Restoration of Clean Speech for Monaural Speech Enhancement
    Wang, Xianyun
    Bao, Changchun
    INTERSPEECH 2019, 2019: 3188-3192
  • [40] A Time-Frequency Attention Module for Neural Speech Enhancement
    Zhang, Qiquan
    Qian, Xinyuan
    Ni, Zhaoheng
    Nicolson, Aaron
    Ambikairajah, Eliathamby
    Li, Haizhou
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31: 462-475