TFDense-GAN: a generative adversarial network for single-channel speech enhancement

Citations: 0
|
Authors
Chen, Haoxiang [1 ]
Zhang, Jinxiu [1 ]
Fu, Yaogang [1 ]
Zhou, Xintong [1 ]
Wang, Ruilong [1 ]
Xu, Yanyan [1 ]
Ke, Dengfeng [2 ]
Affiliations
[1] Beijing Forestry Univ, 35 Qinghua East Rd, Beijing 100083, Peoples R China
[2] Beijing Language & Culture Univ, 15 Xueyuan Rd, Beijing 100083, Peoples R China
Source
EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING | 2025, Vol. 2025, Issue 01
Keywords
Speech enhancement; Time-frequency domain; Generative adversarial network; Improved DenseBlock; Time-frequency transformer;
D O I
10.1186/s13634-025-01210-1
CLC Classification Number
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Subject Classification Code
0808; 0809;
Abstract
Research indicates that utilizing the spectrum in the time-frequency domain plays a crucial role in speech enhancement tasks, as it better extracts audio features and reduces computational cost. Among time-frequency domain speech enhancement methods, the introduction of attention mechanisms and the application of DenseBlocks have yielded promising results. In particular, the Unet architecture comprises three main components: the encoder, the decoder, and the bottleneck. Employing DenseBlocks in both the encoder and the decoder gives it powerful feature fusion capabilities with fewer parameters. In this paper, to build on the advantages of these methods for speech enhancement, we propose a Unet-based time-frequency domain denoising model called TFDense-Net. It uses our improved DenseBlock for feature extraction in both the encoder and the decoder, and employs an attention mechanism in the bottleneck for feature fusion and denoising. The model demonstrates excellent performance on speech enhancement tasks, achieving significant improvements in the Si-SDR metric over other state-of-the-art models. Additionally, to further improve denoising performance and enlarge the model's receptive field, we introduce a multi-spectrogram discriminator based on multiple STFTs. Because the discriminator loss captures correlations between spectra that traditional loss functions cannot detect, we train TFDense-Net as a generator against the multi-spectrogram discriminator, yielding a significant further improvement in denoising performance; we name this enhanced model TFDense-GAN. We evaluate the proposed TFDense-Net and TFDense-GAN on two public datasets: the VCTK + DEMAND dataset and the Interspeech Deep Noise Suppression Challenge dataset. Experimental results show that TFDense-GAN outperforms most existing models in terms of STOI, PESQ, and Si-SDR, achieving state-of-the-art results.
The comparison samples of TFDense-GAN and other models can be accessed from https://github.com/yhsjoker/TFDense-GAN.
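The abstract describes a multi-spectrogram discriminator built on STFTs at several resolutions. As a rough illustration of the input side of such a discriminator, the sketch below computes magnitude spectrograms of one waveform at three hypothetical `(n_fft, hop)` settings; the paper's actual STFT parameters and discriminator networks are not given in the abstract, so all values here are placeholder assumptions.

```python
import numpy as np

def stft_mag(x, n_fft, hop):
    """Magnitude spectrogram via a Hann-windowed STFT (numpy only)."""
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = x[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(seg)))
    return np.stack(frames)  # shape: (num_frames, n_fft // 2 + 1)

# Hypothetical resolutions; a multi-STFT discriminator would run one
# sub-discriminator per resolution on these spectrograms.
RESOLUTIONS = [(512, 128), (1024, 256), (2048, 512)]

def multi_spectrograms(x):
    """One magnitude spectrogram per (n_fft, hop) resolution."""
    return [stft_mag(x, n_fft, hop) for n_fft, hop in RESOLUTIONS]

rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)  # 1 s of noise at 16 kHz
specs = multi_spectrograms(audio)
```

Using several window sizes trades time resolution against frequency resolution, so the set of spectrograms together exposes spectral correlations that a single fixed-resolution loss would miss, which is the stated motivation for the multi-spectrogram discriminator.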
Pages: 24
Related Papers
50 in total
  • [1] Speech Enhancement Using Generative Adversarial Network (GAN)
    Huq, Mahmudul
    Maskeliunas, Rytis
    HYBRID INTELLIGENT SYSTEMS, HIS 2021, 2022, 420 : 273 - 282
  • [2] Single-Channel Speech Quality Enhancement in Mobile Networks Based on Generative Adversarial Networks
    Wu, Guifen
    Herencsar, Norbert
    MOBILE NETWORKS & APPLICATIONS, 2024,
  • [3] Single-channel Speech Dereverberation via Generative Adversarial Training
    Li, Chenxing
    Wang, Tieqiang
    Xu, Shuang
    Xu, Bo
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1309 - 1313
  • [4] Enhancement of Alaryngeal Speech using Generative Adversarial Network (GAN)
    Huq, Mahmudul
    2021 IEEE/ACS 18TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2021,
  • [5] CompNet: Complementary network for single-channel speech enhancement
    Fan, Cunhang
    Zhang, Hongmei
    Li, Andong
    Xiang, Wang
    Zheng, Chengshi
    Lv, Zhao
    Wu, Xiaopei
    NEURAL NETWORKS, 2023, 168 : 508 - 517
  • [6] CP-GAN: CONTEXT PYRAMID GENERATIVE ADVERSARIAL NETWORK FOR SPEECH ENHANCEMENT
    Liu, Gang
    Gong, Ke
    Liang, Xiaodan
    Chen, Zhiguang
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6624 - 6628
  • [7] Single-channel blind source separation based on attentional generative adversarial network
    Sun, Xiao
    Xu, Jindong
    Ma, Yongli
    Zhao, Tianyu
    Ou, Shifeng
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2022, 13 (03) : 1443 - 1450
  • [8] SEGAN: Speech Enhancement Generative Adversarial Network
    Pascual, Santiago
    Bonafonte, Antonio
    Serra, Joan
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3642 - 3646
  • [9] Using Hybrid Penalty and Gated Linear Units to Improve Wasserstein Generative Adversarial Networks for Single-Channel Speech Enhancement
    Zhu, Xiaojun
    Huang, Heming
    CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2023, 135 (03): : 2155 - 2172