A Dual Stream Generative Adversarial Network with Phase Awareness for Speech Enhancement

Cited by: 0
Authors
Liang, Xintao [1]
Li, Yuhang [1]
Li, Xiaomin [1]
Zhang, Yue [1]
Ding, Youdong [1]
Affiliation
[1] Shanghai University, Shanghai Film Academy, Shanghai 200072, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
speech enhancement; GAN; transformer; phase; spectrogram; dual stream; intelligibility; optimization; algorithm; noise
DOI
10.3390/info14040221
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Single-channel speech enhancement under unknown noise conditions is a challenging problem. Most existing time-frequency domain methods operate on the amplitude spectrogram and ignore the phase mismatch between noisy speech and clean speech, which substantially limits enhancement performance. To address the phase mismatch and further improve enhancement performance, this paper proposes a dual-stream Generative Adversarial Network (GAN) with phase awareness, named DPGAN. Our generator uses a dual-stream structure to predict amplitude and phase separately, with an information communication module between the two streams so that the phase information is fully exploited. To make the prediction more efficient, we build the generator with Transformers, which can learn the structural properties of sound more easily. Finally, we design a perceptually guided discriminator that quantitatively evaluates speech quality, optimising the generator for specific evaluation metrics. We conducted experiments on the widely used Voicebank-DEMAND dataset, where DPGAN achieved state-of-the-art results on most metrics.
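The phase mismatch mentioned in the abstract can be illustrated with a small numerical sketch. This is not the DPGAN method: it uses a hypothetical toy tone, a single FFT frame instead of a windowed STFT, and assumed values for sample rate, frame length, and noise level. It only shows that a perfect amplitude estimate still leaves reconstruction error when it is paired with the noisy phase.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy signals (assumed values, not from the paper):
# a 440 Hz tone sampled at 16 kHz, and the same tone plus white noise.
n = 512
t = np.arange(n) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = clean + 0.5 * rng.standard_normal(n)

# One-frame spectra; a real system would use a windowed STFT over many frames.
clean_spec = np.fft.rfft(clean)
noisy_spec = np.fft.rfft(noisy)

# Split each complex spectrum into amplitude and phase.
clean_amp = np.abs(clean_spec)
clean_phase = np.angle(clean_spec)
noisy_phase = np.angle(noisy_spec)


def resynthesize(amplitude, phase):
    """Rebuild a time-domain frame from amplitude and phase."""
    return np.fft.irfft(amplitude * np.exp(1j * phase), n=n)


# Ideal amplitude with the true phase versus ideal amplitude with noisy phase.
with_clean_phase = resynthesize(clean_amp, clean_phase)
with_noisy_phase = resynthesize(clean_amp, noisy_phase)

err_clean_phase = np.mean((with_clean_phase - clean) ** 2)
err_noisy_phase = np.mean((with_noisy_phase - clean) ** 2)

print("MSE with clean phase:", err_clean_phase)
print("MSE with noisy phase:", err_noisy_phase)
```

In this sketch the error with the clean phase is at floating-point round-off level, while reusing the noisy phase leaves a clearly larger error even though the amplitude is exact. This is the gap that amplitude-only methods cannot close and that a separate phase stream is meant to address.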
Pages: 21