A Dual Stream Generative Adversarial Network with Phase Awareness for Speech Enhancement

Cited by: 0
Authors
Liang, Xintao [1]
Li, Yuhang [1]
Li, Xiaomin [1]
Zhang, Yue [1]
Ding, Youdong [1]
Affiliations
[1] Shanghai Univ, Shanghai Film Acad, Shanghai 200072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
speech enhancement; GAN; transformer; phase; spectrogram; dual stream; INTELLIGIBILITY; OPTIMIZATION; ALGORITHM; NOISE;
DOI
10.3390/info14040221
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Single-channel speech enhancement under unknown noise conditions is a challenging problem. Most existing time-frequency domain methods operate on the amplitude spectrogram and ignore the phase mismatch between noisy and clean speech, which substantially limits enhancement performance. To address the phase mismatch and further improve enhancement performance, this paper proposes a dual-stream Generative Adversarial Network (GAN) with phase awareness, named DPGAN. The generator uses a dual-stream structure to predict amplitude and phase separately, with an information communication module between the two streams so that the phase information is fully exploited. To make the prediction more efficient, the generator is built from Transformer blocks, which learn the structural properties of sound more easily. Finally, a perceptually guided discriminator quantitatively evaluates speech quality, optimising the generator for specific evaluation metrics. In experiments on the widely used Voicebank-DEMAND dataset, DPGAN achieved state-of-the-art results on most metrics.
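The phase mismatch described in the abstract can be illustrated with a small NumPy sketch. This is not the DPGAN model. It assumes a synthetic 440 Hz sine as "clean speech", additive Gaussian noise, and non-overlapping rectangular frames in place of a real STFT. All function names and parameters are hypothetical. It compares resynthesis from the oracle clean magnitude paired with the noisy phase against the same magnitude paired with the clean phase.

```python
import numpy as np

FRAME_LEN = 256  # assumed frame size; non-overlapping frames, not a true STFT


def frame_spectrum(signal, frame_len=FRAME_LEN):
    """Split a signal into non-overlapping frames and return each frame's complex spectrum."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.fft.rfft(frames, axis=1)


def resynthesize(magnitude, phase, frame_len=FRAME_LEN):
    """Rebuild a waveform from per-frame magnitude and phase."""
    spectrum = magnitude * np.exp(1j * phase)
    return np.fft.irfft(spectrum, n=frame_len, axis=1).reshape(-1)


def mse(a, b):
    """Mean squared error between two waveforms of equal length."""
    return float(np.mean((a - b) ** 2))


# Synthetic stand-ins for clean and noisy speech (assumptions for illustration).
rng = np.random.default_rng(0)
t = np.arange(4096) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = clean + 0.5 * rng.standard_normal(clean.shape)

clean_spec = frame_spectrum(clean)
noisy_spec = frame_spectrum(noisy)

# Even a perfect magnitude estimate leaves residual error if the noisy phase is reused.
oracle_magnitude = np.abs(clean_spec)
with_noisy_phase = resynthesize(oracle_magnitude, np.angle(noisy_spec))
with_clean_phase = resynthesize(oracle_magnitude, np.angle(clean_spec))

err_noisy_phase = mse(with_noisy_phase, clean)
err_clean_phase = mse(with_clean_phase, clean)
print(f"MSE with noisy phase: {err_noisy_phase:.3e}")
print(f"MSE with clean phase: {err_clean_phase:.3e}")
```

Under these assumptions, the clean-phase reconstruction matches the original up to floating-point error. The noisy-phase reconstruction retains a measurable error even though its magnitude is exact, which is the gap a phase-aware model aims to close.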
Pages: 21
Related papers
50 in total
  • [21] Enhancing Automatic Speech Recognition Quality with a Second-Stage Speech Enhancement Generative Adversarial Network
    Nossier, Soha A.
    Wall, Julie
    Moniri, Mansour
    Glackin, Cornelius
    Cannings, Nigel
    2023 IEEE 35TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2023, : 546 - 552
  • [22] Time-domain speech enhancement using generative adversarial networks
    Pascual, Santiago
    Serra, Joan
    Bonafonte, Antonio
    SPEECH COMMUNICATION, 2019, 114 : 10 - 21
  • [23] Speech Enhancement Using Forked Generative Adversarial Networks with Spectral Subtraction
    Lin, Ju
    Niu, Sufeng
    Wei, Zice
    Lan, Xiang
    van Wijngaarden, Adriaan J.
    Smith, Melissa C.
    Wang, Kuang-Ching
    INTERSPEECH 2019, 2019, : 3163 - 3167
  • [24] Towards Generalized Speech Enhancement with Generative Adversarial Networks
    Pascual, Santiago
    Serra, Joan
    Bonafonte, Antonio
    INTERSPEECH 2019, 2019, : 1791 - 1795
  • [25] Speech Enhancement Using Generative Adversarial Network by Distilling Knowledge from Statistical Method
    Wu, Jianfeng
    Hua, Yongzhu
    Yang, Shengying
    Qin, Hongshuai
    Qin, Huibin
    APPLIED SCIENCES-BASEL, 2019, 9 (16):
  • [26] Speech Enhancement with Topology-enhanced Generative Adversarial Networks (GANs)
    Zhang, Xudong
    Zhao, Liang
    Gu, Feng
    INTERSPEECH 2021, 2021, : 2726 - 2730
  • [27] TFDense-GAN: a generative adversarial network for single-channel speech enhancement
    Chen, Haoxiang
    Zhang, Jinxiu
    Fu, Yaogang
    Zhou, Xintong
    Wang, Ruilong
    Xu, Yanyan
    Ke, Dengfeng
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2025, 2025 (01):
  • [28] EXPLORING SPEECH ENHANCEMENT WITH GENERATIVE ADVERSARIAL NETWORKS FOR ROBUST SPEECH RECOGNITION
    Donahue, Chris
    Li, Bo
    Prabhavalkar, Rohit
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5024 - 5028
  • [29] Joint Ideal Ratio Mask and Generative Adversarial Networks for Monaural Speech Enhancement
    Yuan, Jing
    Bao, Changchun
    PROCEEDINGS OF 2018 14TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2018, : 276 - 280
  • [30] Lightweight End-to-End Speech Enhancement Generative Adversarial Network Using Sinc Convolutions
    Li, Lujun
    Wudamu
    Kuerzinger, Ludwig
    Watzel, Tobias
    Rigoll, Gerhard
    APPLIED SCIENCES-BASEL, 2021, 11 (16):