A Dual Stream Generative Adversarial Network with Phase Awareness for Speech Enhancement

Cited by: 0
Authors
Liang, Xintao [1]
Li, Yuhang [1]
Li, Xiaomin [1]
Zhang, Yue [1]
Ding, Youdong [1]
Affiliation
[1] Shanghai Univ, Shanghai Film Acad, Shanghai 200072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
speech enhancement; GAN; transformer; phase; spectrogram; dual stream; INTELLIGIBILITY; OPTIMIZATION; ALGORITHM; NOISE;
DOI
10.3390/info14040221
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812 ;
Abstract
Single-channel speech enhancement under unknown noise conditions is a challenging problem. Most existing time-frequency domain methods operate on the amplitude spectrogram and often ignore the phase mismatch between noisy and clean speech, which substantially limits enhancement performance. To address the phase mismatch problem and further improve enhancement performance, this paper proposes a dual-stream Generative Adversarial Network (GAN) with phase awareness, named DPGAN. The generator uses a dual-stream structure to predict amplitude and phase separately, with an information communication module between the two streams so that the phase information is fully exploited. To make the prediction more efficient, the generator is built from Transformer blocks, which learn the structural properties of sound more easily. Finally, a perceptually guided discriminator quantitatively evaluates speech quality, optimizing the generator for specific evaluation metrics. Experiments on the widely used Voicebank-DEMAND dataset show that DPGAN achieves state-of-the-art results on most metrics.
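The phase mismatch described in the abstract can be illustrated with a minimal NumPy sketch. This is not DPGAN or any part of the authors' code. It assumes a synthetic two-tone signal in place of speech, white Gaussian noise, and non-overlapping rectangular frames instead of a proper windowed STFT. The helper names (`frame_spectra`, `resynthesize`, `snr_db`) are hypothetical. The sketch takes the ideal (clean) magnitude and resynthesizes the signal once with the noisy phase, as amplitude-only methods do, and once with the clean phase.

```python
import numpy as np

def frame_spectra(x, frame_len=256):
    """Split x into non-overlapping frames and return their one-sided FFTs."""
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.fft.rfft(frames, axis=1)

def resynthesize(magnitude, phase, frame_len=256):
    """Combine magnitude and phase, invert each frame, and concatenate."""
    spec = magnitude * np.exp(1j * phase)
    return np.fft.irfft(spec, n=frame_len, axis=1).reshape(-1)

def snr_db(reference, estimate):
    """Signal-to-noise ratio of estimate against reference, in dB."""
    err = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / (np.sum(err ** 2) + 1e-12))

# Synthetic stand-in for speech (assumption): two tones sampled at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)
noisy = clean + 0.5 * rng.standard_normal(clean.shape)

clean_spec = frame_spectra(clean)
noisy_spec = frame_spectra(noisy)

# Even a perfect magnitude estimate is degraded when paired with noisy phase.
ideal_mag = np.abs(clean_spec)
with_noisy_phase = resynthesize(ideal_mag, np.angle(noisy_spec))
with_clean_phase = resynthesize(ideal_mag, np.angle(clean_spec))

reference = clean[: with_clean_phase.size]
snr_noisy_phase = snr_db(reference, with_noisy_phase)
snr_clean_phase = snr_db(reference, with_clean_phase)
print(f"ideal magnitude + noisy phase: {snr_noisy_phase:.1f} dB")
print(f"ideal magnitude + clean phase: {snr_clean_phase:.1f} dB")
```

In this toy setting the gap between the two printed values is the cost of reusing the noisy phase, which is the limitation that motivates predicting phase in a separate stream.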
Pages: 21