A Dual Stream Generative Adversarial Network with Phase Awareness for Speech Enhancement

Cited by: 0
Authors
Liang, Xintao [1]
Li, Yuhang [1]
Li, Xiaomin [1]
Zhang, Yue [1]
Ding, Youdong [1]
Affiliation
[1] Shanghai Univ, Shanghai Film Acad, Shanghai 200072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
speech enhancement; GAN; transformer; phase; spectrogram; dual stream; INTELLIGIBILITY; OPTIMIZATION; ALGORITHM; NOISE;
DOI
10.3390/info14040221
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Single-channel speech enhancement under unknown noise conditions is a challenging problem. Most existing time-frequency domain methods operate on the amplitude spectrogram and ignore the phase mismatch between noisy speech and clean speech, which substantially limits enhancement performance. To address the phase mismatch problem and further improve enhancement performance, this paper proposes a dual-stream Generative Adversarial Network (GAN) with phase awareness, named DPGAN. The generator uses a dual-stream structure to predict amplitude and phase separately, with an information communication module between the two streams so that the phase information is fully exploited. To make the prediction more efficient, the generator is built with a Transformer, which can learn the structural properties of sound more easily. Finally, a perceptually guided discriminator quantitatively evaluates speech quality, optimising the generator for specific evaluation metrics. In experiments on the widely used Voicebank-DEMAND dataset, DPGAN achieved state-of-the-art results on most metrics.
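The phase mismatch mentioned in the abstract can be illustrated with a minimal sketch. It is not the DPGAN model: it only assumes a toy 64-sample frame, a tonal interferer standing in for noise, and a naive DFT, and the helper names (`dft`, `idft_real`, `recombine`) are hypothetical. It shows that pairing a perfect amplitude estimate with the noisy phase still leaves a reconstruction error.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a real sequence (O(n^2), for illustration)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(spec):
    """Inverse DFT, keeping the real part of each output sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def recombine(mags, phases):
    """Rebuild a time-domain frame from per-bin amplitude and phase."""
    return idft_real([m * cmath.exp(1j * p) for m, p in zip(mags, phases)])

n = 64
# Toy "clean speech": a single sinusoid at bin 5 (an assumption, not real speech).
clean = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
# Toy "noise": an interferer at the same bin plus one at bin 12 (also an assumption).
noise = [0.5 * math.cos(2 * math.pi * 5 * t / n) + 0.3 * math.sin(2 * math.pi * 12 * t / n)
         for t in range(n)]
noisy = [c + v for c, v in zip(clean, noise)]

clean_spec = dft(clean)
noisy_spec = dft(noisy)
clean_mag = [abs(z) for z in clean_spec]
clean_phase = [cmath.phase(z) for z in clean_spec]
noisy_phase = [cmath.phase(z) for z in noisy_spec]

# Clean amplitude with clean phase reconstructs the frame up to rounding error.
exact = recombine(clean_mag, clean_phase)
# Clean amplitude with noisy phase does not, even though the amplitude is perfect.
mismatched = recombine(clean_mag, noisy_phase)

err_exact = max(abs(a - b) for a, b in zip(exact, clean))
err_mismatch = max(abs(a - b) for a, b in zip(mismatched, clean))
print(err_exact, err_mismatch)
```

The gap between the two errors is the part of the problem that an amplitude-only method cannot remove, which is the motivation the abstract gives for predicting phase in a separate stream.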
Pages: 21