A Conditional Generative Model for Speech Enhancement

Cited by: 9
Authors:
Li, Zeng-Xi [1]
Dai, Li-Rong [1]
Song, Yan [1]
McLoughlin, Ian [2 ]
Affiliations:
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
[2] Univ Kent, Sch Comp, Medway, England
Funding:
National Natural Science Foundation of China
Keywords:
Deep learning; Speech enhancement; Generative model; Adversarial training; NOISE
DOI:
10.1007/s00034-018-0798-4
CLC Classification:
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Discipline Codes:
0808; 0809
Abstract:
Deep learning-based speech enhancement approaches such as deep neural networks (DNNs) and Long Short-Term Memory (LSTM) networks have demonstrated results superior to classical methods. However, these methods do not take full advantage of temporal context information: while DNN and LSTM models consider temporal context in the noisy source speech, they do not do so for the estimated clean speech. Both DNN and LSTM also tend to over-smooth spectra, which causes the enhanced speech to sound muffled. This paper proposes a novel architecture that addresses both issues, which we term a conditional generative model (CGM). By adopting an adversarial training scheme applied to a generator of deep dilated convolutional layers, CGM is designed to model the joint and symmetric conditions of both noisy and estimated clean spectra. We evaluate CGM against both DNN and LSTM in terms of Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) on TIMIT sentences corrupted by ITU-T P.501 and NOISEX-92 noise in a range of matched and mismatched noise conditions. Results show that both the CGM architecture and the adversarial training mechanism lead to better PESQ and STOI scores in all tested noise conditions, and both mitigate spectral over-smoothing.
Pages: 5005-5022
Page count: 18
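The abstract's two key ingredients, a generator built from stacked dilated convolutions and a discriminator trained adversarially on the joint condition of noisy and clean spectra, can be illustrated with a short sketch. The PyTorch code below is a minimal, hypothetical rendering: the layer counts, channel widths, 257-bin feature size, loss weighting, and all class and function names are illustrative assumptions, not the configuration reported in the paper.

```python
# Minimal sketch of a dilated-convolution generator with a jointly
# conditioned discriminator, in the spirit of the CGM described in the
# abstract. All hyperparameters (depth, channels, 257 frequency bins)
# and the BCE + L1 loss combination are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_BINS = 257  # assumed spectral feature size (e.g. 512-point FFT magnitudes)

class Generator(nn.Module):
    """Maps noisy spectra (batch, bins, frames) to enhanced spectra."""
    def __init__(self, channels=64, depth=6):
        super().__init__()
        layers = [nn.Conv1d(N_BINS, channels, kernel_size=3, padding=1)]
        for i in range(depth):
            d = 2 ** i  # exponentially growing dilation widens temporal context
            layers += [nn.ReLU(),
                       nn.Conv1d(channels, channels, kernel_size=3,
                                 dilation=d, padding=d)]
        layers += [nn.ReLU(), nn.Conv1d(channels, N_BINS, kernel_size=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, noisy):
        return self.net(noisy)

class Discriminator(nn.Module):
    """Scores the joint condition: noisy spectra stacked with either the
    true clean spectra (real pair) or the generator's estimate (fake pair)."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2 * N_BINS, channels, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv1d(channels, channels, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv1d(channels, 1, 1))

    def forward(self, noisy, clean):
        logits = self.net(torch.cat([noisy, clean], dim=1))
        return logits.mean(dim=(1, 2))  # one score per utterance

def train_step(G, D, opt_g, opt_d, noisy, clean, l1_weight=100.0):
    """One adversarial update: D learns to separate real from generated
    pairs; G learns to fool D while staying close to the clean target."""
    bce = nn.BCEWithLogitsLoss()
    # Discriminator step (generator output detached).
    opt_d.zero_grad()
    d_real = D(noisy, clean)
    d_fake = D(noisy, G(noisy).detach())
    loss_d = (bce(d_real, torch.ones_like(d_real)) +
              bce(d_fake, torch.zeros_like(d_fake)))
    loss_d.backward()
    opt_d.step()
    # Generator step (adversarial term plus L1 fidelity term).
    opt_g.zero_grad()
    est = G(noisy)
    d_est = D(noisy, est)
    loss_g = bce(d_est, torch.ones_like(d_est)) + l1_weight * F.l1_loss(est, clean)
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```

For the evaluation metrics named in the abstract, the open-source `pesq` and `pystoi` Python packages provide implementations of PESQ and STOI respectively, so enhanced waveforms reconstructed from the estimated spectra can be scored directly against the clean references.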