A Conditional Generative Model for Speech Enhancement

Cited: 9
Authors
Li, Zeng-Xi [1]
Dai, Li-Rong [1]
Song, Yan [1]
McLoughlin, Ian [2]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
[2] Univ Kent, Sch Comp, Medway, England
Funding
National Natural Science Foundation of China;
Keywords
Deep learning; Speech enhancement; Generative model; Adversarial training; NOISE;
DOI
10.1007/s00034-018-0798-4
CLC Classification
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Discipline Codes
0808 ; 0809 ;
Abstract
Deep learning-based speech enhancement approaches such as deep neural networks (DNNs) and Long Short-Term Memory (LSTM) networks have already demonstrated superior results to classical methods. However, these methods do not take full advantage of temporal context information: while DNNs and LSTMs consider temporal context in the noisy source speech, they do not do so for the estimated clean speech. Both DNNs and LSTMs also tend to over-smooth spectra, which causes the enhanced speech to sound muffled. This paper proposes a novel architecture that addresses both issues, which we term a conditional generative model (CGM). By applying an adversarial training scheme to a generator built from deep dilated convolutional layers, the CGM is designed to model the joint and symmetric conditions of both the noisy and the estimated clean spectra. We evaluate the CGM against both DNN and LSTM baselines in terms of Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) on TIMIT sentences corrupted by ITU-T P.501 and NOISEX-92 noise under a range of matched and mismatched noise conditions. Results show that both the CGM architecture and the adversarial training mechanism yield better PESQ and STOI scores in all tested noise conditions, and that both mitigate over-smoothing.
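The temporal-context advantage of stacked dilated convolutions mentioned in the abstract can be illustrated with a short receptive-field calculation. This is a generic sketch, not the paper's exact configuration: the kernel size and dilation schedule below are assumed purely for illustration.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in time frames) of a stack of 1-D dilated
    convolutions, one layer per entry in `dilations`, each layer
    using the same kernel size."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d  # each layer widens the field by (k-1)*d
    return rf

# Doubling dilations (a common WaveNet-style schedule, assumed here)
# grow the receptive field exponentially with depth:
print(receptive_field(3, [1, 2, 4, 8, 16]))   # 63 frames
# An undilated stack of the same depth covers far fewer frames:
print(receptive_field(3, [1, 1, 1, 1, 1]))    # 11 frames
```

This is why a dilated convolutional generator can condition its output on a wide span of spectral frames without the parameter cost of proportionally larger kernels or deeper plain stacks.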
Pages: 5005-5022
Page count: 18
Related Papers
50 records in total
  • [1] A Conditional Generative Model for Speech Enhancement
    Zeng-Xi Li
    Li-Rong Dai
    Yan Song
    Ian McLoughlin
    Circuits, Systems, and Signal Processing, 2018, 37 : 5005 - 5022
  • [2] CONDITIONAL DIFFUSION PROBABILISTIC MODEL FOR SPEECH ENHANCEMENT
    Lu, Yen-Ju
    Wang, Zhong-Qiu
    Watanabe, Shinji
    Richard, Alexander
    Yu, Cheng
    Tsao, Yu
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7402 - 7406
  • [3] Conditional Denoising Diffusion Implicit Model for Speech Enhancement
    Yang C.
    Yu X.
    Huang S.
    International Journal of Speech Technology, 2024, 27 (01) : 201 - 209
  • [4] Improved Wasserstein conditional generative adversarial network speech enhancement
    Qin, Shan
    Jiang, Ting
    EURASIP JOURNAL ON WIRELESS COMMUNICATIONS AND NETWORKING, 2018,
  • [5] Improved Wasserstein conditional generative adversarial network speech enhancement
    Shan Qin
    Ting Jiang
    EURASIP Journal on Wireless Communications and Networking, 2018
  • [6] Phase sensitive masking-based single channel speech enhancement using conditional generative adversarial network
    Routray, Sidheswar
    Mao, Qirong
    COMPUTER SPEECH AND LANGUAGE, 2022, 71
  • [7] SEGAN: Speech Enhancement Generative Adversarial Network
    Pascual, Santiago
    Bonafonte, Antonio
    Serra, Joan
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3642 - 3646
  • [8] Controllable speech enhancement model based on perceptual conditional network
    Yuan W.
    Qu Q.
    Liang C.
    Xia B.
    Yi Qi Yi Biao Xue Bao/Chinese Journal of Scientific Instrument, 2023, 44 (05): : 53 - 60
  • [9] SPEECH ENHANCEMENT VIA GENERATIVE ADVERSARIAL LSTM NETWORKS
    Xiang, Yang
    Bao, Changchun
    2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, : 46 - 50
  • [10] Multi-scale Generative Adversarial Networks for Speech Enhancement
    Li, Yihang
    Jiang, Ting
    Qin, Shan
    2019 7TH IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (IEEE GLOBALSIP), 2019,